Searches App

The Searches app provides an efficient, scalable search backend built on basic information retrieval techniques.

Each course model stores a vector, serialized as a pickled SciPy sparse vector, that represents the course. At search time, a vectorizer builds the same kind of vector for the query. We then fetch roughly 100 candidate courses and sort them by the cosine similarity between each candidate's vector and the query vector.
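The ranking flow described above can be sketched roughly as follows. This is a minimal illustration, not the app's code: plain dicts stand in for SciPy sparse vectors so the example needs only the standard library, and the names to_sparse, cosine, search and the sample courses are all hypothetical.

```python
import math
import pickle
from collections import Counter


def to_sparse(text):
    """Bag-of-words counts as a dict, standing in for a SciPy sparse vector."""
    return dict(Counter(text.lower().split()))


def cosine(a, b):
    """Cosine similarity between two dict-based sparse vectors."""
    dot = sum(count * b.get(term, 0) for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical stored courses: each keeps its vector as a pickled blob.
courses = {
    "Intro to Databases": pickle.dumps(to_sparse("intro to databases sql storage")),
    "Machine Learning": pickle.dumps(to_sparse("machine learning models data")),
    "Database Systems": pickle.dumps(to_sparse("database systems sql transactions")),
}


def search(query, candidates, limit=100):
    """Rank candidate courses by cosine similarity to the query vector."""
    query_vec = to_sparse(query)
    scored = [
        (cosine(query_vec, pickle.loads(blob)), name)
        for name, blob in candidates.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _, name in scored[:limit]]


results = search("sql transactions", courses)
```

Here "Database Systems" ranks first because it shares both query terms, while "Machine Learning" shares none and ranks last.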

To improve accuracy and keep the experience clean for key use cases, the search weights courses with matching titles more heavily. The description and other fields are searched as well.

Views

class searches.views.CourseSearchList(**kwargs)

Course Search List.

get(request, query, sem_name, year)

Return vectorized search results.

post(request, query, sem_name, year)

Return advanced search results.

Utils

class searches.utils.Searcher

The Searcher class implements baseline search and vectorized search using information retrieval techniques.

get_acronym(name)

Returns an acronym of a course name.
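As a rough illustration, an acronym could be built from the first letter of each word. The exact rule is not documented here (for instance, whether short words such as "to" are skipped), so the behaviour below is an assumption.

```python
def get_acronym(name):
    """Join the first letter of each word, uppercased (assumed behaviour)."""
    return "".join(word[0].upper() for word in name.split())


acronym = get_acronym("Introduction to Machine Learning")
```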

get_cosine_sim(sparse_vec1, sparse_vec2)

Computes cosine similarity between two sparse vectors.

get_most_relevant_filtered_courses(query, course_filtered)

Given a query and a set of already-filtered course objects, returns the most relevant of those courses.

get_score(course, query, query_vector)

Computes a similarity score from the cosine similarity and the match between the query and the course name.
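One plausible way to combine the two signals is a weighted sum, sketched below. The function name combine_score, the additive formula, and the name_weight parameter are assumptions for illustration; the actual combination used by get_score is not stated in this documentation.

```python
def combine_score(cosine_sim, name_match, name_weight=1.0):
    """Add a weighted name-match bonus to the cosine similarity.

    cosine_sim: similarity between the query and course vectors.
    name_match: the 0/1/2 name-match value described for matches_name.
    name_weight: hypothetical tuning knob, not taken from the source.
    """
    return cosine_sim + name_weight * name_match


title_hit = combine_score(0.2, 2)   # weak vector match, strong title match
vector_hit = combine_score(0.9, 0)  # strong vector match, no title match
```

With these inputs the title match outranks the stronger vector match, which is in line with the heavier weight on matching titles described in the overview.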

get_similarity(query, course)

Vectorizes the query and returns the cosine similarity score between the query vector and the course vector.

load_count_vectorizer()

Loads the pickled English-dictionary count vectorizer object.

matches_name(query, course_name)

Returns a score (2, 1, or 0) for how well a query matches a course name.
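The meaning of each score level is not spelled out here. The sketch below assumes 2 for an exact match, 1 for a partial (substring) match, and 0 for no match, ignoring case; treat that mapping as a guess.

```python
def matches_name(query, course_name):
    """Score a query against a course name: 2 exact, 1 partial, 0 none.

    The meaning of each level is assumed, not taken from the source.
    """
    q = query.strip().lower()
    name = course_name.strip().lower()
    if not q:
        return 0
    if q == name:
        return 2
    if q in name:
        return 1
    return 0
```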

print_similiarity_scores(courses, query)

Prints all course similarity scores given a query (for debugging).

vectorize_query(query)

Vectorizes a user’s query using the count vectorizer.

Returns filtered courses that are most relevant to a given query.

wordify(course_vector)

Converts a course vector back into a string using the count vectorizer.
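The relationship between vectorize_query and wordify can be illustrated with a fixed vocabulary. This is a standard-library sketch in which a dict of index-to-count pairs stands in for the sparse vector; the vocabulary contents are hypothetical.

```python
from collections import Counter

# Hypothetical vocabulary mapping each known term to a vector index.
vocabulary = {"data": 0, "learning": 1, "machine": 2, "systems": 3}


def vectorize_query(query):
    """Count known terms; returns a sparse {index: count} mapping."""
    counts = Counter(query.lower().split())
    return {vocabulary[t]: c for t, c in counts.items() if t in vocabulary}


def wordify(vector):
    """Map indices back to terms, repeating each term by its count."""
    index_to_term = {i: t for t, i in vocabulary.items()}
    return " ".join(
        " ".join([index_to_term[i]] * count)
        for i, count in sorted(vector.items())
    )


vec = vectorize_query("Machine learning machine vision")
text = wordify(vec)
```

Terms outside the vocabulary ("vision" here) are dropped, and the original word order is lost, so the round trip recovers only which known terms occurred and how often.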

class searches.utils.Vectorizer

The Vectorizer class creates a dictionary over courses and builds course vectors using a count vectorizer.

course_to_str(name, description, area, weight)

Returns a string representation of a course using a Porter stemmer.

doc_to_lower_stem_str(doc)

Converts the words in a document (string) to lowercase, stemmed words.
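The two helpers above might fit together as in the sketch below. Two things are assumptions: that weight means "repeat the course name this many times" so title terms count more, and the toy naive_stem function, which is a placeholder and not the Porter algorithm. A real Porter stemmer, such as the stem method of NLTK's PorterStemmer, could be passed in as the stem argument instead.

```python
def naive_stem(word):
    """Toy suffix stripper; NOT the Porter algorithm, only a placeholder."""
    for suffix in ("ing", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def doc_to_lower_stem_str(doc, stem=naive_stem):
    """Lowercase the document, split on whitespace, and stem each word."""
    return " ".join(stem(word) for word in doc.lower().split())


def course_to_str(name, description, area, weight):
    """Repeat the name `weight` times so title terms count more (assumed)."""
    parts = [name] * weight + [description, area]
    return doc_to_lower_stem_str(" ".join(parts))


course_text = course_to_str("Databases", "Storing data", "Engineering", 2)
```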

vectorize()

Transforms all course objects into course vectors using TF-IDF and saves them.
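For reference, TF-IDF weighting can be sketched in a few lines. This uses the textbook formula tf * log(N / df) over whitespace tokens; library implementations such as scikit-learn's TfidfVectorizer apply smoothed variants, and the weighting actually used by vectorize() is not specified here. The function name tfidf_vectors and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one {term: tf * idf} dict per document (textbook weighting)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append(
            {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return vectors


vectors = tfidf_vectors(["intro databases", "intro algorithms", "intro networks"])
```

A term that appears in every document ("intro") receives a weight of zero, while terms unique to one course keep a positive weight, which is why TF-IDF favours distinguishing words.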

Baseline search returns courses whose names contain the query (legacy code).

searches.utils.course_desc_contains_token(token)

Returns a queryset of courses whose descriptions contain the token.

searches.utils.course_name_contains_token(token)

Returns a queryset of courses whose code or name contains the token.