The Searches app provides an efficient, scalable search backend built on basic information retrieval techniques.
Each course model stores a pickled scipy sparse vector that represents the course. On search, a vectorizer builds the same kind of vector for the query. We then fetch about 100 candidate courses and sort them by the cosine similarity between each candidate's vector and the query vector.
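The ranking step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the app's actual code: plain `collections.Counter` objects stand in for the pickled scipy sparse vectors, whitespace splitting stands in for the real vectorizer, and `to_vector`, `cosine_sim`, and `rank` are hypothetical names.

```python
import math
from collections import Counter


def to_vector(text):
    """Build a bag-of-words count vector (a stand-in for the real vectorizer)."""
    return Counter(text.lower().split())


def cosine_sim(vec1, vec2):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(count * vec2.get(term, 0) for term, count in vec1.items())
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty vector is similar to nothing
    return dot / (norm1 * norm2)


def rank(query, candidates):
    """Sort candidate course texts by similarity to the query, best first."""
    query_vec = to_vector(query)
    return sorted(
        candidates,
        key=lambda text: cosine_sim(query_vec, to_vector(text)),
        reverse=True,
    )


# Hypothetical candidate texts; the real app fetches about 100 course models.
candidates = [
    "organic chemistry laboratory",
    "introduction to machine learning",
    "machine learning for signal processing",
]
ranked = rank("machine learning", candidates)
```

Because cosine similarity normalizes by vector length, the shorter text that contains both query terms ranks above the longer one, and the unrelated course ranks last.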
To improve accuracy and keep results clean for key use cases, the search weights courses with matching titles more heavily. Descriptions and other fields are searched as well.
- class searches.utils.Searcher
The Searcher class implements baseline search and vectorized search based on information retrieval techniques.
Returns an acronym of a course name.
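One plausible way to derive such an acronym is sketched below. The rule used here (first letter of every word, uppercased) and the name `acronym` are assumptions for illustration; the actual method may, for example, skip stop words.

```python
def acronym(course_name):
    """Join the first letter of each word, uppercased (hypothetical helper)."""
    return "".join(word[0].upper() for word in course_name.split())


example = acronym("Introduction to Algorithms")
```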
- get_cosine_sim(sparse_vec1, sparse_vec2)
Computes cosine similarity between two sparse vectors.
- get_most_relevant_filtered_courses(query, course_filtered)
Given a query, returns the most relevant courses from a set of already-filtered course objects.
- get_score(course, query, query_vector)
Computes a similarity score based on cosine similarity and on how well the query matches the course name.
- get_similarity(query, course)
Vectorizes the query and returns the cosine similarity between the query vector and the course vector.
Loads the pickled English-dictionary count vectorizer object.
- matches_name(query, course_name)
Returns a score (2, 1, or 0) indicating how well the query matches the course name.
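A name-match score of this kind could be combined with cosine similarity along the following lines. Everything here is an assumption made for the sketch: the meaning of each level (2 for an exact match, 1 for a partial match, 0 otherwise), the additive combination, and the `combined_score` helper and its `name_weight` parameter are hypothetical.

```python
def matches_name(query, course_name):
    """Return 2 for an exact name match, 1 for a partial match, 0 otherwise.

    The meaning of each level is an assumption made for this sketch.
    """
    query = query.strip().lower()
    name = course_name.strip().lower()
    if not query:
        return 0
    if query == name:
        return 2
    if query in name:
        return 1
    return 0


def combined_score(cosine_similarity, query, course_name, name_weight=1.0):
    """Add a name-match bonus to a precomputed cosine similarity (hypothetical)."""
    return cosine_similarity + name_weight * matches_name(query, course_name)
```

Since cosine similarity between count vectors lies between 0 and 1, a weight of 1.0 is enough for any name match to outrank a course that matches only on its description, which is the kind of title-first behavior the overview describes.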
- print_similiarity_scores(courses, query)
Prints all course similarity scores given a query (for debugging).
Vectorizes a user’s query using the count vectorizer.
- vectorized_search(school, query, semester)
Returns filtered courses that are most relevant to a given query.
Converts a course vector back into a string using the count vectorizer.
- class searches.utils.Vectorizer
The Vectorizer class creates a dictionary over courses and builds course vectors using the count vectorizer.
- course_to_str(name, description, area, weight)
Returns a string representation of a course using a Porter Stemmer.
Converts the words in a document (a string) to lowercase, stemmed words.
The vectorize function transforms entire course objects into course vectors using TF-IDF and saves them.
- searches.utils.baseline_search(school, query, semester)
Baseline search returns courses whose names contain the query (legacy code).
Returns a query set of courses where tokens are contained in descriptions.
Returns a query set of courses where tokens are contained in code or name.
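The token-containment filters described above can be illustrated with a small sketch. It makes several assumptions: plain dicts stand in for course models and a list stands in for a query set, every query token must appear (rather than any one of them), and `tokens_in_fields` is a hypothetical name.

```python
def tokens_in_fields(courses, query, fields):
    """Keep courses where every query token appears in the given fields."""
    tokens = query.lower().split()
    if not tokens:
        return []  # an empty query matches nothing in this sketch
    results = []
    for course in courses:
        # Concatenate the searched fields into one lowercase string.
        haystack = " ".join(str(course.get(f, "")) for f in fields).lower()
        if all(token in haystack for token in tokens):
            results.append(course)
    return results


# Hypothetical course records for illustration only.
courses = [
    {"code": "CS 101", "name": "Intro to Programming",
     "description": "Python basics"},
    {"code": "BIO 200", "name": "Cell Biology",
     "description": "Cells and programming of genes"},
]
by_code_or_name = tokens_in_fields(courses, "intro programming", ["code", "name"])
by_description = tokens_in_fields(courses, "programming", ["description"])
```

Searching code and name separately from descriptions lets a caller prefer the narrower match and fall back to descriptions only when needed.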