Data Pipeline Documentation¶
Semester.ly’s data pipeline provides the infrastructure that fills the database with course information. Whether a given university offers an API or an online course catalogue, the pipeline gives developers a simple framework for pulling that information and saving it in our Django model format.
General System Workflow¶
Pull HTML/JSON markup from a catalogue/API
Map the fields of the markup to the fields of our ingestor (by simply filling a Python dictionary).
The ingestor preprocesses the data, validates it, and writes it to JSON.
Load the JSON into the database.
Note
This process happens automatically via Django/Celery Beat periodic tasks. You can learn more about these scheduled tasks below (Scheduled Tasks).
Steps 1 and 2 are what we call parsing – an operation that does not generalize across universities. Often a new parser must be written. For more information, read Add a School.
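As a concrete illustration of steps 1 and 2, a parser might map raw catalogue fields into the ingestor's expected keys like the sketch below. The catalogue field names, the sample entry, and the mapping are purely illustrative – they are not any particular university's schema or Semester.ly's actual parser code.

```python
# Hypothetical sketch of steps 1 and 2 of the pipeline: the raw field
# names below ('courseId', 'dept', ...) are invented for illustration.

def parse_course(raw):
    """Map one raw catalogue entry to the ingestor's expected keys."""
    return {
        'code': raw.get('courseId'),
        'name': raw.get('title'),
        'department': raw.get('dept'),
        'credits': float(raw.get('units', 0)),
        'description': raw.get('summary', '').strip(),
    }

raw_entry = {
    'courseId': 'EN.601.226',
    'title': 'Data Structures',
    'dept': 'Computer Science',
    'units': '4',
    'summary': ' Stacks, queues, trees, graphs. ',
}
course = parse_course(raw_entry)
```

The resulting dictionary is what the ingestor preprocesses, validates, and writes to JSON in step 3.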
Parsing Library Documentation¶
Base Parser¶
Requester¶
Ingestor¶
- exception parsing.library.ingestor.IngestionError(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineError
Ingestor error class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception parsing.library.ingestor.IngestionWarning(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineWarning
Ingestor warning class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class parsing.library.ingestor.Ingestor(config, output, break_on_error=True, break_on_warning=False, display_progress_bar=True, skip_duplicates=True, validate=True, tracker=<parsing.library.tracker.NullTracker object>)[source]¶
Bases:
dict
Ingest parsing data into formatted json.
Mimics functionality of dict.
- tracker¶
Tracker object.
- Type
library.tracker
- UNICODE_WHITESPACE¶
regex that matches Unicode whitespace.
- Type
TYPE
- validator¶
Validator instance.
- Type
library.validator
- ALL_KEYS = {'areas', 'campus', 'capacity', 'code', 'coreqs', 'corequisites', 'cores', 'cost', 'course', 'course_code', 'course_name', 'course_section_id', 'credits', 'date', 'date_end', 'date_start', 'dates', 'day', 'days', 'department', 'department_code', 'department_name', 'dept_code', 'dept_name', 'descr', 'description', 'end_time', 'enrollment', 'enrolment', 'exclusions', 'fee', 'fees', 'final_exam', 'geneds', 'homepage', 'instr', 'instr_name', 'instr_names', 'instrs', 'instructor', 'instructor_name', 'instructors', 'kind', 'level', 'loc', 'location', 'meeting_section', 'meetings', 'name', 'num_credits', 'offerings', 'pos', 'prereqs', 'prerequisites', 'remaining_seats', 'same_as', 'school', 'school_subdivision_code', 'school_subdivision_name', 'score', 'section', 'section_code', 'section_name', 'section_type', 'sections', 'semester', 'size', 'start_time', 'sub_school', 'summary', 'term', 'time', 'time_end', 'time_start', 'type', 'waitlist', 'waitlist_size', 'website', 'where', 'writing_intensive', 'year'}¶
- clear() → None. Remove all items from D.¶
- copy() → a shallow copy of D¶
- fromkeys(value=None, /)¶
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)¶
Return the value for key if key is in the dictionary, else default.
- items() → a set-like object providing a view on D’s items¶
- keys() → a set-like object providing a view on D’s keys¶
- pop(k[, d]) → v, remove specified key and return the corresponding value.¶
If key is not found, d is returned if given, otherwise KeyError is raised
- popitem()¶
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) → None. Update D from dict/iterable E and F.¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
- values() → an object providing a view on D’s values¶
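Because Ingestor mimics a dict, a parser fills in recognized keys by plain assignment. The stand-in below sketches only the key-guard idea against an abridged ALL_KEYS; it is not the real class, which also preprocesses, validates, and writes JSON, and whose actual reaction to unrecognized keys may differ.

```python
# Minimal stand-in for the dict-mimicking behavior of Ingestor.
# ALL_KEYS here is abridged, and raising KeyError on unknown keys
# is an assumption for illustration, not the real class's behavior.

ALL_KEYS = {'code', 'name', 'department', 'credits', 'sections'}

class MiniIngestor(dict):
    def __setitem__(self, key, value):
        if key not in ALL_KEYS:
            raise KeyError(f'unrecognized ingestor key: {key}')
        super().__setitem__(key, value)

ing = MiniIngestor()
ing['code'] = 'EN.601.226'
ing['name'] = 'Data Structures'
```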
Validator¶
- exception parsing.library.validator.MultipleDefinitionsWarning(data, *args)[source]¶
Bases:
parsing.library.validator.ValidationWarning
Duplicated key in data definition.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception parsing.library.validator.ValidationError(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineError
Validator error class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception parsing.library.validator.ValidationWarning(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineWarning
Validator warning class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class parsing.library.validator.Validator(config, tracker=None, relative=True)[source]¶
Bases:
object
Validation engine in parsing data pipeline.
- config¶
Loaded config.json.
- Type
DotDict
- tracker¶
- KINDS = {'config', 'course', 'datalist', 'directory', 'eval', 'final_exam', 'instructor', 'meeting', 'section'}¶
- static file_to_json(path, allow_duplicates=False)[source]¶
Load file pointed to by path into json object dictionary.
- classmethod load_schemas(schema_path=None)[source]¶
Load JSON validation schemas.
- NOTE: Will load schemas as a static variable (i.e. once per definition), unless schema_path is specifically defined.
- static schema_validate(data, schema, resolver=None)[source]¶
Validate data object with JSON schema alone.
- validate_course(course)[source]¶
Validate course.
- Parameters
course (DotDict) – Course object to validate.
- Raises
MultipleDefinitionsWarning – Course has already been validated in same session.
ValidationError – Invalid course.
- validate_directory(directory)[source]¶
Validate directory.
- Parameters
directory (str, dict) – Directory to validate. May be either path or object.
- Raises
ValidationError – encapsulated IOError
- validate_eval(course_eval)[source]¶
Validate evaluation object.
- Parameters
course_eval (DotDict) – Evaluation to validate.
- Raises
ValidationError – Invalid evaluation.
- validate_final_exam(final_exam)[source]¶
Validate final exam.
NOTE: currently unused.
- Parameters
final_exam (DotDict) – Final Exam object to validate.
- Raises
ValidationError – Invalid final exam.
- validate_instructor(instructor)[source]¶
Validate instructor object.
- Parameters
instructor (DotDict) – Instructor object to validate.
- Raises
ValidationError – Invalid instructor.
- validate_location(location)[source]¶
Validate location.
- Parameters
location (DotDict) – Location object to validate.
- Raises
ValidationWarning – Invalid location.
- validate_meeting(meeting)[source]¶
Validate meeting object.
- Parameters
meeting (DotDict) – Meeting object to validate.
- Raises
ValidationError – Invalid meeting.
ValidationWarning – Description
- validate_section(section)[source]¶
Validate section object.
- Parameters
section (DotDict) – Section object to validate.
- Raises
MultipleDefinitionsWarning – Invalid section.
ValidationError – Description
- validate_self_contained(data_path, break_on_error=True, break_on_warning=False, output_error=None, display_progress_bar=True, master_log_path=None)[source]¶
Validate JSON file without an ingestor.
- Parameters
data_path (str) – Path to data file.
break_on_error (bool, optional) – Description
break_on_warning (bool, optional) – Description
output_error (None, optional) – Error output file path.
display_progress_bar (bool, optional) – Description
master_log_path (None, optional) – Description
- Raises
ValidationError – Description
- validate_time_range(start, end)[source]¶
Validate start time and end time.
There exists an unhandled case if the end time is midnight.
- Parameters
- Raises
ValidationError – Time range is invalid.
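The midnight caveat noted above can be seen in a minimal sketch of such a check. The time format and the local ValidationError class here are assumptions for illustration, not the real implementation.

```python
from datetime import datetime

class ValidationError(Exception):
    pass

def validate_time_range(start, end, fmt='%H:%M'):
    """Hypothetical sketch: reject ranges where end precedes start.

    Like the documented method, this does not handle an end time of
    midnight ('00:00'), which compares as earlier than any start time.
    """
    t_start = datetime.strptime(start, fmt)
    t_end = datetime.strptime(end, fmt)
    if t_end < t_start:
        raise ValidationError(f'invalid time range: {start}-{end}')

validate_time_range('09:00', '10:15')  # passes silently
```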
- static validate_website(url)[source]¶
Validate url by sending HEAD request and analyzing response.
- Parameters
url (str) – URL to validate.
- Raises
ValidationError – URL is invalid.
Logger¶
- class parsing.library.logger.JSONColoredFormatter(fmt=None, datefmt=None, style='%', validate=True)[source]¶
Bases:
logging.Formatter
- converter()¶
- localtime([seconds]) -> (tm_year,tm_mon,tm_mday,tm_hour,tm_min,
tm_sec,tm_wday,tm_yday,tm_isdst)
Convert seconds since the Epoch to a time tuple expressing local time. When ‘seconds’ is not passed in, convert the current time instead.
- default_msec_format = '%s,%03d'¶
- default_time_format = '%Y-%m-%d %H:%M:%S'¶
- format(record)[source]¶
Format the specified record as text.
The record’s attribute dictionary is used as the operand to a string formatting operation which yields the returned string. Before formatting the dictionary, a couple of preparatory steps are carried out. The message attribute of the record is computed using LogRecord.getMessage(). If the formatting string uses the time (as determined by a call to usesTime(), formatTime() is called to format the event time. If there is exception information, it is formatted using formatException() and appended to the message.
- formatException(ei)¶
Format and return the specified exception information as a string.
This default implementation just uses traceback.print_exception()
- formatMessage(record)¶
- formatStack(stack_info)¶
This method is provided as an extension point for specialized formatting of stack information.
The input data is a string as returned from a call to traceback.print_stack(), but with the last trailing newline removed. The base implementation just returns the value passed in.
- formatTime(record, datefmt=None)¶
Return the creation time of the specified LogRecord as formatted text.
This method should be called from format() by a formatter which wants to make use of a formatted time. This method can be overridden in formatters to provide for any specific requirement, but the basic behaviour is as follows: if datefmt (a string) is specified, it is used with time.strftime() to format the creation time of the record. Otherwise, an ISO8601-like (or RFC 3339-like) format is used. The resulting string is returned. This function uses a user-configurable function to convert the creation time to a tuple. By default, time.localtime() is used; to change this for a particular formatter instance, set the ‘converter’ attribute to a function with the same signature as time.localtime() or time.gmtime(). To change it for all formatters, for example if you want all logging times to be shown in GMT, set the ‘converter’ attribute in the Formatter class.
- usesTime()¶
Check if the format uses the creation time of the record.
- class parsing.library.logger.JSONFormatter(fmt=None, datefmt=None, style='%', validate=True)[source]¶
Bases:
logging.Formatter
Simple JSON extension of Python logging.Formatter.
- converter()¶
- localtime([seconds]) -> (tm_year,tm_mon,tm_mday,tm_hour,tm_min,
tm_sec,tm_wday,tm_yday,tm_isdst)
Convert seconds since the Epoch to a time tuple expressing local time. When ‘seconds’ is not passed in, convert the current time instead.
- default_msec_format = '%s,%03d'¶
- default_time_format = '%Y-%m-%d %H:%M:%S'¶
- format(record)[source]¶
Format record message.
- Parameters
record (logging.LogRecord) – Description
- Returns
Prettified JSON string.
- Return type
- formatException(ei)¶
Format and return the specified exception information as a string.
This default implementation just uses traceback.print_exception()
- formatMessage(record)¶
- formatStack(stack_info)¶
This method is provided as an extension point for specialized formatting of stack information.
The input data is a string as returned from a call to traceback.print_stack(), but with the last trailing newline removed. The base implementation just returns the value passed in.
- formatTime(record, datefmt=None)¶
Return the creation time of the specified LogRecord as formatted text.
This method should be called from format() by a formatter which wants to make use of a formatted time. This method can be overridden in formatters to provide for any specific requirement, but the basic behaviour is as follows: if datefmt (a string) is specified, it is used with time.strftime() to format the creation time of the record. Otherwise, an ISO8601-like (or RFC 3339-like) format is used. The resulting string is returned. This function uses a user-configurable function to convert the creation time to a tuple. By default, time.localtime() is used; to change this for a particular formatter instance, set the ‘converter’ attribute to a function with the same signature as time.localtime() or time.gmtime(). To change it for all formatters, for example if you want all logging times to be shown in GMT, set the ‘converter’ attribute in the Formatter class.
- usesTime()¶
Check if the format uses the creation time of the record.
- class parsing.library.logger.JSONStreamWriter(obj, type_=<class 'list'>, level=0)[source]¶
Bases:
object
Context to stream JSON list to file.
- BRACES¶
Open close brace definitions.
- Type
TYPE
Examples
>>> with JSONStreamWriter(sys.stdout, type_=dict) as streamer:
...     streamer.write('a', 1)
...     streamer.write('b', 2)
...     streamer.write('c', 3)
{
  "a": 1,
  "b": 2,
  "c": 3
}
>>> with JSONStreamWriter(sys.stdout, type_=dict) as streamer:
...     streamer.write('a', 1)
...     with streamer.write('data', type_=list) as streamer2:
...         streamer2.write({0: 0, 1: 1, 2: 2})
...         streamer2.write({3: 3, 4: '4'})
...     streamer.write('b', 2)
{
  "a": 1,
  "data": [
    {
      0: 0,
      1: 1,
      2: 2
    },
    {
      3: 3,
      4: "4"
    }
  ],
  "b": 2
}
- BRACES = {<class 'list'>: ('[', ']'), <class 'dict'>: ('{', '}')}¶
- write(*args, **kwargs)[source]¶
Write to JSON in streaming fashion.
Picks either write_obj or write_key_value.
- Parameters
*args – pass-through
**kwargs – pass-through
- Returns
return value of appropriate write function.
- Raises
ValueError – type_ is not of type list or dict.
Tracker¶
- class parsing.library.tracker.NullTracker(*args, **kwargs)[source]¶
Bases:
parsing.library.tracker.Tracker
Dummy tracker used as an interface placeholder.
- BROADCAST_TYPES = {'DEPARTMENT', 'INSTRUCTOR', 'MODE', 'SCHOOL', 'STATS', 'TERM', 'TIME', 'YEAR'}¶
- add_viewer(viewer, name=None)¶
Add viewer to broadcast queue.
- property department¶
- end()¶
End tracker and report to viewers.
- get_viewer(name)¶
Get viewer by name.
Will return arbitrary match if multiple viewers with same name exist.
- has_viewer(name)¶
Determine if name exists in viewers.
- property instructor¶
- property mode¶
- remove_viewer(name)¶
Remove all viewers that match name.
- Parameters
name (str) – Viewer name to remove.
- property school¶
- start()¶
Start timer of tracker object.
- property stats¶
- property term¶
- property time¶
- property year¶
- class parsing.library.tracker.Tracker[source]¶
Bases:
object
Tracks specified attributes and broadcasts to viewers.
@property attributes are defined for all BROADCAST_TYPES
- BROADCAST_TYPES = {'DEPARTMENT', 'INSTRUCTOR', 'MODE', 'SCHOOL', 'STATS', 'TERM', 'TIME', 'YEAR'}¶
- broadcast(broadcast_type)[source]¶
Broadcast tracker update to viewers.
- Parameters
broadcast_type (str) – message to go along broadcast bus.
- Raises
TrackerError – if broadcast_type is not in BROADCAST_TYPE.
- get_viewer(name)[source]¶
Get viewer by name.
Will return arbitrary match if multiple viewers with same name exist.
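The tracker/viewer relationship described here is a standard observer pattern. The simplified stand-ins below illustrate how a broadcast fans an update out to registered viewers; the names mirror the documented API, but the bodies are sketches, not the real implementations.

```python
# Illustrative observer-pattern sketch of Tracker broadcasting to
# Viewers. Simplified stand-ins; not the real pipeline classes.

class MiniViewer:
    def __init__(self):
        self.updates = []

    def receive(self, tracker, broadcast_type):
        self.updates.append(broadcast_type)

class MiniTracker:
    BROADCAST_TYPES = {'SCHOOL', 'TERM', 'YEAR'}  # abridged

    def __init__(self):
        self.viewers = []

    def add_viewer(self, viewer):
        self.viewers.append(viewer)

    def broadcast(self, broadcast_type):
        # Reject messages outside the broadcast vocabulary.
        if broadcast_type not in self.BROADCAST_TYPES:
            raise ValueError(broadcast_type)
        for viewer in self.viewers:
            viewer.receive(self, broadcast_type)

tracker = MiniTracker()
viewer = MiniViewer()
tracker.add_viewer(viewer)
tracker.broadcast('SCHOOL')
```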
- exception parsing.library.tracker.TrackerError(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineError
Tracker error class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
Viewer¶
- class parsing.library.viewer.ETAProgressBar[source]¶
- class parsing.library.viewer.Hoarder[source]¶
Bases:
parsing.library.viewer.Viewer
Accumulate a log of some properties of the tracker.
- receive(tracker, broadcast_type)[source]¶
Receive an update from a tracker.
Ignore all broadcasts that are not TIME.
- Parameters
tracker (parsing.library.tracker.Tracker) – Tracker receiving update from.
broadcast_type (str) – Broadcast message from tracker.
- class parsing.library.viewer.StatProgressBar(stat_format='', statistics=None)[source]¶
Bases:
parsing.library.viewer.Viewer
Command line progress bar viewer for data pipeline.
- SWITCH_SIZE = 100¶
- class parsing.library.viewer.StatView[source]¶
Bases:
parsing.library.viewer.Viewer
Keeps a view of statistics of objects processed by the pipeline.
- KINDS¶
The kinds of objects that can be tracked. TODO - move this to a shared space w/Validator
- Type
- KINDS = ('course', 'section', 'meeting', 'evaluation', 'offering', 'eval')¶
- LABELS = ('valid', 'created', 'new', 'updated', 'total')¶
- receive(tracker, broadcast_type)[source]¶
Receive an update from a tracker.
Ignore all broadcasts that are not STATS.
- Parameters
tracker (parsing.library.tracker.Tracker) – Tracker receiving update from.
broadcast_type (str) – Broadcast message from tracker.
- class parsing.library.viewer.TimeDistributionView[source]¶
Bases:
parsing.library.viewer.Viewer
Viewer to analyze time distribution.
Calculates granularity and holds a report of the 12hr and 24hr distributions.
- receive(tracker, broadcast_type)[source]¶
Receive an update from a tracker.
Ignore all broadcasts that are not TIME.
- Parameters
tracker (parsing.library.tracker.Tracker) – Tracker receiving update from.
broadcast_type (str) – Broadcast message from tracker.
- class parsing.library.viewer.Timer(format='%(elapsed)s', **kwargs)[source]¶
Bases:
progressbar.widgets.FormatLabel, progressbar.widgets.TimeSensitiveWidgetBase
Custom timer created to take away ‘Elapsed Time’ string.
- INTERVAL = datetime.timedelta(microseconds=100000)¶
- check_size(progress)¶
- mapping = {'elapsed': ('total_seconds_elapsed', <function format_time>), 'finished': ('end_time', None), 'last_update': ('last_update_time', None), 'max': ('max_value', None), 'seconds': ('seconds_elapsed', None), 'start': ('start_time', None), 'value': ('value', None)}¶
- required_values = []¶
- class parsing.library.viewer.Viewer[source]¶
Bases:
object
A view that is updated via a tracker object broadcast or report.
- exception parsing.library.viewer.ViewerError(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineError
Viewer error class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
Digestor¶
- class parsing.library.digestor.Absorb(school, meta)[source]¶
Bases:
parsing.library.digestor.DigestionStrategy
Load valid data into Django db.
- static remove_offerings(section_obj)[source]¶
Remove all offerings associated with a section.
- Parameters
section_obj (Section) – Description
- class parsing.library.digestor.Burp(school, meta, output=None)[source]¶
Bases:
parsing.library.digestor.DigestionStrategy
Load valid data into Django db and output diff between input and db data.
- class parsing.library.digestor.DigestionAdapter(school, cached, short_course_weeks_limit)[source]¶
Bases:
object
Converts JSON definitions to model-compliant dictionaries.
- adapt_course(course)[source]¶
Adapt course for digestion.
- Parameters
course (dict) – course info
- Returns
Adapted course for django object.
- Return type
- Raises
DigestionError – course is None
- adapt_meeting(meeting, section_model=None)[source]¶
Adapt meeting to Django model.
- Parameters
meeting (TYPE) – Description
section_model (None, optional) – Description
- Yields
dict
- Raises
DigestionError – meeting is None.
- adapt_section(section, course_model=None)[source]¶
Adapt section to Django model.
- Parameters
section (TYPE) – Description
course_model (None, optional) – Description
- Returns
formatted section dictionary
- Return type
- Raises
DigestionError – Description
- exception parsing.library.digestor.DigestionError(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineError
Digestor error class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- class parsing.library.digestor.Digestor(school, meta, tracker=<parsing.library.tracker.NullTracker object>)[source]¶
Bases:
object
Digestor in data pipeline.
- adapter¶
Adapts
- Type
- data¶
The data to be digested.
- Type
TYPE
- strategy¶
Load and/or diff db depending on strategy
- Type
- tracker¶
Description
- MODELS = {'course': <class 'timetable.models.Course'>, 'evaluation': <class 'timetable.models.Evaluation'>, 'offering': <class 'timetable.models.Offering'>, 'section': <class 'timetable.models.Section'>, 'semester': <class 'timetable.models.Semester'>}¶
- digest_course(course)[source]¶
Create course in database from info in json model.
- Returns
django course model object
- digest_meeting(meeting, section_model=None)[source]¶
Create offering in database from info in model map.
- Parameters
section_model – JSON course model object
Returns: Offerings as a generator.
- class parsing.library.digestor.Vommit(output)[source]¶
Bases:
parsing.library.digestor.DigestionStrategy
Output diff between input and db data.
Exceptions¶
- exception parsing.library.exceptions.ParseError(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineError
Parser error class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception parsing.library.exceptions.ParseJump(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineWarning
Parser exception used for control flow.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception parsing.library.exceptions.ParseWarning(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineWarning
Parser warning class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception parsing.library.exceptions.PipelineError(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineException
Data-pipeline error class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception parsing.library.exceptions.PipelineException(data, *args)[source]¶
Bases:
Exception
Data-pipeline exception class.
- Should never be constructed directly. Use:
PipelineError
PipelineWarning
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
- exception parsing.library.exceptions.PipelineWarning(data, *args)[source]¶
Bases:
parsing.library.exceptions.PipelineException, UserWarning
Data-pipeline warning class.
- args¶
- with_traceback()¶
Exception.with_traceback(tb) – set self.__traceback__ to tb and return self.
Extractor¶
- class parsing.library.extractor.Extraction(key, container, patterns)¶
Bases:
tuple
- container¶
Alias for field number 1
- count(value, /)¶
Return number of occurrences of value.
- index(value, start=0, stop=9223372036854775807, /)¶
Return first index of value.
Raises ValueError if the value is not present.
- key¶
Alias for field number 0
- patterns¶
Alias for field number 2
Utils¶
- class parsing.library.utils.DotDict(dct)[source]¶
Bases:
dict
Dot notation access for dictionary.
Supports set, get, and delete.
Examples
>>> d = DotDict({'a': 1, 'b': 2, 'c': {'ca': 31}})
>>> d.a, d.b
(1, 2)
>>> d['a']
1
>>> d['a'] = 3
>>> d.a, d['b']
(3, 2)
>>> d.c.ca, d.c['ca']
(31, 31)
- clear() → None. Remove all items from D.¶
- copy() → a shallow copy of D¶
- fromkeys(value=None, /)¶
Create a new dictionary with keys from iterable and values set to value.
- get(key, default=None, /)¶
Return the value for key if key is in the dictionary, else default.
- items() → a set-like object providing a view on D’s items¶
- keys() → a set-like object providing a view on D’s keys¶
- pop(k[, d]) → v, remove specified key and return the corresponding value.¶
If key is not found, d is returned if given, otherwise KeyError is raised
- popitem()¶
Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
- setdefault(key, default=None, /)¶
Insert key with a value of default if key is not in the dictionary.
Return the value for key if key is in the dictionary, else default.
- update([E, ]**F) → None. Update D from dict/iterable E and F.¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
- values() → an object providing a view on D’s values¶
- parsing.library.utils.clean(dirt)[source]¶
Recursively clean json-like object.
- list: remove None elements; returns None on empty list
- dict: filter out None-valued (key, value) pairs; returns None on empty dict
- str: convert Unicode whitespace to ASCII and strip extra whitespace; returns None on empty string
- Parameters
dirt – the object to clean
- Returns
Cleaned dict, cleaned list, cleaned string, or pass-through.
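The cleaning rules above can be sketched faithfully in a few lines. This is an illustration of the documented behavior, not the real function.

```python
def clean(dirt):
    """Sketch of the documented cleaning rules (not the real source)."""
    if isinstance(dirt, list):
        # Remove None elements; empty lists become None.
        cleaned = [c for c in (clean(x) for x in dirt) if c is not None]
        return cleaned or None
    if isinstance(dirt, dict):
        # Filter out None-valued pairs; empty dicts become None.
        cleaned = {k: v for k, v in ((k, clean(v)) for k, v in dirt.items())
                   if v is not None}
        return cleaned or None
    if isinstance(dirt, str):
        # split()/join collapses all whitespace (including Unicode)
        # to single ASCII spaces; empty strings become None.
        stripped = ' '.join(dirt.split())
        return stripped or None
    return dirt  # pass-through for other types

clean({'a': '  hello  world ', 'b': '', 'c': [None, 1]})
```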
- parsing.library.utils.dict_filter_by_dict(a, b)[source]¶
Filter dictionary a by b.
b may be a dict or a set; its items or keys must be strings or regexes. Filters at arbitrary depth with regex matching.
- parsing.library.utils.dir_to_dict(path)[source]¶
Recursively create nested dictionary representing directory contents.
- parsing.library.utils.is_short_course(date_start, date_end, short_course_weeks_limit)[source]¶
Checks whether a course’s duration is within the short-term course week limit. The limit is defined in the config file for the corresponding school.
- Parameters
date_start (str) – Any reasonable date value for the start date.
date_end (str) – Any reasonable date value for the end date.
short_course_weeks_limit (int) – Number of weeks a course can last and still be defined as "short term".
- Raises
ValidationError – Invalid date input.
- Returns
bool – Defines whether the course is short term or not.
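Under the assumption that dates arrive in %m-%d-%y form (the real function accepts any reasonable date format), the check can be sketched as:

```python
from datetime import datetime

def is_short_course(date_start, date_end, short_course_weeks_limit,
                    fmt='%m-%d-%y'):
    """Sketch of the documented check; the fixed date format is an
    assumption for illustration, not the real implementation."""
    start = datetime.strptime(date_start, fmt)
    end = datetime.strptime(date_end, fmt)
    weeks = (end - start).days / 7
    return weeks <= short_course_weeks_limit

is_short_course('01-27-20', '03-06-20', 7)  # ~5.6 weeks, within the limit
```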
- parsing.library.utils.iterrify(x)[source]¶
Create iterable object if not already.
Will wrap str types in an extra iterable even though str is itself iterable.
Examples
>>> for i in iterrify(1):
...     print(i)
1
>>> for i in iterrify([1]):
...     print(i)
1
>>> for i in iterrify('hello'):
...     print(i)
'hello'
- parsing.library.utils.make_list(x=None)[source]¶
Wrap in list if not list already.
If input is None, will return empty list.
- Parameters
x – Input.
- Returns
Input wrapped in list.
- Return type
- parsing.library.utils.safe_cast(val, to_type, default=None)[source]¶
Attempt to cast to specified type or return default.
- Parameters
val – Value to cast.
to_type – Type to cast to.
default (None, optional) – Description
- Returns
Description
- Return type
to_type
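The documented behavior can be sketched in a few lines; this is an illustration of the described contract, not the actual source.

```python
def safe_cast(val, to_type, default=None):
    """Sketch: attempt the cast, fall back to default on failure."""
    try:
        return to_type(val)
    except (ValueError, TypeError):
        return default

safe_cast('42', int)      # successful cast
safe_cast('n/a', int, 0)  # falls back to the default
```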
- parsing.library.utils.short_date(date)[source]¶
Convert input to %m-%d-%y format. Returns None if input is None.
- Parameters
date (str) – date in reasonable format
- Returns
Date in format %m-%d-%y if the input is not None.
- Return type
- Raises
ParseError – Unparseable time input.
- parsing.library.utils.time24(time)[source]¶
Convert time to 24hr format.
- Parameters
time (str) – time in reasonable format
- Returns
24hr time in format hh:mm
- Return type
- Raises
ParseError – Unparseable time input.
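A minimal sketch of such a conversion, assuming only a couple of common input formats (the real function accepts a broader range and raises ParseError rather than ValueError):

```python
from datetime import datetime

def time24(time_str):
    """Sketch: normalize a time string to 24hr 'HH:MM'.

    Covers only '[h]h:mm am/pm' and already-24hr 'HH:MM' inputs;
    an assumption for illustration, not the real implementation.
    """
    for fmt in ('%I:%M %p', '%I:%M%p', '%H:%M'):
        try:
            return datetime.strptime(time_str.strip(), fmt).strftime('%H:%M')
        except ValueError:
            continue
    raise ValueError(f'unparseable time: {time_str!r}')

time24('2:30 PM')  # normalized to 24hr form
```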
Parsing Models Documentation¶
- class parsing.models.DataUpdate(*args, **kwargs)[source]¶
Stores the date/time that the school’s data was last updated.
Scheduled updates occur when digestion into the database completes.
- school¶
the school code that was updated (e.g. jhu)
- Type
CharField
- semester¶
the semester for the update
- Type
ForeignKey
to Semester
- last_updated¶
the datetime last updated
- Type
DateTimeField
- reason¶
the reason it was updated (default Scheduled Update)
- Type
CharField
- update_type¶
which field was updated
- Type
CharField
- exception DoesNotExist¶
- exception MultipleObjectsReturned¶