GTFS Schedule Schema: add filesize and calendar range metadata #525
This PR adds 3 new system-generated metadata properties to the bounding_box specification for newly added GTFS schedule sources:

- extracted_filesize: the GTFS archive file size in bytes
- extracted_calendar_start: the minimum date found in the archive's calendar/calendar_dates
- extracted_calendar_end: the maximum date found in the archive's calendar/calendar_dates

These properties function similarly to the existing extracted_on metadata property in the schema.

These 3 new metadata properties (extracted_filesize, extracted_calendar_start, and extracted_calendar_end) would be very helpful to end consumers in certain applications. Take file size: consuming a 10MB archive is very different from consuming a 500MB archive in both download and processing time, so it would be very helpful for end consumers to know this figure ahead of time. (One application that would benefit from this change is my Android app, Transito, which consumes the GTFS feeds listed in MDB; I would greatly appreciate being able to pass, for example, the file size metadata along to my end users.) Additionally, the calendar start/end range would help in understanding, historically, what the original calendar range was when the source was added or updated.

While this PR only adds the 3 new properties, follow-up PR(s) could address updating existing sources, and all new sources would have this metadata by default once this is applied.

In addition to the updated tests, if you just want to quickly see what the new format will look like for a sample, you can use, for example:
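
As a rough illustration only (this is not the code added by this PR), the sketch below shows one way the three values could be derived from a local GTFS zip. The function name extract_gtfs_metadata, the ISO 8601 date output, and the None fallback when no dates are found are all assumptions made for the example; the actual implementation and output format may differ.

```python
import csv
import io
import os
import zipfile
from datetime import datetime


def extract_gtfs_metadata(path):
    """Hypothetical helper: file size and calendar range of a GTFS zip."""
    dates = []
    with zipfile.ZipFile(path) as archive:
        names = set(archive.namelist())
        # (file name, columns holding YYYYMMDD dates) as defined by GTFS Schedule
        for name, columns in (
            ("calendar.txt", ("start_date", "end_date")),
            ("calendar_dates.txt", ("date",)),
        ):
            if name not in names:
                continue  # a feed may ship only one of the two files
            with archive.open(name) as raw:
                # utf-8-sig tolerates the byte order mark some producers emit
                text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
                for row in csv.DictReader(text):
                    for column in columns:
                        value = (row.get(column) or "").strip()
                        if value:
                            dates.append(datetime.strptime(value, "%Y%m%d").date())
    return {
        # size of the zip archive itself, in bytes
        "extracted_filesize": os.path.getsize(path),
        # assumption: dates reported as ISO 8601 strings, None if no dates exist
        "extracted_calendar_start": min(dates).isoformat() if dates else None,
        "extracted_calendar_end": max(dates).isoformat() if dates else None,
    }
```

Calling extract_gtfs_metadata("gtfs.zip") on a downloaded archive returns a dict keyed by the three property names. The range is taken over every date in both files, matching the "min date" / "max date" wording above; it does not try to account for calendar_dates entries that remove service.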