Allow access to S3 versioned datasets #196
Comments
Glad to see this getting discussed again! I do think it would be useful for certain kinds of builds like Flu and SARS-CoV-2. When I was implementing the |
@trs would you be comfortable with us implementing e.g. |
Off the cuff, I don't think ideally we'd expose S3 version ids at all. It'd be a large encapsulation leak and make it harder to implement something better in the future. If we really need to expose them for expediency (and I don't think we do…?), I'd want to prefix them with a namespace to avoid issues with other forms in the future, e.g. I don't see us leaving S3 here, so am not worried about consequences of that re: exposure of version ids, but I do think the encapsulation break is likely to come back to bite us in other aspects of nextstrain.org. |
Closed by #719 (improvements, such as |
This issue is the expansion on a comment by @joverlee521 in #192 (comment)
This would be interesting for nextstrain.org to explore as an alternative way to access "old" datasets instead of making separate datestamped files. Note that the nextstrain-data bucket is already using versioning. The best / first use case for this would be the seasonal flu builds, especially when combined with the search functionality of #192.

It's unclear whether this approach would be better than simply keeping datestamped files on the bucket, but if we want to pursue this we could implement a complete solution within the nextstrain.org server with no auspice modifications necessary.
Example
The h1n1-ha-3y dataset from 2020-01-01 could be accessed via https://nextstrain-data.s3.amazonaws.com/flu_seasonal_h1n1pdm_ha_3y.json?versionId=1HEJjqgcCardojwksJF1wIyDyGD3HE_s were the object to be public, or we can do it server side (with credentials) via
Leveraging similar syntax which we already have for accessing datasets on a particular GitHub branch (e.g. /community/jameshadfield/scratch@test-branch/placentalia), we could process URLs such as /flu/seasonal/h1n1pdm/ha/3y@1HEJjqgcCardojwksJF1wIyDyGD3HE_s, or go one step further and have the server keep track of the upload date of each version to allow URLs such as /flu/seasonal/h1n1pdm/ha/3y@2020-01-01. Since the nextstrain.org server essentially acts as a middleman between S3 and the client, it is possible to interpret such a getDataset API call and return the correct version of the object (file). There may be some CloudFront modifications needed for this to work.

In conjunction with this would be a page listing previous versions similar to the one we currently have for SARS-CoV-2 situation reports, or alternatively we could dynamically modify the getAvailable API response such that previous versions appear in auspice's dataset dropdown menu.
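
To make the URL idea concrete, here is a rough sketch in Python (the nextstrain.org server itself is not Python, so treat this as pseudocode that happens to run). It assumes the dataset path maps to an S3 key by joining the parts with underscores and appending .json, which is inferred only from the h1n1pdm example above. The names parse_dataset_path, resolve_version, VersionedRequest and ObjectVersion are hypothetical and exist only for illustration. The list of versions is passed in as plain data; in practice it could perhaps come from S3's ListObjectVersions call, but that part is not shown.

```python
import re
from dataclasses import dataclass
from datetime import date, datetime, timezone
from typing import Optional, Sequence

# Matches a YYYY-MM-DD suffix; anything else after "@" is treated as a version ID.
_DATE_SUFFIX = re.compile(r"^\d{4}-\d{2}-\d{2}$")


@dataclass(frozen=True)
class VersionedRequest:
    """Hypothetical result of parsing a dataset URL path."""
    key: str                          # S3 object key (assumed naming scheme)
    version_id: Optional[str] = None  # explicit S3 version ID, if given
    as_of: Optional[date] = None      # requested date, if given


@dataclass(frozen=True)
class ObjectVersion:
    """Hypothetical record of one stored version of an object."""
    version_id: str
    last_modified: datetime  # assumed to be timezone-aware


def parse_dataset_path(path: str) -> VersionedRequest:
    """Split '/a/b/c@suffix' into an S3 key plus a version ID or a date."""
    dataset, _, suffix = path.strip("/").partition("@")
    # Assumption: '/flu/seasonal/...' maps to 'flu_seasonal_....json'.
    key = dataset.replace("/", "_") + ".json"
    if not suffix:
        return VersionedRequest(key)
    if _DATE_SUFFIX.match(suffix):
        return VersionedRequest(key, as_of=date.fromisoformat(suffix))
    return VersionedRequest(key, version_id=suffix)


def resolve_version(versions: Sequence[ObjectVersion], as_of: date) -> Optional[str]:
    """Return the ID of the newest version uploaded on or before as_of (UTC)."""
    eligible = [
        v for v in versions
        if v.last_modified.astimezone(timezone.utc).date() <= as_of
    ]
    if not eligible:
        return None
    return max(eligible, key=lambda v: v.last_modified).version_id
```

With something like this, a request for /flu/seasonal/h1n1pdm/ha/3y@2020-01-01 would first be parsed into a key and a date, then resolved to a concrete version ID before the server fetches the object. Whether "on or before the date" is the right rule, and whether dates should be interpreted in UTC, are open design choices rather than anything decided in this issue.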