Trevor wrote four long messages in Slack, each with an associated thread, while I was out on vacation:
- https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052129000700
- https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052136000900
- https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052143001100
- https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052149001300
I'm jotting down some notes here as I digest them.
-
+1 for using a prefix (/ncov/gisaid/… and /ncov/open/…) instead of suffix (/ncov/…/gisaid and /ncov/…/open) to indicate source of data (e.g. GISAID vs. open data).
-
+1 for making data file paths closely mirror the nextstrain.org/ncov/… paths.
-
+1 for private data filenames on s3://nextstrain-ncov-private/ to match those for the open data on data.nextstrain.org/…/ (s3://nextstrain-data/). Principle is that base path + filename are fixed, but the prefix changes depending on if we can publish them or not.
-
I know it was my suggestion (right?), but not a huge fan of "raw". Maybe "artifacts" or "builds" or "source-data"?
-
Are we publishing source data at the single dataset level (e.g. /ncov/open/north-america) or build level (e.g. all /ncov/open/… datasets)? (Or both?)
In the future (or now, if we have a decision for behaviour), we could make this mirroring even tighter, if we define a URL convention, e.g.
nextstrain.org/ncov/open/.raw/metadata.tsv
nextstrain.org/ncov/open/.artifacts/metadata.tsv
nextstrain.org/ncov/open/.source-data/metadata.tsv
nextstrain.org/ncov/open/-/metadata.tsv
nextstrain.org/ncov/open/global/.artifacts/
nextstrain.org/flu/seasonal/h3n2/ha/2y/.artifacts/
→ file listing, returned directly or via redirect to CDN'd listing
nextstrain.org/ncov/open/global/.artifacts/metadata.tsv
nextstrain.org/flu/seasonal/h3n2/ha/2y/.artifacts/metadata.tsv
→ redirect to CDN path