29 April 2021

or, Publishing source and intermediate data files

Trevor wrote four long messages in Slack, each with an associated thread, while I was out on vacation:

https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052129000700
https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052136000900
https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052143001100
https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052149001300

I'm jotting down some notes here as I digest them.

+1 for using a prefix (/ncov/gisaid/… and /ncov/open/…) instead of suffix (/ncov/…/gisaid and /ncov/…/open) to indicate source of data (e.g. GISAID vs. open data).
+1 for making data file paths closely mirror the nextstrain.org/ncov/… paths.
+1 for private data filenames on s3://nextstrain-ncov-private/ to match those for the open data on data.nextstrain.org/…/ (s3://nextstrain-data/). Principle is that base path + filename are fixed, but the prefix changes depending on if we can publish them or not.
I know it was my suggestion (right?), but not a huge fan of "raw". Maybe "artifacts" or "builds" or "source-data"?
Are we publishing source data at the single dataset level (e.g. /ncov/open/north-america) or build level (e.g. all /ncov/open/… datasets)? (Or both?)

In the future (or now, if we have a decision for behaviour), we could make this mirroring even tighter, if we define a URL convention, e.g.

nextstrain.org/ncov/open/.raw/metadata.tsv
nextstrain.org/ncov/open/.artifacts/metadata.tsv
nextstrain.org/ncov/open/.source-data/metadata.tsv
nextstrain.org/ncov/open/-/metadata.tsv

nextstrain.org/ncov/open/global/.artifacts/
nextstrain.org/flu/seasonal/h3n2/ha/2y/.artifacts/
  → file listing, returned directly or via redirect to CDN'd listing

nextstrain.org/ncov/open/global/.artifacts/metadata.tsv
nextstrain.org/flu/seasonal/h3n2/ha/2y/.artifacts/metadata.tsv
  → redirect to CDN path

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

2021-04-29.md

2021-04-29.md

29 April 2021

or, Publishing source and intermediate data files

Files

2021-04-29.md

Latest commit

History

2021-04-29.md

File metadata and controls

29 April 2021

or, Publishing source and intermediate data files