Skip to content

Latest commit

 

History

History
46 lines (34 loc) · 1.99 KB

2021-04-29.md

File metadata and controls

46 lines (34 loc) · 1.99 KB

29 April 2021

or, Publishing source and intermediate data files

Trevor wrote four long messages in Slack, each with an associated thread, while I was out on vacation:

  1. https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052129000700
  2. https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052136000900
  3. https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052143001100
  4. https://bedfordlab.slack.com/archives/C01LCTT7JNN/p1619052149001300

I'm jotting down some notes here as I digest them.

  • +1 for using a prefix (/ncov/gisaid/… and /ncov/open/…) instead of suffix (/ncov/…/gisaid and /ncov/…/open) to indicate source of data (e.g. GISAID vs. open data).

  • +1 for making data file paths closely mirror the nextstrain.org/ncov/… paths.

  • +1 for private data filenames on s3://nextstrain-ncov-private/ to match those for the open data on data.nextstrain.org/…/ (s3://nextstrain-data/). Principle is that base path + filename are fixed, but the prefix changes depending on if we can publish them or not.

  • I know it was my suggestion (right?), but not a huge fan of "raw". Maybe "artifacts" or "builds" or "source-data"?

  • Are we publishing source data at the single dataset level (e.g. /ncov/open/north-america) or build level (e.g. all /ncov/open/… datasets)? (Or both?)

In the future (or now, if we have a decision for behaviour), we could make this mirroring even tighter, if we define a URL convention, e.g.

nextstrain.org/ncov/open/.raw/metadata.tsv
nextstrain.org/ncov/open/.artifacts/metadata.tsv
nextstrain.org/ncov/open/.source-data/metadata.tsv
nextstrain.org/ncov/open/-/metadata.tsv

nextstrain.org/ncov/open/global/.artifacts/
nextstrain.org/flu/seasonal/h3n2/ha/2y/.artifacts/
  → file listing, returned directly or via redirect to CDN'd listing

nextstrain.org/ncov/open/global/.artifacts/metadata.tsv
nextstrain.org/flu/seasonal/h3n2/ha/2y/.artifacts/metadata.tsv
  → redirect to CDN path