Update introduction and references

ome · Nov 20, 2020 · 4cc2535
1 parent 7312c3c
commit 4cc2535
Showing 1 changed file with 79 additions and 43 deletions.
diff --git a/index.bs b/index.bs
@@ -10,7 +10,9 @@ Repository: https://github.com/joshmoore/ngff
 Issue Tracking: Forums https://forum.image.sc/tag/ome-ngff
 Logo: http://www.openmicroscopy.org/img/logos/ome-logomark.svg
 Local Boilerplate: header yes
+Local Boilerplate: copyright yes
 Boilerplate: style-darkmode off
+Markup Shorthands: markdown yes
 Editor: Josh Moore, Open Microscopy Environment (OME) https://www.openmicroscopy.org
 Abstract: This document contains next-generation file format (NGFF)
 Abstract: specifications for storing bioimaging data in the cloud.
@@ -28,11 +30,11 @@ larger, preciser spatial measurements is unfortunately at odds with our ability
 to structure and share those measurements with others. During a global pandemic
 more than ever, we believe fervently that global, collaborative discovery as
 opposed to the post-publication, "data-on-request" mode of operation is the
-path forward. Bioimages should be shareable via open and commercial cloud
+path forward. Bioimaging data should be shareable via open and commercial cloud
 resources without the need to download entire datasets.
 
 At the moment, that is not the norm. The plethora of data formats produced by
-imaging systems are ill-suited to the remote sharing. Individual scientists
+imaging systems are ill-suited to remote sharing. Individual scientists
 typically lack the infrastructure they need to host these data themselves. When
 they acquire images from elsewhere, time-consuming translations and data
 cleaning are needed to interpret findings. Those same costs are multiplied when
@@ -41,53 +43,63 @@ factor before publication is possible. Without a common effort, each lab or
 resource is left building the tools they need and maintaining that
 infrastructure often without dedicated funding.
 
-This document assumes that there are three keys to a workable solution:
+This document defines a specification for bioimaging data that enables the
+conversion of proprietary formats into a common, cloud-ready one.
+Such next-generation file formats lay out data so that individual portions, or
+"chunks", of large data are reference-able, eliminating the need to download
+entire datasets.
 
-1. Converting all data out of proprietary formats rather than trying to translate data on every access.
-2. Chunking the data so that manageable areas of large data are reference-able online rather than downloading them entirely.
-3. Collaborating on a small number of container formats and conventions for metadata rather than developing new versions to meet each individual requirement.
 
-This document specifies one layout for images within Zarr files. The APIs and
-scripts provided by this repository will support one or more versions of this
-file, but they should all be considered internal investigations, not intended
-for public re-use.
-
-Why "next generation"? {#ngff}
-------------------------------
+Why "<dfn export="true"><abbr title="Next-generation file-format">NGFF</abbr></dfn>"? {#why-ngff}
+-------------------------------------------------------------------------------------------------
 
 A short description of what is needed for an imaging format is "a hierarchy
 of n-dimensional (dense) arrays with metadata". This combination of features
-is certainly provided by <dfn><abbr title="Hierarchical Data Format 5">HDF5</abbr></dfn>
+is certainly provided by <dfn export="true"><abbr title="Hierarchical Data Format 5">HDF5</abbr></dfn>
 from the <a href="https://www.hdfgroup.org">HDF Group</a>, which a number of
 bioimaging formats do use. HDF5 and other larger binary structures, however,
-are ill-suited for storage in the cloud where accessing individual segments,
-or "chunks", of data by name rather than seeking through a large file is at
-the heart of parallelization.
+are ill-suited for storage in the cloud where accessing individual chunks
+of data by name rather than seeking through a large file is at the heart of
+parallelization.
 
 As a result, a number of formats have been developed more recently which provide
 the basic data structure of an HDF5 file, but do so in a more cloud-friendly way.
-
-<!--
-Zarr {#zarr} 
-
-N5 {#n5}
-
-Eventually, of course, these files will no longer be next-generation and we will
-need to change the name ...
-
-
-The specification 
-https://github.com/saalfeldlab/n5#file-system-specification
-
-https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html
-assumes a file format
--->
+In the [PyData](https://pydata.org/) community, the Zarr [[zarr]] format was developed
+for easily storing collections of [NumPy](https://numpy.org/) arrays. In the
+[ImageJ](https://imagej.net/) community, N5 [[n5]] was developed to work around
+the limitations of HDF5 ("N5" was originally short for "Not-HDF5").
+Both of these formats permit storing individual chunks of data either locally in
+separate files or in cloud-based object stores as separate keys.
+
+A [current effort](https://zarr-specs.readthedocs.io/en/core-protocol-v3.0-dev/protocol/core/v3.0.html)
+is underway to unify the two similar specifications into a single binary
+specification. The editor's draft will soon enter a [request for comments (RFC)](https://github.com/zarr-developers/zarr-specs/issues/101) phase, with the goal of releasing a first version early in 2021. As that
+process comes to an end, this document will be updated.
+
+OME-NGFF {#ome-ngff}
+--------------------
+
+The conventions and specifications defined in this document are designed to
+enable next-generation file formats to represent the same bioimaging data
+that can be represented in \[OME-TIFF](http://www.openmicroscopy.org/ome-files/)
+and beyond. However, the conventions will also be usable by HDF5 and other sufficiently advanced
+binary containers. Eventually, we hope, the moniker "next-generation" will no longer be
+applicable, and this will simply be the most efficient, common, and useful representation
+of bioimaging data, whether during acquisition or sharing in the cloud.
+
+Note: The following text makes use of OME-Zarr [[ome-zarr-py]], the current prototype implementation,
+for all examples.
 
 On-disk (or in-cloud) layout {#on-disk}
 =======================================
 
-```
+An overview of the layout of an OME-Zarr fileset should make
+understanding the following metadata sections easier. The hierarchy
+is represented here as it would appear locally but could equally
+be stored on a web server to be accessed via HTTP or in object storage
+like S3 or GCS.
 
+```
 .                             # Root folder, potentially in S3,
 │                             # with a flat list of images by image ID.
 │
@@ -130,8 +142,6 @@ On-disk (or in-cloud) layout {#on-disk}
                 ├── 0         # Each multiscale level is stored as a separate Zarr array, as above, but only integer values
                 │   ...       # are supported.
                 └── n
-
-
 ```
 
 Metadata {#metadata}
@@ -312,6 +322,17 @@ above).
 
 <pre class="biblio">
 {
+  "blogNov2020": {
+    "href": "https://blog.openmicroscopy.org/file-formats/community/2020/11/04/zarr-data/",
+    "title": "Public OME-Zarr data (Nov. 2020)",
+    "authors": [
+      "OME Team"
+    ],
+    "status": "Informational",
+    "publisher": "OME",
+    "id": "blogNov2020",
+    "date": "04 November 2020"
+  },
   "imagesc26952": {
     "href": "https://forum.image.sc/t/ome-s-position-regarding-file-formats/26952",
     "title": "OME’s position regarding file formats",
@@ -323,16 +344,31 @@ above).
     "id": "imagesc26952",
     "date": "19 June 2020"
   },
-  "blogNov2020": {
-    "href": "https://blog.openmicroscopy.org/file-formats/community/2020/11/04/zarr-data/",
-    "title": "Public OME-Zarr data (Nov. 2020)",
+  "n5": {
+    "id": "n5",
+    "href": "https://github.com/saalfeldlab/n5/issues/62",
+    "title": "N5---a scalable Java API for hierarchies of chunked n-dimensional tensors and structured meta-data",
+    "status": "Informational",
     "authors": [
-      "OME Team"
+      "John A. Bogovic",
+      "Igor Pisarev",
+      "Philipp Hanslovsky",
+      "Neil Thistlethwaite",
+      "Stephan Saalfeld"
     ],
+    "date": "2020"
+  },
+  "ome-zarr-py": {
+    "id": "ome-zarr-py",
+    "href": "https://doi.org/10.5281/zenodo.4113931",
+    "title": "ome-zarr-py: Experimental implementation of next-generation file format (NGFF) specifications for storing bioimaging data in the cloud.",
     "status": "Informational",
-    "publisher": "OME",
-    "id": "blogNov2020",
-    "date": "04 November 2020"
+    "publisher": "Zenodo",
+    "authors": [
+      "OME",
+      "et al"
+    ],
+    "date": "06 October 2020"
   },
   "zarr": {
     "id": "zarr",