From ccd7c74b479175c186addd8a346d3d70f67049fd Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Fri, 10 Aug 2018 16:35:43 -0700
Subject: [PATCH 01/16] Charter for the Data Store group

---
 charters/DataStore/charter.md | 72 +++++++++++++++++++++++++++++++++++
 1 file changed, 72 insertions(+)
 create mode 100644 charters/DataStore/charter.md

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
new file mode 100644
index 00000000..ac2e563f
--- /dev/null
+++ b/charters/DataStore/charter.md
@@ -0,0 +1,72 @@
+
+# Data Store
+
+
+## Description
+The Data Store is a scientific data sharing/publishing/distribution framework, providing file/bundle management on multiple clouds at PB scale, with strong public APIs. It provides a simple API for storage, retrieval, and subscription to events that functions transparently across multiple cloud systems such as AWS and GCP.
+
+## Objective
+The objective of the Data Store group is to deliver substantively complete functionality on all of the in-scope items listed in this charter.
+
+## In-scope
+
+### Interfaces
+* DSS data read and write API (PUT bundle, PUT file, GET bundle, GET file) - maintenance and extension of the implementation of the basic data access APIs.
+* Checkout service APIs - Provide continuing support for the ability to checkout the data to a local filesystem, or a personal cloud environment
+* Collections service APIs - Maintenance and extension of the ability to do basic operations on arbitrary collections of objects in the Data Store.
+* API Documentation - Programmatic APIs available for the Data Store include the REST interface and the Python bindings. Documentation and examples will be created for both of these APIs.
+* HCA DCP CLI tool - The HCA DCP CLI is a foundational tool for the DCP and its users. All subcomponents in the DCP use the same CLI system. The Data Store team will maintain the infrastructure to support the general CLI architecture as well as the CLI commands relating to the Data Store itself. Other modules such as Upload and Ingest will be responsible for implementing their respective functional components of the CLI
+
+### Core capabilities
+* DSS data model and lifecycle (versioned bundles, etc) - Ongoing support and maintenance of the implementation of the data model.
+* Subscriptions/Eventing - Implementation of Data lifecycle web-hooks (new bundle, new file, delete bundle, delete file). The Data Store implementation will move away from the current dependance on Elastic Search Percolate for our event subsystem. Eventing will depend instead on the AWS and GCP cloud infrastructure directly.
+* Multi-cloud replication of objects - There are three parts to this:
+ 1. Maintenance and improvements to the synchronization implementation between AWS and GCP
+ 2. Extending the cloud support to more vendors (such as Microsoft Azure)
+ 3. Supporting multiple replicas within a single cloud.
+* Support for plug-able indexes - Provide a standard interface for connecting indexing modules to the Data Store. This interface will provide a mechanism to connect indexing subsystems to receive events about the data.
+
+### Security
+* User authentication system implementation
+* Data access authorization system implementation
+* DevSecOps - implementation of features required for eventual FISMA moderate deployments (authentication, authorization, logging, auditing, etc).
+* Operations for DSS - Implement and configure tools to facilitate the operation of the Data Store service in a production environment
+
+### Community engagement
+* Triage and integration of feature requests from the community into the Data Store roadmap.
+* Outreach and engagement of the community
+* Training
+* Hackathons
+
+
+## Out-of-scope
+* Other index/query methods/engines - we should implement these as stand-alone projects against modular index/query API.
+* Matrix service API
+
+## Milestones
+* Mid-2018: 1000 bundle test scale, deploy as part of HCA DCP Pilot
+* EOY 2018: add checkout, collections, improved scaling/hardening, generic events to support stand-alone indexers, additional gaps identified in HCA DCP Pilot.
+* Future: native GCP support, AuthZ support for controlled-access data, additional scale/hardening, Biosphere requirements, tiered storage, content zones, FISMA moderate capabilities, single-replica deployments, et
+
+## Roles
+
+### Project Lead
+[Brian O’Connor](mailto:brocono@ucsc.edu)
+
+### Product Owner
+[Kevin Osborn](mailto:kosborn2@ucsc.edu)
+
+### Technical Lead
+[Hannes Schmidt](mailto:hannes@ucsc.edu)
+
+## Communication
+* HumanCellAtlas/data-store : general data store discussions
+* HumanCellAtlas/data-store-eng : development discussions
+
+## Github repositories
+* https://github.com/HumanCellAtlas/data-store
+* https://github.com/HumanCellAtlas/dcp-cli
+* https://github.com/HumanCellAtlas/metadata-api
+* https://github.com/chanzuckerberg/cloud-blobstore
+* https://github.com/HumanCellAtlas/checksumming_io
+

From 5270f90e12b792aba4fc4da11248f8c3f8a1956a Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Fri, 10 Aug 2018 17:33:46 -0700
Subject: [PATCH 02/16] Update charter.md

---
 charters/DataStore/charter.md | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index ac2e563f..dc2fbe45 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -60,8 +60,11 @@ The objective of the Data Store group is to deliver substantively complete funct
 [Hannes Schmidt](mailto:hannes@ucsc.edu)

 ## Communication
+### Slack Channels
 * HumanCellAtlas/data-store : general data store discussions
 * HumanCellAtlas/data-store-eng : development discussions
+### Mailing list(s)
+### Discussion Forum(s)

 ## Github repositories
 * https://github.com/HumanCellAtlas/data-store

From 3cac5873ce527039427d96c0793233b4e8ee1ef8 Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Fri, 10 Aug 2018 17:39:39 -0700
Subject: [PATCH 03/16] Update charter.md

---
 charters/DataStore/charter.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index dc2fbe45..7e881430 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -46,7 +46,7 @@ The objective of the Data Store group is to deliver substantively complete funct
 ## Milestones
 * Mid-2018: 1000 bundle test scale, deploy as part of HCA DCP Pilot
 * EOY 2018: add checkout, collections, improved scaling/hardening, generic events to support stand-alone indexers, additional gaps identified in HCA DCP Pilot.
-* Future: native GCP support, AuthZ support for controlled-access data, additional scale/hardening, Biosphere requirements, tiered storage, content zones, FISMA moderate capabilities, single-replica deployments, et
+* Future: (not in order of precedence) native GCP support, Authorization support for controlled-access data, additional scale/hardening, Biosphere requirements, tiered storage, content zones, FISMA moderate capabilities, single-replica deployments

 ## Roles


From da7356f30aa76f8994912b4ec768c22b5fd488c4 Mon Sep 17 00:00:00 2001
From: Benedict Paten
Date: Wed, 15 Aug 2018 09:24:21 -0700
Subject: [PATCH 04/16] Update charter.md

Added mention of software contributions from community.
---
 charters/DataStore/charter.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index 7e881430..816946c0 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -34,11 +34,11 @@ The objective of the Data Store group is to deliver substantively complete funct

 ### Community engagement
 * Triage and integration of feature requests from the community into the Data Store roadmap.
+* Review and acceptance process for third party software contributions through pull requests
 * Outreach and engagement of the community
 * Training
 * Hackathons

-
 ## Out-of-scope
 * Other index/query methods/engines - we should implement these as stand-alone projects against modular index/query API.
 * Matrix service API

From 7d72684a24771c08fae231eba14b127d01fe2585 Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Thu, 16 Aug 2018 17:01:27 -0700
Subject: [PATCH 05/16] Incorporate first round of review

---
 charters/DataStore/charter.md | 35 ++++++++++++++++-------------------
 1 file changed, 16 insertions(+), 19 deletions(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index 816946c0..5e219c9c 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -3,7 +3,7 @@


 ## Description
-The Data Store is a scientific data sharing/publishing/distribution framework, providing file/bundle management on multiple clouds at PB scale, with strong public APIs. It provides a simple API for storage, retrieval, and subscription to events that functions transparently across multiple cloud systems such as AWS and GCP.
+The Data Store is a scientific data sharing/publishing/distribution framework, providing file/bundle management on multiple clouds at petabyte-scale. It defines public APIs for storage, retrieval, and subscription to events that functions transparently across multiple cloud systems such as AWS and GCP.

 ## Objective
 The objective of the Data Store group is to deliver substantively complete functionality on all of the in-scope items listed in this charter.
@@ -11,26 +11,26 @@ The objective of the Data Store group is to deliver substantively complete funct
 ## In-scope

 ### Interfaces
-* DSS data read and write API (PUT bundle, PUT file, GET bundle, GET file) - maintenance and extension of the implementation of the basic data access APIs.
-* Checkout service APIs - Provide continuing support for the ability to checkout the data to a local filesystem, or a personal cloud environment
-* Collections service APIs - Maintenance and extension of the ability to do basic operations on arbitrary collections of objects in the Data Store.
-* API Documentation - Programmatic APIs available for the Data Store include the REST interface and the Python bindings. Documentation and examples will be created for both of these APIs.
-* HCA DCP CLI tool - The HCA DCP CLI is a foundational tool for the DCP and its users. All subcomponents in the DCP use the same CLI system. The Data Store team will maintain the infrastructure to support the general CLI architecture as well as the CLI commands relating to the Data Store itself. Other modules such as Upload and Ingest will be responsible for implementing their respective functional components of the CLI
+* Data Store read and write APIs for data (PUT bundle, PUT file, GET bundle, GET file) - maintenance and extension of the implementation of the basic data access APIs.
+* Maintain and extend the **Checkout service API** which enables data checkout to a local filesystem or a personal cloud environment
+* Maintain and extend the **Collections service API** to do basic operations on arbitrary collections of objects in the Data Store.
+* Publish API documentation and examples for both the Data Store REST interface and Python bindings.
+* The **Command Line Interface** (CLI) is a foundational tool for interacting with the DCP. The Data Store team is responsible for the specific Data Store commands and the maintenance of the infrastructure that allows other services such as Upload and Ingest to integrate their commands into the CLI.

 ### Core capabilities
-* DSS data model and lifecycle (versioned bundles, etc) - Ongoing support and maintenance of the implementation of the data model.
-* Subscriptions/Eventing - Implementation of Data lifecycle web-hooks (new bundle, new file, delete bundle, delete file). The Data Store implementation will move away from the current dependance on Elastic Search Percolate for our event subsystem. Eventing will depend instead on the AWS and GCP cloud infrastructure directly.
-* Multi-cloud replication of objects - There are three parts to this:
+* Maintain and extend the DSS data model and lifecycle (such as versioned bundles)
+* Transition Data Store Subscriptions/Eventing services from the current dependence on Elastic Search Percolate to the AWS and GCP cloud infrastructure.
+* Multi-cloud replication of objects
  1. Maintenance and improvements to the synchronization implementation between AWS and GCP
- 2. Extending the cloud support to more vendors (such as Microsoft Azure)
- 3. Supporting multiple replicas within a single cloud.
-* Support for plug-able indexes - Provide a standard interface for connecting indexing modules to the Data Store. This interface will provide a mechanism to connect indexing subsystems to receive events about the data.
+ 2. Document interfaces to enable new cloud implementations by 3rd parties
+ 3. Supporting multiple replicas within a single cloud
+* Support for pluggable indexes - Define a standard interface to enable pluggable indexing modules to receive Data Store events

 ### Security
 * User authentication system implementation
 * Data access authorization system implementation
 * DevSecOps - implementation of features required for eventual FISMA moderate deployments (authentication, authorization, logging, auditing, etc).
-* Operations for DSS - Implement and configure tools to facilitate the operation of the Data Store service in a production environment
+* Operations for Data Store - Implement and configure tools to facilitate the operation of the Data Store service in a production environment

 ### Community engagement
 * Triage and integration of feature requests from the community into the Data Store roadmap.
@@ -44,9 +44,8 @@ The objective of the Data Store group is to deliver substantively complete funct
 * Matrix service API

 ## Milestones
-* Mid-2018: 1000 bundle test scale, deploy as part of HCA DCP Pilot
+* Mid-2018: 1000 bundle test scale, deploy as part of HCA DCP Pilot
 * EOY 2018: add checkout, collections, improved scaling/hardening, generic events to support stand-alone indexers, additional gaps identified in HCA DCP Pilot.
-* Future: (not in order of precedence) native GCP support, Authorization support for controlled-access data, additional scale/hardening, Biosphere requirements, tiered storage, content zones, FISMA moderate capabilities, single-replica deployments

 ## Roles

@@ -61,10 +60,8 @@ The objective of the Data Store group is to deliver substantively complete funct

 ## Communication
 ### Slack Channels
-* HumanCellAtlas/data-store : general data store discussions
-* HumanCellAtlas/data-store-eng : development discussions
-### Mailing list(s)
-### Discussion Forum(s)
+* HumanCellAtlas/data-store: general data store discussions
+* HumanCellAtlas/data-store-eng: development discussions

 ## Github repositories
 * https://github.com/HumanCellAtlas/data-store

From 3006bddc7740f2b663ab19631e13d319b59ffe8b Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Mon, 20 Aug 2018 14:42:52 -0700
Subject: [PATCH 06/16] Updates for more review comments

---
 charters/DataStore/charter.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index 5e219c9c..6d161a50 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -6,13 +6,13 @@
 The Data Store is a scientific data sharing/publishing/distribution framework, providing file/bundle management on multiple clouds at petabyte-scale. It defines public APIs for storage, retrieval, and subscription to events that functions transparently across multiple cloud systems such as AWS and GCP.

 ## Objective
-The objective of the Data Store group is to deliver substantively complete functionality on all of the in-scope items listed in this charter.
+The objective of the Data Store group is to deliver a versioned immutable object based data repository that is highly available and scalable. Data will be replicated to at least two commercial clouds (Amazon and Google). Data will be accessible through a variety of programatic interfaces as well as a command line interface.

 ## In-scope

 ### Interfaces
 * Data Store read and write APIs for data (PUT bundle, PUT file, GET bundle, GET file) - maintenance and extension of the implementation of the basic data access APIs.
-* Maintain and extend the **Checkout service API** which enables data checkout to a local filesystem or a personal cloud environment
+* Maintain and extend the **Checkout service API** which enables data copy to a local filesystem or a personal cloud environment
 * Maintain and extend the **Collections service API** to do basic operations on arbitrary collections of objects in the Data Store.
 * Publish API documentation and examples for both the Data Store REST interface and Python bindings.
 * The **Command Line Interface** (CLI) is a foundational tool for interacting with the DCP. The Data Store team is responsible for the specific Data Store commands and the maintenance of the infrastructure that allows other services such as Upload and Ingest to integrate their commands into the CLI.
@@ -36,8 +36,8 @@ The objective of the Data Store group is to deliver substantively complete funct
 * Triage and integration of feature requests from the community into the Data Store roadmap.
 * Review and acceptance process for third party software contributions through pull requests
 * Outreach and engagement of the community
-* Training
-* Hackathons
+* Offer trainings on how to contribute to the Data Store project, or reuse it for a different project.
+* Host hackathons for extending the Data Store feature set.

 ## Out-of-scope
 * Other index/query methods/engines - we should implement these as stand-alone projects against modular index/query API.

From 89c42fecdf77fc4b63045aa234961288aefb7527 Mon Sep 17 00:00:00 2001
From: Brian Raymor
Date: Wed, 22 Aug 2018 20:29:30 -0700
Subject: [PATCH 07/16] Updates to section headings

The charter template was updated - #18
---
 charters/DataStore/charter.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index 6d161a50..bb29a26e 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -5,7 +5,7 @@
 ## Description
 The Data Store is a scientific data sharing/publishing/distribution framework, providing file/bundle management on multiple clouds at petabyte-scale. It defines public APIs for storage, retrieval, and subscription to events that functions transparently across multiple cloud systems such as AWS and GCP.

-## Objective
+## Objectives
 The objective of the Data Store group is to deliver a versioned immutable object based data repository that is highly available and scalable. Data will be replicated to at least two commercial clouds (Amazon and Google). Data will be accessible through a variety of programatic interfaces as well as a command line interface.

 ## In-scope
@@ -43,7 +43,7 @@ The objective of the Data Store group is to deliver a versioned immutable object
 * Other index/query methods/engines - we should implement these as stand-alone projects against modular index/query API.
 * Matrix service API

-## Milestones
+## Milestones and Deliverables
 * Mid-2018: 1000 bundle test scale, deploy as part of HCA DCP Pilot
 * EOY 2018: add checkout, collections, improved scaling/hardening, generic events to support stand-alone indexers, additional gaps identified in HCA DCP Pilot.


From 876edfdde59d2114fef3291282c9d4f4e2e69e57 Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Fri, 31 Aug 2018 16:55:28 -0700
Subject: [PATCH 08/16] incorporate comments from Bruce and Brian

---
 charters/DataStore/charter.md | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index bb29a26e..ccb65e32 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -27,25 +27,26 @@ The objective of the Data Store group is to deliver a versioned immutable object
 * Support for pluggable indexes - Define a standard interface to enable pluggable indexing modules to receive Data Store events

 ### Security
-* User authentication system implementation
-* Data access authorization system implementation
-* DevSecOps - implementation of features required for eventual FISMA moderate deployments (authentication, authorization, logging, auditing, etc).
+* User authentication system implementation for the Data Store
+* Data access authorization system implementation for the Data Store
+* DevSecOps - implementation of features required to the core Data Store code to support FISMA moderate capabilities in forked code bases (authentication, authorization, logging, auditing, etc).
 * Operations for Data Store - Implement and configure tools to facilitate the operation of the Data Store service in a production environment

 ### Community engagement
 * Triage and integration of feature requests from the community into the Data Store roadmap.
 * Review and acceptance process for third party software contributions through pull requests
-* Outreach and engagement of the community
-* Offer trainings on how to contribute to the Data Store project, or reuse it for a different project.
+* Outreach and engagement of the community on use/usability of the APIs
+* Provide collaboration with groups to explore what it would take to implement reuse of the Data Store.
 * Host hackathons for extending the Data Store feature set.

 ## Out-of-scope
 * Other index/query methods/engines - we should implement these as stand-alone projects against modular index/query API.
-* Matrix service API
+* FISMA moderate certification for the core Data Store code base

 ## Milestones and Deliverables
 * Mid-2018: 1000 bundle test scale, deploy as part of HCA DCP Pilot
 * EOY 2018: add checkout, collections, improved scaling/hardening, generic events to support stand-alone indexers, additional gaps identified in HCA DCP Pilot.
+* First half of 2019: Document Data Store interfaces so that the community is enabled to deploy storage on a configurable cloud (AWS or GCP) with the system logic still running in AWS. Also document replication APIs to enable the community to implement new cloud support.

 ## Roles


From ab5833f8cc44cdba4484f4167aa09a5afffb7f10 Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Tue, 18 Sep 2018 18:03:06 -0700
Subject: [PATCH 09/16] Incorporate more comments

Most of comments from Tony B.
---
 charters/DataStore/charter.md | 17 ++++++++++++-----
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index ccb65e32..fe7153cb 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -5,21 +5,25 @@
 ## Description
 The Data Store is a scientific data sharing/publishing/distribution framework, providing file/bundle management on multiple clouds at petabyte-scale. It defines public APIs for storage, retrieval, and subscription to events that functions transparently across multiple cloud systems such as AWS and GCP.

+## Definitions
+**Bundle** A bundle is a list of related files along with some very basic metadata such as filenames.
+**DCP** The Data Coordination Platform is the name given to the entire system used to ingest, validate, store, analyzes, and make available the datga in the Human Cell Atlas project.
+
 ## Objectives
-The objective of the Data Store group is to deliver a versioned immutable object based data repository that is highly available and scalable. Data will be replicated to at least two commercial clouds (Amazon and Google). Data will be accessible through a variety of programatic interfaces as well as a command line interface.
+The objective of the Data Store group is to deliver a versioned immutable object based data repository that is highly available and scalable. Data will be replicated to at least two commercial clouds (Amazon and Google). Data will be accessible through a variety of programatic interfaces as well as a command line interface.

 ## In-scope

 ### Interfaces
-* Data Store read and write APIs for data (PUT bundle, PUT file, GET bundle, GET file) - maintenance and extension of the implementation of the basic data access APIs.
-* Maintain and extend the **Checkout service API** which enables data copy to a local filesystem or a personal cloud environment
+* Data Store read and write APIs for data and metadata - maintenance and extension of the implementation of the basic data access APIs. There are two public APIs available, the **REST API** and the **Python bindings**.
+* Maintain and extend the **Checkout service API** which enables data copy to a local filesystem or a personal cloud environment.
 * Maintain and extend the **Collections service API** to do basic operations on arbitrary collections of objects in the Data Store.
 * Publish API documentation and examples for both the Data Store REST interface and Python bindings.
 * The **Command Line Interface** (CLI) is a foundational tool for interacting with the DCP. The Data Store team is responsible for the specific Data Store commands and the maintenance of the infrastructure that allows other services such as Upload and Ingest to integrate their commands into the CLI.

 ### Core capabilities
-* Maintain and extend the DSS data model and lifecycle (such as versioned bundles)
-* Transition Data Store Subscriptions/Eventing services from the current dependence on Elastic Search Percolate to the AWS and GCP cloud infrastructure.
+* Maintain and extend the Data Store data model and data lifecycle. The data model is represented by bundles and files of arbitrary information. The specification for the format, naming, and content of these bundles and files is out of scope for this charter.
+* Support for reliable Subscriptions/Eventing services
 * Multi-cloud replication of objects
  1. Maintenance and improvements to the synchronization implementation between AWS and GCP
  2. Document interfaces to enable new cloud implementations by 3rd parties
@@ -42,11 +46,14 @@ The objective of the Data Store group is to deliver a versioned immutable object
 ## Out-of-scope
 * Other index/query methods/engines - we should implement these as stand-alone projects against modular index/query API.
 * FISMA moderate certification for the core Data Store code base
+* Implementation of other language bindings for the APIs other than Python
+* The specification for the format, naming, and content of bundles and files stored in the Data Store.

 ## Milestones and Deliverables
 * Mid-2018: 1000 bundle test scale, deploy as part of HCA DCP Pilot
 * EOY 2018: add checkout, collections, improved scaling/hardening, generic events to support stand-alone indexers, additional gaps identified in HCA DCP Pilot.
 * First half of 2019: Document Data Store interfaces so that the community is enabled to deploy storage on a configurable cloud (AWS or GCP) with the system logic still running in AWS. Also document replication APIs to enable the community to implement new cloud support.
+* First half of 2019: Transition Data Store Subscriptions/Eventing services from the current dependence on Elastic Search Percolate to the AWS and GCP cloud infrastructure.

 ## Roles


From e53870a7e6c6c6d8364323116d401d294a4fd733 Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Sun, 23 Sep 2018 16:50:27 -0700
Subject: [PATCH 10/16] Added GDPR statement and language about data model

---
 charters/DataStore/charter.md | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index fe7153cb..ef602b9a 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -7,6 +7,7 @@ The Data Store is a scientific data sharing/publishing/distribution framework, p

 ## Definitions
 **Bundle** A bundle is a list of related files along with some very basic metadata such as filenames.
+
 **DCP** The Data Coordination Platform is the name given to the entire system used to ingest, validate, store, analyzes, and make available the datga in the Human Cell Atlas project.

 ## Objectives
@@ -22,7 +23,7 @@ The objective of the Data Store group is to deliver a versioned immutable object
 * The **Command Line Interface** (CLI) is a foundational tool for interacting with the DCP. The Data Store team is responsible for the specific Data Store commands and the maintenance of the infrastructure that allows other services such as Upload and Ingest to integrate their commands into the CLI.

 ### Core capabilities
-* Maintain and extend the Data Store data model and data lifecycle. The data model is represented by bundles and files of arbitrary information. The specification for the format, naming, and content of these bundles and files is out of scope for this charter.
+* Maintain and extend the Data Store data model and data lifecycle. The data model is represented by bundles and files of arbitrary information. The Data Store team will own the process of collecting new bundle requirements from users and assessing them against the datastore design before sending them to the metadata team for the definition of a new spec. The Data Store owns the bundle use cases and bundle type definitions, while the precise specifications will be negotiated between the Metadata and other teams.
 * Support for reliable Subscriptions/Eventing services
 * Multi-cloud replication of objects
  1. Maintenance and improvements to the synchronization implementation between AWS and GCP
@@ -33,7 +34,7 @@ The objective of the Data Store group is to deliver a versioned immutable object
 ### Security
 * User authentication system implementation for the Data Store
 * Data access authorization system implementation for the Data Store
-* DevSecOps - implementation of features required to the core Data Store code to support FISMA moderate capabilities in forked code bases (authentication, authorization, logging, auditing, etc).
+* DevSecOps - implementation of features required to the core Data Store code to support FISMA moderate capabilities in forked code bases (authentication, authorization, logging, auditing, etc). The authentication and authorization system will support all rights of data subjects as defined in [GDPR](https://gdpr-info.eu/)
 * Operations for Data Store - Implement and configure tools to facilitate the operation of the Data Store service in a production environment

 ### Community engagement

From b7bf7997826290e66e2c6df59275fc17b3db5565 Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Mon, 24 Sep 2018 21:24:36 -0400
Subject: [PATCH 11/16] change of technical lead

---
 charters/DataStore/charter.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index ef602b9a..096e9441 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -65,7 +65,7 @@ The objective of the Data Store group is to deliver a versioned immutable object
 [Kevin Osborn](mailto:kosborn2@ucsc.edu)

 ### Technical Lead
-[Hannes Schmidt](mailto:hannes@ucsc.edu)
+[Brian Hannafious](mailto:bhannafi@ucsc.edu)

 ## Communication
 ### Slack Channels

From 0da569ca50cbec7992dd54ce9b63d11260a3b37f Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Tue, 25 Sep 2018 10:33:13 -0400
Subject: [PATCH 12/16] group review adjustments

comments from group review
---
 charters/DataStore/charter.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index 096e9441..228e4a41 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -3,12 +3,12 @@


 ## Description
-The Data Store is a scientific data sharing/publishing/distribution framework, providing file/bundle management on multiple clouds at petabyte-scale. It defines public APIs for storage, retrieval, and subscription to events that functions transparently across multiple cloud systems such as AWS and GCP.
+The Data Store is a scientific data sharing/publishing/distribution framework, providing file/bundle management on multiple clouds at petabyte-scale. It defines public APIs for storage, retrieval, and subscription to events that functions transparently across multiple cloud systems such as Amazon Web Service and Google Cloud Platform. The Data Store is designed to be reused in multiple projects, with HCA being the first user.

 ## Definitions
 **Bundle** A bundle is a list of related files along with some very basic metadata such as filenames.

-**DCP** The Data Coordination Platform is the name given to the entire system used to ingest, validate, store, analyzes, and make available the datga in the Human Cell Atlas project.
+**DCP** The Data Coordination Platform is the name given to the entire system used to ingest, validate, store, analyzes, and make available the data in the Human Cell Atlas project.

 ## Objectives
 The objective of the Data Store group is to deliver a versioned immutable object based data repository that is highly available and scalable. Data will be replicated to at least two commercial clouds (Amazon and Google). Data will be accessible through a variety of programatic interfaces as well as a command line interface.
@@ -34,7 +34,7 @@ The objective of the Data Store group is to deliver a versioned immutable object
 ### Security
 * User authentication system implementation for the Data Store
 * Data access authorization system implementation for the Data Store
-* DevSecOps - implementation of features required to the core Data Store code to support FISMA moderate capabilities in forked code bases (authentication, authorization, logging, auditing, etc). The authentication and authorization system will support all rights of data subjects as defined in [GDPR](https://gdpr-info.eu/)
+* DevSecOps - implementation of features required to the core Data Store code to support FISMA moderate capabilities in community reuse code bases (authentication, authorization, logging, auditing, etc). The authentication and authorization system will support all rights of data subjects as defined in [GDPR](https://gdpr-info.eu/)
 * Operations for Data Store - Implement and configure tools to facilitate the operation of the Data Store service in a production environment
 
 ### Community engagement
@@ -45,7 +45,7 @@ The objective of the Data Store group is to deliver a versioned immutable object
 * Host hackathons for extending the Data Store feature set.
 
 ## Out-of-scope
-* Other index/query methods/engines - we should implement these as stand-alone projects against modular index/query API.
+* Other index/query methods/engines - we should implement these as stand-alone projects against a modular index/query API.
 * FISMA moderate certification for the core Data Store code base
 * Implementation of other language bindings for the APIs other than Python
 * The specification for the format, naming, and content of bundles and files stored in the Data Store.
From cf56666eeadf0316643850055f9093775480cb48 Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Tue, 25 Sep 2018 10:56:12 -0400
Subject: [PATCH 13/16] spelling fix

---
 charters/DataStore/charter.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index 228e4a41..ad248161 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -11,7 +11,7 @@ The Data Store is a scientific data sharing/publishing/distribution framework, p
 **DCP** The Data Coordination Platform is the name given to the entire system used to ingest, validate, store, analyzes, and make available the data in the Human Cell Atlas project.
 
 ## Objectives
-The objective of the Data Store group is to deliver a versioned immutable object based data repository that is highly available and scalable. Data will be replicated to at least two commercial clouds (Amazon and Google). Data will be accessible through a variety of programatic interfaces as well as a command line interface.
+The objective of the Data Store group is to deliver a versioned immutable object based data repository that is highly available and scalable. Data will be replicated to at least two commercial clouds (Amazon and Google). Data will be accessible through a variety of programmatic interfaces as well as a command line interface.
 
 ## In-scope
 

From 3a538089e765ee0231fae4f6bd5b8c541b9e135e Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Fri, 28 Sep 2018 11:16:18 -0700
Subject: [PATCH 14/16] Update to add group email addresses

---
 charters/DataStore/charter.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index ad248161..d0a1e18d 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -1,5 +1,5 @@
 
-# Data Store
+# [Data Store](mailto:dss-team@data.humancellatlas.org)
 
 
 ## Description
@@ -72,6 +72,9 @@ The objective of the Data Store group is to deliver a versioned immutable object
 * HumanCellAtlas/data-store: general data store discussions
 * HumanCellAtlas/data-store-eng: development discussions
 
+### Mailing List
+Team email: dss-team@data.humancellatlas.org
+
 ## Github repositories
 * https://github.com/HumanCellAtlas/data-store
 * https://github.com/HumanCellAtlas/dcp-cli

From 381cea79849e4c08522a2eaacbbb47a56d3e4df8 Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Wed, 3 Oct 2018 11:23:01 -0700
Subject: [PATCH 15/16] Add DevOps responsibility

---
 charters/DataStore/charter.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index d0a1e18d..d3303af1 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -35,7 +35,8 @@ The objective of the Data Store group is to deliver a versioned immutable object
 * User authentication system implementation for the Data Store
 * Data access authorization system implementation for the Data Store
 * DevSecOps - implementation of features required to the core Data Store code to support FISMA moderate capabilities in community reuse code bases (authentication, authorization, logging, auditing, etc). The authentication and authorization system will support all rights of data subjects as defined in [GDPR](https://gdpr-info.eu/)
-* Operations for Data Store - Implement and configure tools to facilitate the operation of the Data Store service in a production environment
+* Operation Tooling for Data Store - Implement and configure tools to facilitate the operation of the Data Store service in a production environment
+* Operations for Data Store - Day to day operation of the Data Store production deployment including but not limited to monitoring of errors and health of intrinsic components, fielding help requests, monitoring security, and helping with the roll-out of new data into the deployment.
 
 ### Community engagement
 * Triage and integration of feature requests from the community into the Data Store roadmap.

From e421c11cf291723c27acede380e3532a9feb9951 Mon Sep 17 00:00:00 2001
From: Kevin Osborn
Date: Wed, 3 Oct 2018 11:51:38 -0700
Subject: [PATCH 16/16] Removed completed milestone

---
 charters/DataStore/charter.md | 1 -
 1 file changed, 1 deletion(-)

diff --git a/charters/DataStore/charter.md b/charters/DataStore/charter.md
index d3303af1..9d95b477 100644
--- a/charters/DataStore/charter.md
+++ b/charters/DataStore/charter.md
@@ -52,7 +52,6 @@ The objective of the Data Store group is to deliver a versioned immutable object
 * The specification for the format, naming, and content of bundles and files stored in the Data Store.
 
 ## Milestones and Deliverables
-* Mid-2018: 1000 bundle test scale, deploy as part of HCA DCP Pilot
 * EOY 2018: add checkout, collections, improved scaling/hardening, generic events to support stand-alone indexers, additional gaps identified in HCA DCP Pilot.
 * First half of 2019: Document Data Store interfaces so that the community is enabled to deploy storage on a configurable cloud (AWS or GCP) with the system logic still running in AWS. Also document replication APIs to enable the community to implement new cloud support.
 * First half of 2019: Transition Data Store Subscriptions/Eventing services from the current dependence on Elastic Search Percolate to the AWS and GCP cloud infrastructure.