Specs compliance #45

simleo · 2017-07-07T17:28:55Z

Addresses #3.

SUMMARY:

Adds a validation module for CMSO-specific tabular data packages. The validation process first checks that the data package is a valid tabular data package via the Frictionless API, then performs several additional checks to verify that it's also a valid CMSO data package (and implicitly defines our specs, since they have not been formalized yet).
Checks that the package name provided via the top-level configuration section respects the Frictionless constraints (lowercase alphanumeric characters plus ".", "_" and "-"). Since 1) the top-level section is the only "external" source of metadata and 2) the "name" property is the only one with special requirements, this should ensure that the data packages we write are always specs-compliant (barring mistakes in our code, of course).
Adds a validation script and a command that validates all data packages in our examples to the Docker build (and thus to the Travis build).
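
To make the two-stage flow described above concrete, here is a minimal sketch. It is not the module added in this PR: `validate_cmso` is a hypothetical name, `base_check` is a stand-in for the Frictionless tabular validation, and the resource names "objects" and "links" are placeholders rather than the actual values from biotracks.names.

```python
def validate_cmso(descriptor, base_check, required=("objects", "links")):
    """Run a generic check first, then extra CMSO-level checks.

    ASSUMPTIONS (illustration only):
    - base_check stands in for the Frictionless tabular validation and
      is expected to raise an exception on failure;
    - the names in `required` are placeholders, not the real constants
      defined in biotracks.names.
    """
    # Stage 1: generic tabular data package validation.
    base_check(descriptor)

    # Stage 2: CMSO-specific check that required resources are present.
    names = [r.get("name") for r in descriptor.get("resources", [])]
    missing = [n for n in required if n not in names]
    if missing:
        raise ValueError(
            "missing required resources: %s" % ", ".join(missing)
        )
```

Passing the generic check as a callable keeps the CMSO-level checks independent of the Frictionless API version, which may help while the upstream specs are still changing.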

TESTING:

Check that the build is green and that the newly added tests and checks have been executed in the Travis log.

NOTES:

The proper way to implement CMSO data package validation would be to define our own JSON schema for it (for instance, see the fiscal data package schema) and then simply use the standard Frictionless validation engine. However: 1) this is out of scope for the current milestone and 2) the Frictionless specs are currently undergoing a major change, so it's probably best to wait until they stabilize.
We are currently validating using the stable Python data package API (0.x), which uses pre-1.0 specs, but we should keep an eye on the upcoming 1.0 specs. In particular, naming the JSON file datapackage.json is going to be a requirement (see http://specs.frictionlessdata.io/data-package/).

sbesson

Overall looks good and very nice to have a validation API that we can start consuming in https://github.com/CellMigStandOrg/CMSO-datasets.

A few questions about references and tests. Otherwise, this looks generally in agreement with the current upstream specification. Leaving @pcmasuzzo and @gsergeant to look at this in the context of the Java implementation.

sbesson · 2017-07-10T22:02:04Z

biotracks/createdp.py


 import datapackage as dp
 from jsontableschema import infer
 from .names import OBJECTS_TABLE_NAME, LINKS_TABLE_NAME


+NAME_PATTERN = re.compile(r"^[a-z0-9_.-]+$")


Worth a reference/link to explain the rationale for this pattern?

Added in f6f5f9b. The relevant bit is

name ... MUST be lower-case and contain only alphanumeric characters along with ".", "_" or "-" characters
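
For illustration, a minimal sketch of how the pattern might be applied to a candidate name. The `check_name` helper is hypothetical and not necessarily what createdp.py does; only `NAME_PATTERN` is taken from the diff.

```python
import re

# Same pattern as in the diff above.
NAME_PATTERN = re.compile(r"^[a-z0-9_.-]+$")


def check_name(name):
    """Return name unchanged if it matches NAME_PATTERN, else raise.

    Hypothetical helper, for illustration only.
    """
    if not NAME_PATTERN.match(name):
        raise ValueError("invalid data package name: %r" % (name,))
    return name
```

With this helper, a name such as "cmso-example_1.0" is accepted, while "My Package" or an empty string raises ValueError. Note that in Python `$` also matches just before a trailing newline, so a stricter variant could use `NAME_PATTERN.fullmatch` (Python 3.4+).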

sbesson · 2017-07-11T14:19:53Z

tests/test_validation.py

+OBJECTS_PATH = "objects.csv"
+LINKS_PATH = "links.csv"
+TRACKS_PATH = "tracks.csv"
+JSON = {


One test design question about the strategy of generating the JSON from the constants in biotracks.names. If these constants were modified in a future breaking change, these tests would keep passing, although we would probably want to know that the implementation/specification change affects the validation of existing JSON. Or would that be a different specification upgrade test?

I think it's better to have a separate test. This one tests the functioning of the validation engine itself, not that of individual packages. If we make breaking changes in the future, we should probably add spec version awareness to the validator and a spec upgrade test.

gsergeant

Seems straightforward, very extensive checks! 💯
Will look deeper into this for the Java implementation.

pcmasuzzo

I think this is all good!

pcmasuzzo · 2017-07-13T07:37:07Z

biotracks/createdp.py


 import datapackage as dp
 from jsontableschema import infer
 from .names import OBJECTS_TABLE_NAME, LINKS_TABLE_NAME


+NAME_PATTERN = re.compile(r"^[a-z0-9_.-]+$")


simleo added 4 commits July 7, 2017 18:20

add validation module and script

9c77e67

ensure validity of data package name

dc7c7fd

pep8 fixes

67800f0

add validation checks to Docker build

0d178b6

simleo force-pushed the specs_compliance branch from 7dcb221 to 0d178b6 July 7, 2017 17:53

simleo added 2 commits July 8, 2017 09:22

validate_dpkg: log successful validation at debug level

b194a8f

test validation of optional tracks resource

380af46

simleo requested review from pcmasuzzo and gsergeant July 10, 2017 08:44

sbesson reviewed Jul 11, 2017

createdp: added link to specs as a comment to NAME_PATTERN

f6f5f9b

gsergeant reviewed Jul 12, 2017

pcmasuzzo approved these changes Jul 13, 2017

simleo merged commit 8bafe8f into CellMigStandOrg:master Jul 13, 2017

simleo deleted the specs_compliance branch July 13, 2017 09:19
