-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specs compliance #45
Specs compliance #45
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good and very nice to have a validation API that we can start consuming in https://github.com/CellMigStandOrg/CMSO-datasets.
A few questions about references and tests. Otherwise, this looks generally in agreement with the current upstream specification. Leaving @pcmasuzzo and @gsergeant to look at this in the context of the Java implementation.
biotracks/createdp.py
Outdated
|
||
import datapackage as dp | ||
from jsontableschema import infer | ||
from .names import OBJECTS_TABLE_NAME, LINKS_TABLE_NAME | ||
|
||
|
||
NAME_PATTERN = re.compile(r"^[a-z0-9_.-]+$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth a reference/link to explain the rationale for this pattern?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in f6f5f9b. The relevant bit is
name
... MUST be lower-case and contain only alphanumeric characters along with ".", "_" or "-" characters
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
OBJECTS_PATH = "objects.csv" | ||
LINKS_PATH = "links.csv" | ||
TRACKS_PATH = "tracks.csv" | ||
JSON = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One test design question about the strategy of generating a JSON consuming biotracks.names
. If these constants were to be modified in a future breaking change, these tests would keep passing although we would probably like to know that the implementation/specification change would affect the validation of existing JSON. Or would that be a different specification upgrade test?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's better to have a separate test. This one tests the functioning of the validation engine itself, not that of individual packages. If we make breaking changes in the future, we should probably add spec version awareness to the validator and a spec upgrade test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems straightforward, very extensive checks! 💯
Will look deeper into this for Java implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is all good!
biotracks/createdp.py
Outdated
|
||
import datapackage as dp | ||
from jsontableschema import infer | ||
from .names import OBJECTS_TABLE_NAME, LINKS_TABLE_NAME | ||
|
||
|
||
NAME_PATTERN = re.compile(r"^[a-z0-9_.-]+$") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
Addresses #3.
SUMMARY:
._-
). Since 1) the top-level section is the only "external" source of metadata and 2) the"name"
property is the only one with special requirements, this should ensure that data packages we write are always specs-compliant (barring mistakes in our code, ofc).TESTING:
Check that the build is green and that the newly added tests and checks have been executed in the Travis log.
NOTES:
The proper way to implement CMSO datapackage validation would be to define our own JSON schema for it (for instance, see the fiscal data package schema) and then simply use the standard Frictionless validation engine. However: 1) this is out of the scope of the current milestone and 2) Frictionless specs are currently undergoing a major change, so it's probably best to wait until they stabilize.
We are currently validating using the stable Python data package API (
0.x
), which uses pre-1.0 specs, but we should keep an eye on the upcoming 1.0 specs. In particular, naming the JSON filedatapackage.json
is going to be a requirement (see http://specs.frictionlessdata.io/data-package/).