Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure code, Data Package 2.0, Documentation Site, etc. #84

Merged
merged 23 commits into from
Jul 19, 2024

Conversation

dtemkin-volpe
Copy link
Collaborator

Sorry for the big pull request! This request has a few changes, including:

  • Align with the Data Package 2.0 specification
    • The main changes are adding the fieldsMatch entry to allow for end-users to have additional columns (eg. for relevant notes), clarify that every resource is a table, and add $schema (which replaces profile), and changes the enum constraints to use the new categories field (this makes the documentation easier to read, but changes the DB's schema from using VARCHAR to TEXT, so feedback on whether I should change this back would be appreciated)
    • I'm hoping the frictionless-py package is updated soon to take advantage of the new features, but for right now, everything is (mostly) backwards compatible.
  • Reformat the scripts code to ensure that code is not run multiple times and to make more efficient (instead of just having disparate functions, put everything in a class and import that between files)
  • Add an empty string to the missingValues field for every table, mainly for CSV processors to recognize that an empty column is a null value.
  • Fix Specification Clarification for node_id and zone_id in node.csv #83 by adding an id_type entry in the config table, that must either be "string" or "integer". This would tell any software using GMNS how to interpret the id fields in other tables.
  • Fix errors in documentation files
  • Create a MkDocs site that uses the automatically generated documentation that will be hosted at the repository's GitHub Pages site
  • Add Dependabot version updates, for Python packages and GitHub Actions

Let me know if there are any issues with this or if anyone wants any changes.

@dtemkin-volpe dtemkin-volpe self-assigned this Jul 17, 2024
@dtemkin-volpe dtemkin-volpe requested a review from a team as a code owner July 17, 2024 22:08
Test changing the id type
@ssmith55
Copy link
Collaborator

From what I can see, it looks good. As a test, I'd like to run the workflows. Something like:
Test #1: change the ID from text to integer. See what happens to the markdown and generated sql. What field type for ID appears in the markdown (ID to match the json, or would it be integer or test?)
Test #2: change something else. Make sure we have not broken what we have done before
Finally, in our Cambridge Intersection example, double check that the field_type in the config file is consistent with the ID field types in the actual data.

@dtemkin-volpe
Copy link
Collaborator Author

I think the idea was that people could define what types the id fields would be in their config.csv file, rather than in the json spec files (so the markdown documentation or generated SQL wouldn't change). Because the config.csv file isn't really required, the id fields are defined as an any type in the spec, and downstream software could then enforce whatever is set in the config file. I enabled the workflow on my end, let me try changing something and make sure everything works still.

@ssmith55
Copy link
Collaborator

Yes, that makes sense. Thank you. Maybe we can create two versions of the template SQLite database, one with INT keys and the other with TEXT keys.

@ssmith55
Copy link
Collaborator

I did a little test, and all seems to be well. Diego, I'm ok with merging if you feel it is ready. Thanks.

@dtemkin-volpe dtemkin-volpe merged commit f3bd462 into zephyr-data-specs:develop Jul 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants