Expand "development process" section in README.md #664

Open
dsmedia opened this issue Jan 12, 2025 · 3 comments

Comments

@dsmedia
Collaborator

dsmedia commented Jan 12, 2025

The "Development process" section in README.md would benefit from better documentation for contributors who are adding or updating datasets or dataset metadata. While we have robust technical infrastructure (build_datapackage.py, taplo.toml, etc.) for handling metadata, the documentation doesn't adequately explain how to add metadata or how to run the linting checks (related to #663). README.md might also be a good place for a high-level description of the dataset metadata workflow.

There is already some documentation in build_datapackage.py, and we might consider referencing some of it in README.md. The taplo checks and formatting process could also be included.

From README.md:

Development process

Install dependencies with npm install.

@dangotbanned would you have any suggestions and/or want to take a crack at this?

dsmedia changed the title from 'Add "Contributing Data" section to README.md' to 'Expand "development process" section in README.md' on Jan 12, 2025
@dangotbanned
Member

dangotbanned commented Jan 12, 2025

@dsmedia sounds like a good idea to me.
I'm +1 on this, but I'm shifting my focus to closing off vega/altair#3631 for now.

Suggestions

I've recently updated the altair dev docs, so there may be bits that can be adapted?

Regarding taplo, one thing that would be different is that you'd install it with:

uv sync --dev

That picks up this table:

[dependency-groups]
dev = ["ipython[kernel]>=8.30.0", "ruff>=0.8.2", "taplo>=0.9.3"]


Another thing I think would be worth considering is keeping docs close to the actual functionality.
E.g. instead of writing about each CLI tool in README.md, add a task runner that supports docs.
taskipy is what I used in altair, but maybe npm can be used this way?

"scripts": {
  "prebuild": "./scripts/make-url-index.sh > src/urls.ts && ./scripts/build_datapackage.py",
  "build": "rollup -c",
  "github": "python scripts/github.py",
  "release": "release-it"
}
The main benefit is that when the process inevitably changes, there's no need to update things in multiple places.

@dsmedia
Collaborator Author

dsmedia commented Jan 13, 2025

I see. So if I understand correctly, you're suggesting that instead of documenting in README.md that contributors must 1) format the TOML files (using uvx taplo fmt) and 2) generate the datapackage files (using python scripts/build_datapackage.py), we document a single command like npm run update-metadata, which could be implemented in package.json as:

"scripts": {
  // ... other scripts
  "build-metadata": "python scripts/build_datapackage.py", // Generates datapackage.json
  "format-metadata": "uvx taplo fmt", // Formats TOML metadata files
  "update-metadata": "npm run format-metadata && npm run build-metadata" // Combines both
}

Do I have that right (and is it just these two steps, in that order)?

@dangotbanned
Member

dangotbanned commented Jan 13, 2025

@dsmedia

Commands

These are all of the commands that might be useful, in order.
The headings just group functionality; they are not suggestions for script/task names.

python env

Related
Probably doesn't make sense to have as a script, since this would be for onboarding.

# (Direct them to `uv` docs if they need to install)
uv self update
uv python install 3.12
uv venv -p 3.12

Maybe worth a script, since you might run this once per PR:

uv sync --dev

Lint/formatting

You'd want to run these before pushing any commit and before running any .py scripts:

uv run taplo fmt
uv run ruff check
uv run ruff format

.py scripts

flights.py will probably be run rarely, but when it is, it should come before build_datapackage.py:

uv run scripts/flights.py
uv run scripts/build_datapackage.py
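The ordering above could be captured in a small runner. This is a minimal sketch, not an existing project script: the step lists simply mirror the commands in this comment, `flights.py` is opt-in because it is rarely needed, and execution stops at the first failing command, the same way chaining with `&&` would.

```python
import subprocess
import sys

# Lint/format steps, in the order suggested above (each is an argv list).
LINT_STEPS = [
    ["uv", "run", "taplo", "fmt"],
    ["uv", "run", "ruff", "check"],
    ["uv", "run", "ruff", "format"],
]
FLIGHTS_STEP = ["uv", "run", "scripts/flights.py"]
BUILD_STEP = ["uv", "run", "scripts/build_datapackage.py"]


def plan(include_flights: bool = False) -> list[list[str]]:
    """Return commands in order: lint/format, then (optionally) flights.py, then the build."""
    steps = list(LINT_STEPS)
    if include_flights:
        steps.append(FLIGHTS_STEP)
    steps.append(BUILD_STEP)
    return steps


def run(include_flights: bool = False) -> None:
    for argv in plan(include_flights):
        print("$", " ".join(argv))
        # check=True raises CalledProcessError on a non-zero exit, stopping the sequence.
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical flag: pass --flights to include scripts/flights.py.
    run(include_flights="--flights" in sys.argv)
```

Keeping the order in one list means a change to the process only has to be made here, in line with the earlier point about not updating things in multiple places.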

Naming

I'd probably go for things like:

  • sync-py
  • fmt-toml
  • lint-py
  • fmt-py
  • flights
  • build-datapackage

Documenting

I did some reading about npm scripts, but it seems docs/help aren't part of the spec.

One issue might be that package.json may be validated somewhere as .json and not .jsonc - in which case the comments would break things.
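The `.json` vs `.jsonc` concern is easy to demonstrate: a strict JSON parser such as Python's standard-library `json` rejects `//` comments (and trailing commas). A minimal sketch comparing a commented `scripts` block, modeled on the one proposed earlier, with a comment-free equivalent:

```python
import json

# Commented variant, modeled on the proposal earlier in this thread.
COMMENTED = """{
  "scripts": {
    "build-metadata": "python scripts/build_datapackage.py", // Generates datapackage.json
    "format-metadata": "uvx taplo fmt"
  }
}"""

# Same scripts without comments: valid strict JSON.
PLAIN = """{
  "scripts": {
    "build-metadata": "python scripts/build_datapackage.py",
    "format-metadata": "uvx taplo fmt"
  }
}"""


def is_strict_json(text: str) -> bool:
    """Return True if `text` parses as standard JSON (no comments, no trailing commas)."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(is_strict_json(COMMENTED))  # False: `//` comments are not valid JSON
    print(is_strict_json(PLAIN))  # True
```

Whether any tool in this repo's pipeline actually parses package.json strictly is not established here; the sketch only shows why comments are a risk if one does.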

Even if that weren't an issue, these comments still wouldn't be surfaced the way taskipy surfaces help text.
With taskipy, using your names and descriptions from #664 (comment), you'd get this help menu for free:

>>> uv run task --list
build-metadata              Generates datapackage.json
format-metadata             Formats TOML metadata files
update-metadata             Combines both

If we're using npm scripts, I'd lean towards putting more info in README.md, while linking to external docs wherever possible.
