## Description

### Elevator Pitch
Adopt a top-level repo task runner that knows when to run different tasks to achieve desired goals. Use it in CI, and document it for local development.
### Motivation
On a given PR, this repo's development workflow will take as inputs:
- human-authored TOML, YAML, etc.
- templates to generate packages/documentation
- narrative documentation
- build dependencies
And generate as outputs:
- canonical JSON
- documentation as HTML (and PDF, etc)
- checking reports (e.g. links, grammar, spelling)
- multiple language/framework-specific packages
- distributions
- test reports
- coverage reports
### Proposal
`make` is fine, but is still complex to operate in 2024 for Windows users. Indeed, even `pre-commit` (or one of its many plugins) makes non-portable assumptions, and "I can't even commit" isn't a very nice feature for a new/drive-by contributor.

If indeed the top level of the repo will be (at least) a canonical, no- or one-dependency Python project, I'd recommend starting with `doit`, where the repo would contain a top-level `dodo.py` (or any other file, as configured in `pyproject.toml`).
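A minimal sketch of that configuration (assuming doit's `pyproject.toml` support; the `dodoFile` key is an assumption mirroring the `-f`/`--file` CLI option, and `tasks.py` is a hypothetical alternative filename, so verify against the doit docs):

```toml
[tool.doit]
# assumption: key name mirrors doit's -f/--file CLI option
dodoFile = "tasks.py"
```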
#### Example
Given a layout like:

```
./
  pyproject.toml
  dodo.py
  schema/
    some/
      path/
        thing.schema.yaml
```
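For concreteness, `thing.schema.yaml` might hold a minimal (hypothetical) JSON Schema authored as YAML:

```yaml
# hypothetical contents of schema/some/path/thing.schema.yaml
"$schema": "https://json-schema.org/draft/2020-12/schema"
title: Thing
type: object
properties:
  name:
    type: string
required:
  - name
```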
And a preflight such as:

```bash
python -m pip install -e .[dev]
```
And the `dodo.py`:
```python
import json
import tomllib  # ``tomli`` on Python < 3.11; note ``tomli_w`` only *writes* TOML
from pathlib import Path
from typing import Type

import jsonschema
import yaml
from jsonschema.protocols import Validator

ROOT = Path(__file__).parent
SCHEMA = ROOT / "schema"

ALL_SCHEMA_SRC = [*SCHEMA.rglob("*.schema.toml"), *SCHEMA.rglob("*.schema.yaml")]
ALL_SCHEMA_DIST = {src: src.parent / f"{src.stem}.json" for src in ALL_SCHEMA_SRC}


def task_build():
    """Convert each TOML/YAML schema source to canonical JSON."""
    for src, schema in ALL_SCHEMA_DIST.items():
        rel = schema.relative_to(SCHEMA)
        yield dict(
            name=f"schema:{rel}",
            actions=[(_convert_one, [src, schema])],
            file_dep=[src],
            targets=[schema],
        )


def task_validate():
    """Check that each generated JSON file is itself a valid schema."""
    for schema in ALL_SCHEMA_DIST.values():
        rel = schema.relative_to(SCHEMA)
        yield dict(
            name=f"schema:{rel}",
            actions=[(_validate_one, [schema])],
            # depending on the generated file makes doit run the matching
            # ``build`` task first
            file_dep=[schema],
        )


def _convert_one(src: Path, dest: Path) -> bool:
    data = None
    if src.suffix == ".toml":
        data = tomllib.load(src.open("rb"))
    elif src.suffix == ".yaml":
        data = yaml.safe_load(src.open())
    else:
        return False
    text = json.dumps(data, indent=2, sort_keys=True)
    dest.write_text(text, encoding="utf-8")
    return True


def _validate_one(schema_path: Path, instance_path: Path | None = None) -> bool:
    schema = json.loads(schema_path.read_text(encoding="utf-8"))
    validator_cls: Type[Validator] = jsonschema.validators.validator_for(schema)
    validator_cls.check_schema(schema)
    if instance_path:
        validator = validator_cls(schema, format_checker=validator_cls.FORMAT_CHECKER)
        instance = json.loads(instance_path.read_text(encoding="utf-8"))
        validator.validate(instance)
    return True
```
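As a quick sanity check, `doit list` would show the top-level tasks (add `--all` for subtasks); roughly:

```console
$ doit list
build      Convert each TOML/YAML schema source to canonical JSON.
validate   Check that each generated JSON file is itself a valid schema.
```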
Running `doit validate` would:

- ensure all of the `.schema.json` files come into existence, as each `validate` task depends on the output of a `build` task
- ensure all of the schema are actually valid schema

Provided the above is true, running `doit validate` again wouldn't do anything.
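A rough sketch of what that looks like (doit prefixes executed tasks with `.` and up-to-date ones with `--`):

```console
$ doit validate
.  build:schema:some/path/thing.schema.json
.  validate:schema:some/path/thing.schema.json
$ doit validate
-- build:schema:some/path/thing.schema.json
-- validate:schema:some/path/thing.schema.json
```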
This approach would be extended to:

- `format` with e.g. `prettier`, `taplo`, `ruff` (see the sketch after this list)
- `lint` as above, but also `yamllint`, etc.
- `dist` initially just `pyproject-build .`, but eventually many more
- `docs` with sphinx is fine, but the existing schema are... lacking
  - `jsonschema2md` is a bit better
  - but maybe jinja2 templates are the way to go
  - and eventually some interactive jupyterlite site seems relevant
- `check` with `pytest-check-links`
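For instance, a `format` task might look like the following sketch (the tool invocations and globs are assumptions, not decisions):

```python
def task_format():
    """Apply formatters; up-to-date when the inputs haven't changed."""
    all_py = [*ROOT.rglob("*.py")]  # assumption: no vendored/virtualenv dirs to exclude
    yield dict(
        name="ruff",
        actions=["ruff format ."],
        file_dep=all_py,
    )
    yield dict(
        name="taplo",
        actions=["taplo fmt pyproject.toml"],
        file_dep=[ROOT / "pyproject.toml"],
    )
```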