## Elevator Pitch

Adopt a top-level repo task runner that knows when to run different tasks to achieve desired goals. Use it in CI, and document it for local development.
## Motivation
This repo's development workflow will take as inputs on a given PR:
- human-authored TOML, YAML, etc.
- templates to generate packages/documentation
- narrative documentation
- build dependencies
And generate as outputs:
- canonical JSON in
- documentation as HTML (and PDF, etc)
- checking reports (e.g. links, grammar, spelling)
- multiple language/framework-specific packages
- distributions
- test reports
- coverage reports
- documentation
## Proposal
`make` is fine, but in 2024 it is still complex for Windows users to operate. Indeed, even `pre-commit` (or one of its many plugins) makes non-portable assumptions, and "I can't even commit" isn't a very nice feature for a new/drive-by contributor.
If indeed the top level of the repo will be (at least) a canonical, no- or one-dependency Python project, I'd recommend starting with `doit`, where the repo would contain a top-level `dodo.py` (or any other file, as configured in `pyproject.toml`).
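One convenience worth noting: `dodo.py` can declare which tasks a bare `doit` invocation runs, so a drive-by contributor doesn't have to learn any subcommands first. A minimal sketch (the task names assume the `build` and `validate` tasks from the example below):

```python
# a minimal sketch: with this in dodo.py, a bare `doit` runs the
# `build` and `validate` tasks (with dependency resolution)
DOIT_CONFIG = {"default_tasks": ["build", "validate"]}
```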
### Example
Given a layout like:
```
./
  pyproject.toml
  dodo.py
  schema/
    some/
      path/
        thing.schema.yaml
```
And a preflight such as:

```bash
python -m pip install -e .[dev]
```

And the `dodo.py`:
```python
import json
from pathlib import Path

import jsonschema
import tomllib  # stdlib on python >=3.11; use the `tomli` package on older pythons
import yaml

ROOT = Path(__file__).parent
SCHEMA = ROOT / "schema"

ALL_SCHEMA_SRC = [*SCHEMA.rglob("*.schema.toml"), *SCHEMA.rglob("*.schema.yaml")]
ALL_SCHEMA_DIST = {src: src.parent / f"{src.stem}.json" for src in ALL_SCHEMA_SRC}


def task_build():
    """convert each human-authored schema source to canonical JSON"""
    for src, schema in ALL_SCHEMA_DIST.items():
        rel = schema.relative_to(SCHEMA)
        yield dict(
            name=f"schema:{rel}",
            actions=[(_convert_one, [src, schema])],
            file_dep=[src],
            targets=[schema],
        )


def task_validate():
    """check that each generated JSON file is itself a valid schema"""
    for schema in ALL_SCHEMA_DIST.values():
        rel = schema.relative_to(SCHEMA)
        yield dict(
            name=f"schema:{rel}",
            actions=[(_validate_one, [schema])],
            # depending on the build target lets doit chain build -> validate
            file_dep=[schema],
        )


def _convert_one(src: Path, dest: Path) -> bool:
    if src.suffix == ".toml":
        data = tomllib.load(src.open("rb"))
    elif src.suffix == ".yaml":
        data = yaml.safe_load(src.read_text(encoding="utf-8"))
    else:
        return False
    text = json.dumps(data, indent=2, sort_keys=True)
    dest.write_text(text, encoding="utf-8")
    return True


def _validate_one(schema_path: Path, instance_path: Path | None = None) -> bool:
    schema = json.loads(schema_path.read_text(encoding="utf-8"))
    validator_cls = jsonschema.validators.validator_for(schema)
    validator_cls.check_schema(schema)
    if instance_path:
        validator = validator_cls(schema, format_checker=validator_cls.FORMAT_CHECKER)
        instance = json.loads(instance_path.read_text(encoding="utf-8"))
        validator.validate(instance)
    return True
```

Running `doit validate` would:
- ensure all of the `.schema.json` come into existence, as each `validate` task depends on the output of a `build` task
- ensure all of the schema are actually valid schema
Provided the above is true, running `doit validate` again wouldn't do anything.
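If these goals ever need to run from inside another Python process (say, under pytest), doit also exposes a programmatic entry point; a minimal sketch, assuming the `dodo.py` above is importable from the repo root:

```python
# a minimal sketch: drive doit from Python instead of the CLI;
# assumes the dodo.py above is importable from the repo root
from doit.cmd_base import ModuleTaskLoader
from doit.doit_cmd import DoitMain

import dodo


def run_validate() -> int:
    """returns doit's exit code: 0 when every validate task succeeds"""
    return DoitMain(ModuleTaskLoader(dodo)).run(["validate"])


if __name__ == "__main__":
    raise SystemExit(run_validate())
```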
This approach would be extended to:
- `format` with e.g. `prettier`, `taplo`, `ruff` (see the sketch after this list)
- `lint` as above, but also `yamllint`, etc.
- `dist` initially just `pyproject-build .`, but eventually many more
- `docs` with sphinx is fine, but the existing schema are... lacking
  - jsonschema2md is a bit better
  - but maybe jinja2 templates are the way to go
  - and eventually some interactive jupyterlite site seems relevant
- `check` with `pytest-check-links`
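As a hedged illustration of that extension (the tool names and arguments below are placeholders, not settled choices), a `format` goal could follow the same generator pattern as `build` and `validate`:

```python
# a sketch only: the tools and arguments here are illustrative placeholders
def task_format():
    """rewrite human-authored sources in place with their formatters"""
    yield dict(
        name="schema:prettier",
        # list-form actions run without a shell, which helps Windows portability
        actions=[["prettier", "--write", "schema/"]],
        file_dep=[*ALL_SCHEMA_SRC],
    )
    yield dict(
        name="py:ruff",
        actions=[["ruff", "format", "dodo.py"]],
        file_dep=[ROOT / "dodo.py"],
    )
```

Each new goal stays cacheable the same way: declare `file_dep` (and `targets` where real outputs exist), and doit skips whatever is already up to date.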