Repo-level automation #3

Open
@bollwyvl

Elevator Pitch

Adopt a top-level repo task runner which knows when to run different tasks to achieve desired goals. Use in CI, and document for local development.

Motivation

This repo's development workflow will take as inputs on a given PR:

  • human-authored TOML, YAML, etc.
  • templates to generate packages/documentation
  • narrative documentation
  • build dependencies

And generate as outputs:

  • canonical JSON in
  • documentation as HTML (and PDF, etc)
    • checking reports (e.g. links, grammar, spelling)
  • multiple language/framework-specific packages
    • distributions
    • test reports
    • coverage reports
    • documentation

Proposal

make is fine, but is still complex to operate in 2024 for Windows users. Indeed, even pre-commit (or one of its many plugins) makes non-portable assumptions, and "I can't even commit" isn't a very nice feature for a new/drive-by contributor.

If the top level of the repo will indeed be (at least) a canonical, no- or one-dependency Python project, I'd recommend starting with doit: the repo would contain a top-level dodo.py (or any other file name, as configured in pyproject.toml).

Example

Given a layout like:

./
  pyproject.toml
  dodo.py
  schema/
    some/
      path/
        thing.schema.yaml

And a preflight such as:

python -m pip install -e .[dev]

And the dodo.py:

import json
import tomllib  # standard library in Python 3.11+; tomli_w can only write TOML
from pathlib import Path

import jsonschema
import yaml
from jsonschema.protocols import Validator

ROOT = Path(__file__).parent
SCHEMA = ROOT / "schema"
ALL_SCHEMA_SRC = [*SCHEMA.rglob("*.schema.toml"), *SCHEMA.rglob("*.schema.yaml")]
# map each source to a sibling .json, e.g. thing.schema.yaml -> thing.schema.json
ALL_SCHEMA_DIST = {src: src.parent / f"{src.stem}.json" for src in ALL_SCHEMA_SRC}


def task_build():
    """Convert each human-authored schema to canonical JSON."""
    for src, schema in ALL_SCHEMA_DIST.items():
        rel = schema.relative_to(SCHEMA)
        yield dict(
            name=f"schema:{rel}",
            actions=[(_convert_one, [src, schema])],
            file_dep=[src],
            targets=[schema],
        )


def task_validate():
    """Check that each generated JSON file is a valid schema."""
    for schema in ALL_SCHEMA_DIST.values():
        rel = schema.relative_to(SCHEMA)
        yield dict(
            name=f"schema:{rel}",
            actions=[(_validate_one, [schema])],
            # depending on a build target makes doit run that build task first
            file_dep=[schema],
        )


def _convert_one(src: Path, dest: Path) -> bool:
    # Path.suffix includes the leading dot
    if src.suffix == ".toml":
        data = tomllib.loads(src.read_text(encoding="utf-8"))
    elif src.suffix == ".yaml":
        data = yaml.safe_load(src.read_text(encoding="utf-8"))
    else:
        # doit treats a False return value as a failed action
        return False
    text = json.dumps(data, indent=2, sort_keys=True)
    dest.write_text(text, encoding="utf-8")
    return True


def _validate_one(schema_path: Path, instance_path: Path | None = None) -> bool:
    schema = json.loads(schema_path.read_text(encoding="utf-8"))
    # pick the validator class matching the schema's declared $schema
    validator_cls: type[Validator] = jsonschema.validators.validator_for(schema)
    validator_cls.check_schema(schema)

    if instance_path:
        validator = validator_cls(schema, format_checker=validator_cls.FORMAT_CHECKER)
        instance = json.loads(instance_path.read_text(encoding="utf-8"))
        validator.validate(instance)

    return True

Running doit validate would:

  • ensure all of the .schema.json files come into existence, as each validate task depends on the output of a build task
  • ensure each of those files is actually a valid schema

Provided the above is true, running doit validate again wouldn't do anything.
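
The "wouldn't do anything" part comes from doit recording the state of each task's file_dep after a successful run, and skipping the task while that state is unchanged and its targets still exist. The sketch below imitates that idea with content hashes, using only the standard library. It is not doit's implementation, and the names file_digest, is_up_to_date, record_run and the JSON state file are all hypothetical.

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return a hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def _load_state(state_file: Path) -> dict:
    """Read the recorded digests, or start empty on the first run."""
    if state_file.exists():
        return json.loads(state_file.read_text(encoding="utf-8"))
    return {}


def is_up_to_date(
    state_file: Path, task: str, deps: list[Path], targets: list[Path]
) -> bool:
    """True if every target exists and no dependency changed since the last run."""
    if not all(target.exists() for target in targets):
        return False
    current = {str(dep): file_digest(dep) for dep in deps}
    return _load_state(state_file).get(task) == current


def record_run(state_file: Path, task: str, deps: list[Path]) -> None:
    """Store the current dependency digests after a successful run."""
    state = _load_state(state_file)
    state[task] = {str(dep): file_digest(dep) for dep in deps}
    state_file.write_text(json.dumps(state, indent=2), encoding="utf-8")
```

A runner built this way would call is_up_to_date before each task and record_run after each success, so editing thing.schema.yaml would invalidate only the tasks that list it as a dependency.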

This approach would be extended to:

  • format with e.g. prettier, taplo, ruff
  • lint as above, but also yamllint, etc.
  • dist initially just pyproject-build ., but eventually many more
  • docs with sphinx is fine, but the existing schema are... lacking
    • jsonschema2md is a bit better
    • but maybe jinja2 templates are the way to go
    • and eventually some interactive jupyterlite site seems relevant
  • check with pytest-check-links
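
As a rough sketch of the first two extensions, format and lint could be plain command actions in the same dodo.py. The file lists, the task names, and the exact command lines here are assumptions for illustration (ruff format, ruff check, and yamllint with their default settings); the other tools would follow the same pattern.

```python
from pathlib import Path

# assumption: doit is run from the repo root
ROOT = Path.cwd()
SCHEMA = ROOT / "schema"
ALL_PY = [ROOT / "dodo.py"]
ALL_YAML = sorted(SCHEMA.rglob("*.yaml"))


def task_format():
    """Rewrite Python sources in place."""
    return dict(
        # a list of strings is run as a command without a shell
        actions=[["ruff", "format", *map(str, ALL_PY)]],
        file_dep=ALL_PY,
    )


def task_lint():
    """Yield one read-only check per tool."""
    yield dict(
        name="ruff",
        actions=[["ruff", "check", *map(str, ALL_PY)]],
        file_dep=ALL_PY,
    )
    if ALL_YAML:
        yield dict(
            name="yamllint",
            actions=[["yamllint", *map(str, ALL_YAML)]],
            file_dep=ALL_YAML,
        )
```

Because each task lists its inputs as file_dep, doit lint would only re-run a checker after one of the files it covers has changed.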
