adtl – another data transformation language

adtl is a data transformation language (DTL) used by some applications in Global.health, notably the ISARIC clinical data pipeline at globaldothealth/isaric and the InsightBoard project dashboard at globaldothealth/InsightBoard.

Documentation: ReadTheDocs

Installation

You can install this package using either pipx or pip. Installing via pipx is the better choice if you only want to use the adtl tool standalone from the command line, because pipx isolates the Python package dependencies in a dedicated virtual environment. pip, on the other hand, installs packages into the active environment, which is the global one unless you have activated a virtual environment. Installing globally is generally not recommended, as it can interfere with other packages on your system.

Installation via pipx:
```
pipx install adtl
```
Installation via pip:
```
python3 -m pip install adtl
```

If you are writing code which depends on adtl (instead of using the command-line program), then it is best to add a dependency on adtl to your Python build tool of choice.

To use the development version, replace adtl with the full GitHub URL:

pip install git+https://github.com/globaldothealth/adtl

Rationale

Most existing data transformation languages are written in an XML dialect, though there are recent variations in other file formats. In addition, many DTLs use a custom domain-specific language. The primary utility of this DTL is to provide an easy-to-use Python library for basic data transformations, which are specified in a JSON file. It is not meant to be comprehensive, and adtl can be used as one step within a larger data processing pipeline.

Usage

adtl can be used from the command line or as a Python library.

As a CLI:

adtl specification-file input-file

Here specification-file is the parser specification (as TOML or JSON) and input-file is the data file (not the data dictionary) that adtl will transform using the instructions in the specification.

If adtl is not in your PATH, this may give an error. Either add the location where the adtl script is installed to your PATH, or try running adtl as a module:

python3 -m adtl specification-file input-file
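
When adtl is driven from another Python script, the same PATH fallback can be handled in code. The following is a minimal sketch, not part of adtl itself: the helper name `adtl_command` is hypothetical, and it only builds the argument list using the two invocations shown above.

```python
from __future__ import annotations

import shutil
import sys


def adtl_command(spec_file: str, input_file: str) -> list[str]:
    """Build an argument list that runs adtl on the given files.

    Hypothetical helper: prefers the `adtl` script when it is on PATH and
    otherwise falls back to running the package as a module.
    """
    if shutil.which("adtl") is not None:
        # The adtl script was found on PATH, so call it directly.
        return ["adtl", spec_file, input_file]
    # Fall back to `python -m adtl`, using the interpreter running this code.
    return [sys.executable, "-m", "adtl", spec_file, input_file]


# Example usage (file names are placeholders):
#   import subprocess
#   subprocess.run(adtl_command("specification-file", "input-file"), check=True)
```

Using `sys.executable` rather than a hard-coded `python3` keeps the fallback inside whichever environment the calling script runs in, which is where adtl would need to be installed.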

Running adtl will create output files with the name of the parser, suffixed with table names in the current working directory.
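
To locate those output files after a run, one option is to list the files in the working directory whose names start with the parser name. This is a rough sketch: the helper `find_outputs` is hypothetical, and it assumes only what is stated above (the parser name is a prefix of each output file name), not any particular separator or file extension.

```python
from __future__ import annotations

from pathlib import Path


def find_outputs(parser_name: str, directory: str = ".") -> list[Path]:
    """Return the files in `directory` whose names start with `parser_name`.

    Hypothetical helper: it matches on the name prefix only, so unrelated
    files that happen to share the prefix are returned as well.
    """
    return sorted(
        path
        for path in Path(directory).iterdir()
        if path.is_file() and path.name.startswith(parser_name)
    )


# Example usage ("my-parser" is a placeholder for the name in your specification):
#   for path in find_outputs("my-parser"):
#       print(path)
```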

As a Python library:

import adtl

parser = adtl.Parser(specification)
print(parser.tables) # list of tables created

for row in parser.parse().read_table(table):
    print(row)

Alternatively, to get output files as CSV, similarly to the CLI:

import adtl

data = adtl.parse("specification-file", "input-file")

Here data is returned as a dictionary of pandas dataframes, one for each table.
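
Because the result is a plain dictionary keyed by table name, it can be inspected with ordinary Python. The sketch below is illustrative only: `table_row_counts` is a hypothetical helper, and it relies on nothing beyond `len()` working on each value, which holds for pandas dataframes (it gives the number of rows).

```python
from __future__ import annotations

from typing import Mapping, Sized


def table_row_counts(data: Mapping[str, Sized]) -> dict[str, int]:
    """Map each table name to the number of rows in its table.

    Hypothetical helper: `data` is expected to have the shape described
    above, a dictionary with one dataframe per table.
    """
    return {table: len(frame) for table, frame in data.items()}


# Example usage (file names are placeholders):
#   import adtl
#   data = adtl.parse("specification-file", "input-file")
#   for table, rows in table_row_counts(data).items():
#       print(f"{table}: {rows} rows")
```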

Development

Install pre-commit and set up the pre-commit hooks (pre-commit install), which run linting checks before each commit.
