.. image:: https://travis-ci.org/jbn/pathsjson.svg?branch=master

.. image:: https://ci.appveyor.com/api/projects/status/xre5b722qk6ckqaf?svg=true

.. image:: https://coveralls.io/repos/github/jbn/pathsjson/badge.svg?branch=master

What is this?
=============

A JSON-based DSL for describing paths in your project.

Why is this?
============

My ETL/data analysis projects are littered with code like this:

.. code:: python

    import os

    DATA_DIR = "data"
    CLEAN_DIR = os.path.join(DATA_DIR, "clean")
    RAW_DIR = os.path.join(DATA_DIR, "raw")
    TARGET_HTML = os.path.join(RAW_DIR, "something.html")
    OUTPUT_FILE = os.path.join(CLEAN_DIR, "something.csv")

    with open(TARGET_HTML) as fp:
        csv = process(fp)

    with open(OUTPUT_FILE, "w") as fp:  # "w": this file is written, not read
        write_csv(fp)

It's fine for one file, but when you have a whole ETL pipeline tucked into a Makefile, the duplication leads to fragility and violates DRY. It's a REALLY common pattern in file-based processing. This package and format let you create a ``.paths.json`` file like this:

.. code:: json

    {
        "__ENV": {"VERSION": "0.0.1"},
        "DATA_DIR": ["data", "$$VERSION"],
        "CLEAN_DIR": ["$DATA_DIR", "clean"],
        "RAW_DIR": ["$DATA_DIR", "raw"],
        "SOMETHING_HTML": ["$RAW_DIR", "something.html"],
        "SOMETHING_CSV": ["$CLEAN_DIR", "something.csv"]
    }
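In this file, ``$NAME`` splices in another entry (``$DATA_DIR`` reuses ``DATA_DIR``), and ``$$NAME`` is filled from the ``__ENV`` defaults. A minimal sketch of what resolution should yield, assuming the ``__ENV`` defaults are in effect:

.. code:: python

    from pathsjson.automagic import PATHS

    # With $$VERSION defaulting to "0.0.1", expect roughly:
    #   DATA_DIR       -> data/0.0.1
    #   SOMETHING_HTML -> data/0.0.1/raw/something.html
    #   SOMETHING_CSV  -> data/0.0.1/clean/something.csv
    for name in ('DATA_DIR', 'SOMETHING_HTML', 'SOMETHING_CSV'):
        print(name, '=>', PATHS[name])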

Then, from your Python scripts:

.. code:: python

    from pathsjson.automagic import PATHS

    print("Processing:", PATHS['SOMETHING_HTML'])
    with PATHS.resolve('SOMETHING_HTML').open() as fp:
        csv = process(fp)

    with PATHS.resolve('SOMETHING_CSV').open("w") as fp:
        write_csv(fp)

Installation
============

.. code:: bash

    pip install pathsjson

Validation
==========

There is a ``.paths.json`` schema. It's validated with JSON-Schema.
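If you want to check a file yourself, here is a minimal sketch using the ``jsonschema`` package; the schema filename below is hypothetical, so point it at wherever the packaged schema actually lives:

.. code:: python

    import json
    import jsonschema  # pip install jsonschema

    # Hypothetical path; substitute the real packaged schema file.
    with open("paths.schema.json") as fp:
        schema = json.load(fp)

    with open(".paths.json") as fp:
        instance = json.load(fp)

    # Raises jsonschema.exceptions.ValidationError if the file doesn't conform.
    jsonschema.validate(instance, schema)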

More details
============

Read the docs.