unibox provides a unified interface for common file operations.
pip install unibox
With uv:
uv tool install unibox
If you're not using Python 3.13, it's also recommended to install pandas[performance]:
pip install "pandas[performance]"
To add or remove project dependencies:
uv add requests
uv remove requests
# after adding a new package, rerun:
make setup
Import the library:
import unibox as ub
You can load and use a Hugging Face dataset directly with hf://{username}/{dataset_repo}:
hf_dset = ub.loads("hf://incantor/aesthetic_eagle_5category_iter99")
df = hf_dset.to_pandas()
You can also upload a processed DataFrame back to Hugging Face:
df["new_col"] = "new changes"
ub.saves(df, "hf://datatmp/updated_repo")
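The hf:// URIs above follow the form hf://{username}/{dataset_repo}. As a minimal sketch of how such a URI breaks down into its two parts, the snippet below uses a hypothetical helper, parse_hf_uri, and a hypothetical HfRepoRef tuple. Neither is part of unibox, and this is not how unibox parses URIs internally; it only illustrates the naming scheme.

```python
from typing import NamedTuple


class HfRepoRef(NamedTuple):
    """Hypothetical container for the two parts of an hf:// URI."""

    username: str
    repo: str


def parse_hf_uri(uri: str) -> HfRepoRef:
    """Split 'hf://{username}/{dataset_repo}' into username and repo.

    Illustrative only; this is not unibox's own parser.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    # Drop the scheme and any trailing slash, then split on '/'.
    parts = uri[len(prefix):].strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected hf://{{username}}/{{dataset_repo}}, got {uri!r}")
    return HfRepoRef(username=parts[0], repo=parts[1])


ref = parse_hf_uri("hf://incantor/aesthetic_eagle_5category_iter99")
print(ref.username, ref.repo)
```

A check like this can catch a malformed target before any upload is attempted.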
Current concerns:
- loads(): temporary files can accumulate in the global temp directory and fill up /tmp/; there are also concurrency issues.
- s3_backend: currently the only backend that takes a directory; the others should do the same.
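One possible way to address the temp-file concern is to give each load its own uniquely named temporary directory and delete it as soon as the data is in memory. The sketch below uses only the standard library. The function load_with_scoped_tmp and its download and parse callables are hypothetical names chosen for illustration, and this is not how loads() is currently implemented.

```python
import tempfile
from pathlib import Path
from typing import Callable


def load_with_scoped_tmp(
    download: Callable[[Path], Path],
    parse: Callable[[Path], object],
) -> object:
    """Download into a private temp dir, parse the file, then delete the dir.

    Hypothetical helper, not part of unibox.
    """
    # TemporaryDirectory creates a uniquely named directory, so concurrent
    # callers never share a path, and it removes the directory on exit.
    with tempfile.TemporaryDirectory(prefix="unibox_") as tmp:
        local_path = download(Path(tmp))
        # parse() must read the data fully into memory, because the file
        # no longer exists once the with-block ends.
        return parse(local_path)


def fake_download(tmp_dir: Path) -> Path:
    """Stand-in for a real backend download; writes a small local file."""
    target = tmp_dir / "data.txt"
    target.write_text("hello")
    return target


result = load_with_scoped_tmp(fake_download, lambda p: p.read_text())
print(result)
```

The trade-off is that nothing is cached between calls, so repeated loads of the same remote file would download it again.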
To get a coverage report, run:
pytest --cov=src/unibox --cov-report=term-missing tests
To build the docs:
make docs host=0.0.0.0
# or in debug mode:
make check-docs
Migrating from unibox 0.4
No longer supported:
- ub.traverses(): removed handlers and exclude_extensions (include_extensions still works but is deprecated in favor of exts)
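To illustrate what a deprecated keyword that "still works" usually looks like in practice, here is a minimal sketch of forwarding an old argument to its replacement while emitting a warning. The function filter_paths is a hypothetical stand-in and is not ub.traverses(). It is also an assumption here that extensions are given with a leading dot and matched case-insensitively.

```python
import warnings
from pathlib import Path
from typing import Iterable, List, Optional


def filter_paths(
    paths: Iterable[str],
    exts: Optional[List[str]] = None,
    include_extensions: Optional[List[str]] = None,
) -> List[str]:
    """Keep only paths whose suffix is in exts.

    Hypothetical helper for illustration; not unibox's ub.traverses().
    """
    if include_extensions is not None:
        # The old keyword is still accepted, but callers are told to migrate.
        warnings.warn(
            "include_extensions is deprecated; use exts instead",
            DeprecationWarning,
            stacklevel=2,
        )
        if exts is None:
            exts = include_extensions
    if exts is None:
        return list(paths)
    # Assumption: extensions carry a leading dot and compare case-insensitively.
    wanted = {e.lower() for e in exts}
    return [p for p in paths if Path(p).suffix.lower() in wanted]


files = ["a.jpg", "b.txt", "c.JPG"]
print(filter_paths(files, exts=[".jpg"]))  # ['a.jpg', 'c.JPG']
```

Calling it with include_extensions=[".jpg"] returns the same list but raises a DeprecationWarning, which makes remaining call sites easy to find during migration.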