unibox


unibox provides a unified interface for common file operations.

Installation

pip install unibox

With uv:

uv tool install unibox

If you're not using Python 3.13, it's also recommended to install pandas[performance]:

pip install "pandas[performance]"

To add or remove project dependencies:

uv add requests

uv remove requests

# after adding a new package, rerun:
make setup

Usage

Import the library:

import unibox as ub

Using the Hugging Face Backend

You can load and use a Hugging Face dataset directly with hf://{username}/{dataset_repo}:

hf_dset = ub.loads("hf://incantor/aesthetic_eagle_5category_iter99")
df = hf_dset.to_pandas()

And upload a processed dataframe back to Hugging Face:

df["new_col"] = "new changes"
ub.saves(df, "hf://datatmp/updated_repo")
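The hf://{username}/{dataset_repo} form used in both calls above can be split into its two parts with a few lines of plain Python. This is a minimal sketch for illustration only: `split_hf_uri` is a hypothetical helper, not part of unibox, and it assumes the URI has exactly the two-part shape shown here.

```python
def split_hf_uri(uri: str) -> tuple[str, str]:
    """Split 'hf://{username}/{dataset_repo}' into (username, dataset_repo).

    Hypothetical helper for illustration; not part of unibox.
    Only the two-part form shown in this README is accepted.
    """
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"expected a URI starting with {prefix!r}: {uri!r}")
    # Drop the scheme and any trailing slash, then split on '/'.
    parts = uri[len(prefix):].strip("/").split("/")
    if len(parts) != 2 or not all(parts):
        raise ValueError(f"expected hf://{{username}}/{{dataset_repo}}: {uri!r}")
    return parts[0], parts[1]


username, repo = split_hf_uri("hf://incantor/aesthetic_eagle_5category_iter99")
print(username, repo)
```

Checking the shape up front like this gives a clear error for a mistyped URI before any network request is made.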

Dev notes

Current concerns:

  1. loads(): temp files can accumulate in the global temp dir and fill up /tmp/; there are also concurrency issues
  2. s3_backend: the only backend that takes a dir; the others should do the same
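One possible way to address the first concern is to give every load its own temporary directory that is deleted as soon as parsing finishes. The sketch below uses only the standard library; `load_via_temp_dir`, `fetch`, and `parse` are hypothetical names, and this is not how unibox currently implements loads().

```python
import tempfile
from pathlib import Path
from typing import Any, Callable


def load_via_temp_dir(
    fetch: Callable[[Path], Path],
    parse: Callable[[Path], Any],
) -> Any:
    """Fetch into a per-call temp dir, parse the file, then clean up.

    Hypothetical sketch: `fetch` writes the remote file somewhere under
    the given directory and returns its path; `parse` must read the file
    fully into memory, because the directory is removed on return.
    """
    # Each call gets a unique directory, so concurrent loads cannot
    # overwrite each other's files, and nothing is left behind in /tmp.
    with tempfile.TemporaryDirectory(prefix="unibox_") as tmp:
        local_path = fetch(Path(tmp))
        return parse(local_path)
```

The trade-off is that a parser which reads lazily from disk would break once the directory is gone, so this approach only fits loaders that materialize the data before returning.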

To get a coverage report, run:

pytest --cov=src/unibox --cov-report=term-missing tests

To build the docs:

make docs host=0.0.0.0

# or in debug mode:
make check-docs

Migrating from unibox 0.4

No longer supported:

  • ub.traverses(): handlers and exclude_extensions have been removed (include_extensions still works but is deprecated in favor of exts)