Introduction

filesdb is a simple utility to of files generated under different experimental conditions.

Motivation

Keeping track of files generated in computational experiments is very important, and gets complicated quickly. For simple problems, using filenames is sufficient (e.g., a-1_b-2.txt). However, sometimes code changes, or more parameters become important, and now filenames may look like a-1_b-2_c-5_version-2.txt, breaking your naming scheme and complicating any code you might have that parses file names.

filesdb solves these problems by tracking all your files in an sqlite database. Additional parameters can be added at any time. The database can quickly be searched by parameter values using a command line or python interface.

Examples

Say you are have some code that generates an output file and takes two parameters, a and b. To get a new filename and add it to the database:

Bash	Python
`filename=$(filesdb add a=${a} b=${b})`	`filename = filesdb.add({'a': a, 'b': b})`

Then ${filename} can be passed as the output parameter to your code. (filesdb does not actually create the file.) This process can be repeated for different values of a and b. You can also specify the filename yourself:

Bash	Python
`filesdb add --filename=myfile.txt a=${a} b=${b}`	`filesdb.add({'a': a, 'b': b}, filename='myfile.txt')`

Note that keys cannot end in "!".

To list all the files with a=${a}

Bash	Python
`filesdb search a=${a}`	`entries = filesdb.search({'a': a})`

Later, if you decided you also want to track files by a new parameter, (e.g, c), you can simply add a new parameter:

Bash	Python
`filename=$(filesdb add a=${a} b=${b} c=${c})`	`filename = filesdb.add({'a': a, 'b': b, 'c': c})`

Finally, let's say you discover a bug and your code, and all files generated with a=0 are invalid and you decide to delete them.

Bash	Python
`filesdb delete a=0`	`deleted_entries = filesdb.delete({'a': 0})`

This WILL delete the file from the file system as well as remove it from the database. The bash version will also print the deleted entries. To make sure that you're only deleting the files you want, delete has a dry run option, which prints (or returns, in the python case) the entries to be deleted but does not actually delete them.

Bash	Python
`filesdb delete --dry_run a=0`	`entries = filesdb.delete({'a': 0}, dryrun=True)`

Advanced Features

not equal comparisons

Both search and delete support "not equal" comparisons.

Bash	Python
`filesdb search a!=${a}`	`entries = filesdb.search({'a!': a})`

Note that this syntax prevents keys from ending in "!".

copy

The copy command copies a file and its data base entry to a new directory. For example:

Bash	Python
N/A	`filesdb.copy(filename, outdir, db='files.db', wd='.', outdb='files.db')`

Will copy the ${filename} in the current directory to ${outdir}, and copy its row entry to files.db in the current director to files.db in ${outdir}.

Note that this method does not currently have a command line interface

merge

Merge one database into another.

Bash	Python
`filesdb --wd='.' --db=files.db merge files2.db`	`filesdb.merge('files2.db', 'files.db', wd='.')`

will add all entries from files2.db to files.db (if they aren't already present)

Environments

For values that are mostly the same for every experiment (e.g., the versions of each software package used), filesdb supports a separate environment dictionary. (Internally, this is stored as a separate table, with an identifying environment hash value in the main table.)

filesdb.add({'a': a, 'b': b}, filename='myfile.txt',
            environment={'pkgname1': '0.1.1', pkgname2: '2.1.0'})

To return environment data with search results (via a JOIN):

filesdb.search({}, with_environments=True)

To search based on environment values:

filesdb.search({}, environment={'pkgname1': '0.1.1'})

Name		Name	Last commit message	Last commit date
Latest commit History 74 Commits
filesdb		filesdb
LICENSE.txt		LICENSE.txt
MANIFEST.in		MANIFEST.in
README.md		README.md
old_style.db		old_style.db
setup.cfg		setup.cfg
setup.py		setup.py
test_filesdb.py		test_filesdb.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Introduction

Motivation

Examples

Advanced Features

not equal comparisons

copy

merge

Environments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors 2

Uh oh!

Languages

License

stilley2/filesdb

Folders and files

Latest commit

History

Repository files navigation

Introduction

Motivation

Examples

Advanced Features

not equal comparisons

copy

merge

Environments

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors 2

Uh oh!

Languages

Packages