Visit docs.quiltdata.com, or browse the docs on GitHub.
Quilt provides versioned, reusable building blocks for analysis in the form of data packages. A data package may contain data of any type or size. In spirit, Quilt does for data what package managers and Docker registries do for code: provide a centralized, collaborative store of record.
- Reproducibility - Imagine source code without versions. Ouch. Why live with un-versioned data? Versioned data makes analysis reproducible by creating unambiguous references to potentially complex data dependencies.
- Collaboration and transparency - Data likes to be shared. Quilt offers a centralized data warehouse for finding and sharing data.
- Auditing - The registry tracks all reads and writes so that admins know when data are accessed or changed.
- Less data prep - The registry abstracts away network, storage, and file format so that users can focus on what they wish to do with the data.
- Deduplication - Data fragments are hashed with `SHA256`. Duplicate data fragments are written to disk once, globally per user. As a result, large, repeated data fragments consume less disk and network bandwidth (see the sketch after this list).
- Faster analysis - Serialized data loads 5 to 20 times faster than files. Moreover, specialized storage formats like Apache Parquet minimize I/O bottlenecks so that tools like Presto DB and Hive run faster.
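To make the deduplication idea concrete, here is a minimal content-addressed storage sketch in plain Python. The `OBJ_DIR` directory and `store_fragment` helper are illustrative only and are not part of the Quilt API:

```python
# Minimal sketch of content-addressed fragment storage: fragments are keyed
# by their SHA-256 hash, so identical fragments are written to disk only once.
import hashlib
import os

OBJ_DIR = "objs"  # hypothetical local fragment store

def store_fragment(data: bytes) -> str:
    """Write a data fragment keyed by its SHA-256 digest; duplicates share one object."""
    digest = hashlib.sha256(data).hexdigest()
    path = os.path.join(OBJ_DIR, digest)
    if not os.path.exists(path):          # skip the write if the fragment already exists
        os.makedirs(OBJ_DIR, exist_ok=True)
        with open(path, "wb") as f:
            f.write(data)
    return digest

# Storing the same fragment twice yields a single object on disk.
h1 = store_fragment(b"2017-01-01,42\n")
h2 = store_fragment(b"2017-01-01,42\n")
assert h1 == h2
```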
Here are the basic Quilt commands:
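A rough sketch of a typical session through the Python client, which mirrors the `quilt build`, `quilt push`, and `quilt install` CLI commands; the package handle `youruser/example` and the file `data.csv` are placeholders:

```python
import quilt

quilt.build("youruser/example", "data.csv")   # build a package locally from a file
quilt.push("youruser/example")                # push the package to the registry
quilt.install("youruser/example")             # pull/install it on another machine

# Installed packages import like Python modules
from quilt.data.youruser import example
```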
Quilt is offered as a managed service at quiltdata.com.
Quilt consists of three source-level components:
- Catalog
  - Displays package meta-data in HTML
  - Implemented in JavaScript with Redux and sagas
- Registry
  - Controls permissions
  - Stores package fragments in blob storage
  - Stores package meta-data
  - De-duplicates repeated data fragments
  - Implemented in Python with Flask and PostgreSQL
- Client (compiler)
  - Serializes tabular data to Apache Parquet (see the sketch after this list)
  - Transforms and parses files
  - `build`s packages locally
  - `push`es packages to the registry
  - `pull`s packages from the registry
  - Implemented in Python with pandas and PyArrow
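To illustrate what the client does for tabular data, here is a minimal sketch of the Parquet round trip with pandas and PyArrow. The file names are placeholders, and the actual client additionally chunks and hashes the serialized fragments:

```python
import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

df = pd.read_csv("data.csv")                  # transform/parse the source file
table = pa.Table.from_pandas(df)              # convert to an Arrow table
pq.write_table(table, "data.parquet")         # serialize to Apache Parquet

# Columnar Parquet reads back far faster than re-parsing the original CSV.
df_back = pq.read_table("data.parquet").to_pandas()
```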