Merge pull request #31 from heikomuller/compact-serializer
Compact serializer
heikomuller authored Apr 24, 2021
2 parents 606fa10 + e0db421 commit 92a7db6
Showing 230 changed files with 7,843 additions and 5,756 deletions.
1 change: 1 addition & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -64,6 +64,7 @@ instance/

# Sphinx documentation
docs/.build/
docs/_build/

# PyBuilder
target/
81 changes: 81 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,81 @@
# Data Frame History Store - Changelog

### 0.1.0 - 2020-05-06

* Initial version. Support for snapshot archives in main-memory and on file system.


### 0.1.1 - 2020-06-16

* Allow different types of input documents (e.g., CSV files or Json).
* External merge-sort for large CSV files.
* Add managers for maintaining sets of archives.


### 0.1.2 - 2020-06-25

* Proper handling of date/time objects by the default archive reader and writer.
* Optional arguments for Json encoder and decoder for persistent archives.
* Add encoder and decoder information to archive manager metadata.
* Simple command-line interface for persistent archive manager.


### 0.1.3 - 2020-10-05

* Add archive manager that maintains descriptors in a relational database (\#8).


### 0.1.4 - 2020-10-07

* Add index position information to column class (\#11).


### 0.1.5 - 2020-11-06

* Add `__getitem__` and `get()` method to `SnapshotListing`.


### 0.2.0 - 2020-11-10

* Include wrapper for CSV files.
* Commit CSV files directly to a HISTORE archive.


### 0.2.1 - 2020-11-11

* Fix bug when adding snapshot from file without primary key (\#19).


### 0.2.2 - 2020-11-17

* Add default Json encoder and decoder for `ArchiveFileStore`.
* Add optional operation descriptor to snapshots (\#21).


### 0.3.0 - 2021-02-08

* Add support for archive rollback.


### 0.3.1 - 2021-02-22

* Disable type inference when checking out dataset snapshot as data frame (\#24).


### 0.4.0 - 2021-04-24

* Add more compact archive serialization option.
* Add option to select archive serializer (\#27).
* Add option to commit dataset snapshot from a data stream.
* Add `histore.archive.reader.SnapshotReader` (a `histore.document.base.Document` implementation) to read dataset snapshots.
* Add close method to `histore.archive.reader.ArchiveReader` interface.
* Change behavior of `histore.document.schema.to_schema()` to take existing Column objects into account.
* Direct update of archive snapshots via `apply()` and `histore.document.operator.DatasetOperator`.
* Require archives to be created from initial snapshot if primary key is used.
* Add `histore.document.json.base.JsonDocument` to read serialized Json documents.
* Use user's cache directory as the default parent directory for archive managers.
* Remove option for partial merge.
* Rename type-hint ``Schema`` to ``DocumentSchema``.
* Add empty document class ``histore.document.mem.Schema``.
* Change format of serialized archive JSON files.
* Change internal representation of timestamps.
22 changes: 11 additions & 11 deletions README.rst
@@ -45,34 +45,34 @@ Example using Volatile Archive

Start by creating a new archive. For each archive, an optional primary key (list of column names) can be specified. If a primary key is given, the values in the key attributes are used as row keys when data set snapshots are merged into the archive. If no primary key is specified, the row index of the data frame is used to match rows during the merge phase.

For archives that have a primary key, the initial dataset snapshot (or at least the dataset schema) needs to be given when creating the archive.
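
The difference between matching rows on a primary key and matching them on the row index can be illustrated with a minimal sketch in plain Python. This is not HISTORE's implementation: the helper names ``match_by_key`` and ``match_by_index`` are hypothetical, rows are plain lists, and the list position stands in for the data frame's row index.

```python
# Illustrative sketch only: NOT the HISTORE implementation.
# Rows are plain lists and both helper names are hypothetical.


def match_by_key(old_rows, new_rows, key_pos):
    """Pair each new row with the old row that has the same key value."""
    old_by_key = {row[key_pos]: row for row in old_rows}
    # Rows without a counterpart in the old snapshot are paired with None.
    return [(old_by_key.get(row[key_pos]), row) for row in new_rows]


def match_by_index(old_rows, new_rows):
    """Pair each new row with the old row at the same position."""
    pairs = []
    for idx, row in enumerate(new_rows):
        old = old_rows[idx] if idx < len(old_rows) else None
        pairs.append((old, row))
    return pairs


v1 = [['Alice', 32], ['Bob', 45]]
# The second version lists the same people in a different order.
v2 = [['Bob', 44], ['Alice', 33]]

by_key = match_by_key(v1, v2, key_pos=0)
by_index = match_by_index(v1, v2)
```

With a key, Bob's new row is paired with Bob's old row regardless of order. With positional matching, Bob's new row is paired with Alice's old row, so a reordering would be recorded as a change to both rows.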

.. code-block:: python
# Create a new archive that merges snapshots
# based on a primary key attribute
import histore as hs
archive = hs.Archive(primary_key='Name')
Add the first two data set versions to the archive:

.. code-block:: python
import pandas as pd
# First version
# First version
df = pd.DataFrame(
data=[['Alice', 32], ['Bob', 45], ['Claire', 27], ['Dave', 23]],
columns=['Name', 'Age']
)
archive.commit(df, description='First snapshot')
archive = hs.Archive(doc=df, primary_key='Name', descriptor=hs.Descriptor('First snapshot'))
Add the first two data set versions to the archive:

.. code-block:: python
# Second version: Change age for Alice and Bob
df = pd.DataFrame(
data=[['Alice', 33], ['Bob', 44], ['Claire', 27], ['Dave', 23]],
columns=['Name', 'Age']
)
archive.commit(df, description='Alice is 33 and Bob 44')
archive.commit(df, descriptor=hs.Descriptor('Alice is 33 and Bob 44'))
List information about all snapshots in the archive. This also shows how to use the checkout method to retrieve a particular data set version:
@@ -114,7 +114,7 @@ To create persistent archive that maintains all data on disk use the ``Persisten

.. code-block:: python
archive = hs.PersistentArchive(basedir='path/to/archive/dir', primary_key=['Name'])
archive = hs.PersistentArchive(basedir='path/to/archive/dir', create=True, doc=df, primary_key=['Name'])
The persistent archive maintains the data set snapshots in two files that are created in the directory that is given as the ``basedir`` argument.

62 changes: 0 additions & 62 deletions changelog.md

This file was deleted.

29 changes: 21 additions & 8 deletions docs/conf.py
@@ -18,11 +18,11 @@
# -- Project information -----------------------------------------------------

project = 'History Store for Data Frames'
copyright = '2020, Heiko Mueller'
copyright = '2020-2021, Heiko Mueller'
author = 'Heiko Mueller'

# The full version, including alpha/beta/rc tags
release = '0.1.0'
release = '0.4.0'


# -- General configuration ---------------------------------------------------
@@ -31,8 +31,16 @@
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = [
'sphinx.ext.autodoc',
'sphinx.ext.napoleon',
'sphinxcontrib.apidoc'
]

# Configure numpy style documentation with Napoleon
napoleon_google_docstring = False
napoleon_use_param = False
napoleon_use_ivar = True

# Add any paths that contain templates here, relative to this directory.
templates_path = ['_templates']

@@ -41,15 +49,20 @@
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ['_build', 'Thumbs.db', '.DS_Store']

# -- apidoc configuration ----------------------------------------------------

# Configuration for the sphinxcontrib-apidoc extension
apidoc_module_dir = '../histore/'
apidoc_output_dir = 'source/api'
apidoc_separate_modules = True
apidoc_module_first = True
apidoc_extra_args = ['-d 3','--force']


# -- Options for HTML output -------------------------------------------------


# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = 'alabaster'

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ['_static']
html_theme = 'sphinx_rtd_theme'
23 changes: 15 additions & 8 deletions docs/index.rst
@@ -6,17 +6,24 @@
Welcome to History Store for Data Frames's documentation!
=========================================================

.. figure:: ./graphics/logo.png
:align: center
:alt: History Store


.. toctree::
:maxdepth: 2
:caption: Contents:

readme
serialize

source/readme
source/documents
source/serialize


.. _api-ref:

Indices and tables
==================
.. toctree::
:maxdepth: 1
:caption: API Reference:

* :ref:`genindex`
* :ref:`modindex`
* :ref:`search`
source/api/modules
1 change: 0 additions & 1 deletion docs/readme.rst

This file was deleted.

3 changes: 3 additions & 0 deletions docs/requirements.txt
@@ -0,0 +1,3 @@
Sphinx
sphinx-rtd-theme
sphinxcontrib-apidoc
7 changes: 7 additions & 0 deletions docs/source/api/histore.archive.base.rst
@@ -0,0 +1,7 @@
histore.archive.base module
===========================

.. automodule:: histore.archive.base
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/api/histore.archive.manager.base.rst
@@ -0,0 +1,7 @@
histore.archive.manager.base module
===================================

.. automodule:: histore.archive.manager.base
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/api/histore.archive.manager.db.base.rst
@@ -0,0 +1,7 @@
histore.archive.manager.db.base module
======================================

.. automodule:: histore.archive.manager.db.base
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/api/histore.archive.manager.db.database.rst
@@ -0,0 +1,7 @@
histore.archive.manager.db.database module
==========================================

.. automodule:: histore.archive.manager.db.database
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/api/histore.archive.manager.db.model.rst
@@ -0,0 +1,7 @@
histore.archive.manager.db.model module
=======================================

.. automodule:: histore.archive.manager.db.model
:members:
:undoc-members:
:show-inheritance:
17 changes: 17 additions & 0 deletions docs/source/api/histore.archive.manager.db.rst
@@ -0,0 +1,17 @@
histore.archive.manager.db package
==================================

.. automodule:: histore.archive.manager.db
:members:
:undoc-members:
:show-inheritance:

Submodules
----------

.. toctree::
:maxdepth: 3

histore.archive.manager.db.base
histore.archive.manager.db.database
histore.archive.manager.db.model
7 changes: 7 additions & 0 deletions docs/source/api/histore.archive.manager.descriptor.rst
@@ -0,0 +1,7 @@
histore.archive.manager.descriptor module
=========================================

.. automodule:: histore.archive.manager.descriptor
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/api/histore.archive.manager.fs.rst
@@ -0,0 +1,7 @@
histore.archive.manager.fs module
=================================

.. automodule:: histore.archive.manager.fs
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/api/histore.archive.manager.mem.rst
@@ -0,0 +1,7 @@
histore.archive.manager.mem module
==================================

.. automodule:: histore.archive.manager.mem
:members:
:undoc-members:
:show-inheritance:
7 changes: 7 additions & 0 deletions docs/source/api/histore.archive.manager.persist.rst
@@ -0,0 +1,7 @@
histore.archive.manager.persist module
======================================

.. automodule:: histore.archive.manager.persist
:members:
:undoc-members:
:show-inheritance: