Commit

make release-tag: Merge branch 'master' into stable

katxiao committed Dec 22, 2021
2 parents 1ff4fad + 56b9cd0 commit 1fc8561
Showing 30 changed files with 432 additions and 46 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/integration.yml
@@ -9,7 +9,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: [3.6, 3.7, 3.8]
python-version: [3.6, 3.7, 3.8, 3.9]
os: [ubuntu-latest, macos-10.15, windows-latest]
steps:
- uses: actions/checkout@v1
2 changes: 1 addition & 1 deletion .github/workflows/minimum.yml
@@ -9,7 +9,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: [3.6, 3.7, 3.8]
python-version: [3.6, 3.7, 3.8, 3.9]
os: [ubuntu-latest, macos-10.15, windows-latest]
steps:
- uses: actions/checkout@v1
2 changes: 1 addition & 1 deletion .github/workflows/readme.yml
@@ -9,7 +9,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: [3.6, 3.7, 3.8]
python-version: [3.6, 3.7, 3.8, 3.9]
os: [ubuntu-latest, macos-10.15] # skip windows bc rundoc fails
steps:
- uses: actions/checkout@v1
4 changes: 2 additions & 2 deletions .github/workflows/tutorials.yml
@@ -9,8 +9,8 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: [3.6, 3.7, 3.8]
os: [ubuntu-latest, macos-latest, windows-latest]
python-version: [3.6, 3.7, 3.8, 3.9]
os: [ubuntu-latest, macos-10.15, windows-latest]
steps:
- uses: actions/checkout@v1
- name: Set up Python ${{ matrix.python-version }}
2 changes: 1 addition & 1 deletion .github/workflows/unit.yml
@@ -9,7 +9,7 @@ jobs:
runs-on: ${{ matrix.os }}
strategy:
matrix:
python-version: [3.6, 3.7, 3.8]
python-version: [3.6, 3.7, 3.8, 3.9]
os: [ubuntu-latest, macos-10.15, windows-latest]
steps:
- uses: actions/checkout@v1
1 change: 0 additions & 1 deletion .gitignore
@@ -109,7 +109,6 @@ ENV/
sdv/data/
docs/**/*.pkl
docs/**/*metadata.json
docs/images
docs/savefig
tutorials/**/*.pkl
tutorials/**/*metadata.json
20 changes: 20 additions & 0 deletions HISTORY.md
@@ -1,5 +1,25 @@
# Release Notes

## 0.13.1 - 2021-12-22

This release adds support for passing tabular constraints to the HMA1 model, and adds more explicit error handling for
metric evaluation. It also includes a fix for using categorical columns in the PAR model and documentation updates
for metadata and HMA1.
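
A minimal sketch of the new capability, mirroring the relational constraints guide (`docs/user_guides/relational/constraints.rst`) added in this commit; the demo tables and column names come from `load_demo`:

```python
from sdv import Metadata, load_demo
from sdv.constraints import UniqueCombinations
from sdv.relational import HMA1

# Build relational metadata and attach a single-table constraint to the child table.
tables = load_demo()
metadata = Metadata()
metadata.add_table(name='users', data=tables['users'], primary_key='user_id')
metadata.add_table(
    name='sessions',
    data=tables['sessions'],
    primary_key='session_id',
    parent='users',
    foreign_key='user_id',
    constraints=[UniqueCombinations(columns=['device', 'os'])],
)

# Fit the hierarchical model and sample; synthetic (device, os) combinations
# are limited to those observed in the real sessions table.
model = HMA1(metadata)
model.fit(tables)
new_data = model.sample()
```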

### Bugs Fixed

* Categorical column after sequence_index column - Issue [#314](https://github.com/sdv-dev/SDV/issues/314) by @fealho

### New Features

* Support passing tabular constraints to the HMA1 model - Issue [#296](https://github.com/sdv-dev/SDV/issues/296) by @katxiao
* More explicit error handling during metric evaluation - Issue [#638](https://github.com/sdv-dev/SDV/issues/638) by @katxiao

### Documentation Changes

* Make true/false values lowercase in Metadata Schema specification - Issue [#664](https://github.com/sdv-dev/SDV/issues/664) by @katxiao
* Update docstrings for hma1 methods - Issue [#642](https://github.com/sdv-dev/SDV/issues/642) by @katxiao

## 0.13.0 - 2021-11-22

This release makes multiple improvements to different `Constraint` classes. The `Unique` constraint can now
73 changes: 57 additions & 16 deletions README.md
@@ -1,8 +1,7 @@
<p align="left">
<a href="https://dai.lids.mit.edu">
<img width=15% src="https://dai.lids.mit.edu/wp-content/uploads/2018/06/Logo_DAI_highres.png" alt="DAI-Lab" />
</a>
<i>An Open Source Project from the <a href="https://dai.lids.mit.edu">Data to AI Lab, at MIT</a></i>
<div align="center">
<br/>
<p align="center">
<i>This repository is part of <a href="https://sdv.dev">The Synthetic Data Vault Project</a>, a project from <a href="https://datacebo.com">DataCebo</a>.</i>
</p>

[![Development Status](https://img.shields.io/badge/Development%20Status-2%20--%20Pre--Alpha-yellow)](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
@@ -13,17 +12,16 @@
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/sdv-dev/SDV/master?filepath=tutorials)
[![Slack](https://img.shields.io/badge/Slack%20Workspace-Join%20now!-36C5F0?logo=slack)](https://join.slack.com/t/sdv-space/shared_invite/zt-gdsfcb5w-0QQpFMVoyB2Yd6SRiMplcw)

<img width=30% src="docs/images/SDV-Logo-Color-Tagline.png">
<div align="left">
<br/>
<p align="center">
<img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/master/docs/images/SDV-DataCebo.png"></img>
</p>
</div>

* Website: https://sdv.dev
* Documentation: https://sdv.dev/SDV
* [User Guides](https://sdv.dev/SDV/user_guides/index.html)
* [Developer Guides](https://sdv.dev/SDV/developer_guides/index.html)
* Github: https://github.com/sdv-dev/SDV
* License: [MIT](https://github.com/sdv-dev/SDV/blob/master/LICENSE)
* Development Status: [Pre-Alpha](https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha)
</div>

## Overview
# Overview

The **Synthetic Data Vault (SDV)** is a **Synthetic Data Generation** ecosystem of libraries
that allows users to easily learn [single-table](
@@ -41,7 +39,27 @@ Underneath the hood it uses several probabilistic graphical modeling and deep le
techniques. To enable a variety of data storage structures, we employ unique
hierarchical generative modeling and recursive sampling techniques.

### Current functionality and features:
| Important Links | |
| -------------------------- | -------------------------------------------------------------- |
| :computer: **[Website]** | Check out the SDV Website for more information about the project. |
| :orange_book: **[SDV Blog]** | Regular publishing of useful content about Synthetic Data Generation. |
| :book: **[Documentation]** | Quickstarts, User and Development Guides, and API Reference. |
| :octocat: **[Repository]** | The link to the GitHub Repository of this library. |
| :scroll: **[License]** | The entire ecosystem is published under the MIT License. |
| :keyboard: **[Development Status]** | This software is in its Pre-Alpha stage. |
| ![](slack.png) **[Community]** | Join our Slack Workspace for announcements and discussions. |
| ![](mybinder.png) **[Tutorials]** | Run the SDV Tutorials in a Binder environment. |

[Website]: https://sdv.dev
[SDV Blog]: https://sdv.dev/blog
[Documentation]: https://sdv.dev/SDV
[Repository]: https://github.com/sdv-dev/SDV
[License]: https://github.com/sdv-dev/SDV/blob/master/LICENSE
[Development Status]: https://pypi.org/search/?c=Development+Status+%3A%3A+2+-+Pre-Alpha
[Community]: https://join.slack.com/t/sdv-space/shared_invite/zt-gdsfcb5w-0QQpFMVoyB2Yd6SRiMplcw
[Tutorials]: https://mybinder.org/v2/gh/sdv-dev/SDV/master?filepath=tutorials

## Current functionality and features:

* Synthetic data generators for [single tables](
https://sdv.dev/SDV/user_guides/single_table/index.html) with the following
@@ -89,7 +107,7 @@ pip install sdv
**Using `conda`:**

```bash
conda install -c sdv-dev -c pytorch -c conda-forge sdv
conda install -c pytorch -c conda-forge sdv
```
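
After installation, a minimal single-table sketch (illustrative only; it uses the `load_tabular_demo` dataset and the `GaussianCopula` model referenced in the SDV documentation):

```python
from sdv.demo import load_tabular_demo
from sdv.tabular import GaussianCopula

# Fit a single-table model on the demo employees table and sample new rows.
real_data = load_tabular_demo()
model = GaussianCopula()
model.fit(real_data)
synthetic_data = model.sample(100)
```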

For more installation options please visit the [SDV installation Guide](
@@ -254,3 +272,26 @@ Neha Patki, Roy Wedge, Kalyan Veeramachaneni. [The Synthetic Data Vault](https:/
month={Oct}
}
```

---


<div align="center">
<a href="https://datacebo.com"><img align="center" width=40% src="https://github.com/sdv-dev/SDV/blob/master/docs/images/DataCebo.png"></img></a>
</div>
<br/>
<br/>

The [DataCebo team](https://datacebo.com) is the proud developer of [The Synthetic Data Vault Project](
https://sdv.dev), the largest open source ecosystem for synthetic data generation & evaluation.
The ecosystem is home to multiple libraries that support synthetic data, including:

* 🔄 Data discovery & transformation. Reverse the transforms to reproduce realistic data.
* 🧠 Multiple machine learning models -- ranging from Copulas to Deep Learning -- to create tabular,
multi-table and time series data.
* 📊 Measuring quality and privacy of synthetic data, and comparing different synthetic data
generation models.

[Get started using the SDV package](https://sdv.dev/SDV/getting_started/install.html) -- a fully
integrated solution and your one-stop shop for synthetic data. Or, use the standalone libraries
for specific needs.
6 changes: 3 additions & 3 deletions conda/meta.yaml
@@ -1,5 +1,5 @@
{% set name = 'sdv' %}
{% set version = '0.13.0' %}
{% set version = '0.13.1.dev1' %}

package:
name: "{{ name|lower }}"
@@ -28,7 +28,7 @@ requirements:
- ctgan >=0.5.0,<0.6
- deepecho >=0.3.0.post1,<0.4
- rdt >=0.6.1,<0.7
- sdmetrics >=0.4.0,<0.5
- sdmetrics >=0.4.1,<0.5
run:
- graphviz
- python >=3.6,<3.10
@@ -41,7 +41,7 @@ requirements:
- ctgan >=0.5.0,<0.6
- deepecho >=0.3.0.post1,<0.4
- rdt >=0.6.1,<0.7
- sdmetrics >=0.4.0,<0.5
- sdmetrics >=0.4.1,<0.5

about:
home: "https://sdv.dev"
6 changes: 3 additions & 3 deletions docs/developer_guides/sdv/metadata.rst
@@ -130,7 +130,7 @@ the following keys.
"fields": {
"social_security_number": {
"type": "categorical",
"pii": True,
"pii": true,
"pii_category": "ssn"
},
...
@@ -180,7 +180,7 @@ A list of all possible localizations can be found on the `Faker documentation si
"fields": {
"address": {
"type": "categorical",
"pii": True,
"pii": true,
"pii_category": "address"
"pii_locales": ["sv_SE", "en_US"]
},
@@ -215,7 +215,7 @@ If a field is specified as a ``primary_key`` of the table, then the field must b
...
}
If the subtype of the primary key is integer, an optional regular expression can be passed to
If the subtype of the primary key is string, an optional regular expression can be passed to
generate keys that match it:

.. code-block:: python
Binary file added docs/images/CTGAN-DataCebo.png
Binary file added docs/images/Copulas-DataCebo.png
Binary file added docs/images/DataCebo-Blue.png
Binary file added docs/images/DataCebo.png
Binary file added docs/images/DeepEcho-DataCebo.png
Binary file added docs/images/RDT-DataCebo.png
Binary file added docs/images/SDGym-DataCebo.png
Binary file added docs/images/SDMetrics-DataCebo.png
Binary file added docs/images/SDV-DataCebo.png
82 changes: 82 additions & 0 deletions docs/user_guides/relational/constraints.rst
@@ -0,0 +1,82 @@
.. _relational_constraints:

Constraints
===========

SDV supports adding constraints within a single table. See :ref:`single_table_constraints`
for more information about the available single table constraints.

In order to use single-table constraints within a relational model, you can pass
in a list of applicable constraints when adding a table to your relational ``Metadata``.
(See :ref:`relational_metadata` for more information on constructing a ``Metadata`` object.)

In this example, we wish to add a ``UniqueCombinations`` constraint to our ``sessions`` table,
which is a child table of ``users``. First, we will create a ``Metadata`` object and add the
``users`` table.

.. ipython:: python
    :okwarning:

    from sdv import load_demo, Metadata
    tables = load_demo()
    metadata = Metadata()
    metadata.add_table(
        name='users',
        data=tables['users'],
        primary_key='user_id'
    )

The metadata now contains the ``users`` table.

.. ipython:: python
    :okwarning:

    metadata

Now, we want to add a child table ``sessions`` which contains a single table constraint.
In the ``sessions`` table, we wish to only have combinations of ``(device, os)`` that
appear in the original data.

.. ipython:: python
    :okwarning:

    from sdv.constraints import UniqueCombinations
    constraint = UniqueCombinations(columns=['device', 'os'])
    metadata.add_table(
        name='sessions',
        data=tables['sessions'],
        primary_key='session_id',
        parent='users',
        foreign_key='user_id',
        constraints=[constraint],
    )

If we get the table metadata for ``sessions``, we can see that the constraint has been added.

.. ipython:: python
    :okwarning:

    metadata.get_table_meta('sessions')

We can now use this metadata to fit a relational model and synthesize data.

.. ipython:: python
    :okwarning:

    from sdv.relational import HMA1
    model = HMA1(metadata)
    model.fit(tables)
    new_data = model.sample()

In the sampled data, we should see that our constraint is being satisfied.

.. ipython:: python
    :okwarning:

    new_data
1 change: 1 addition & 0 deletions docs/user_guides/relational/index.rst
@@ -10,3 +10,4 @@ Relational Data

data_description
models
constraints
2 changes: 1 addition & 1 deletion docs/user_guides/single_table/custom_constraints.rst
@@ -23,7 +23,7 @@ Let's look at a demo dataset:
employees = load_tabular_demo()
employees
The dataset defined in :ref:`_single_table_constraints` contains basic details about employees.
The dataset defined in :ref:`handling_constraints` contains basic details about employees.
We will use this dataset to demonstrate how you can create your own constraint.


2 changes: 1 addition & 1 deletion sdv/__init__.py
@@ -6,7 +6,7 @@

__author__ = """MIT Data To AI Lab"""
__email__ = 'dailabmit@gmail.com'
__version__ = '0.13.0'
__version__ = '0.13.1.dev1'

from sdv import constraints, evaluation, metadata, relational, tabular
from sdv.demo import get_available_demos, load_demo
1 change: 0 additions & 1 deletion sdv/evaluation.py
@@ -133,7 +133,6 @@ def evaluate(synthetic_data, real_data=None, metadata=None, root_path=None,
synthetic_data = synthetic_data[table]

scores = sdmetrics.compute_metrics(metrics, real_data, synthetic_data, metadata=metadata)
scores.dropna(inplace=True)

if aggregate:
return scores.normalized_score.mean()
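
For context, a rough sketch of how this code path is typically exercised (illustrative, not part of the diff). Removing the `dropna` call, together with the `sdmetrics >=0.4.1` bump elsewhere in this commit, appears intended to keep rows for metrics that errored instead of silently dropping them:

```python
from sdv.demo import load_tabular_demo
from sdv.evaluation import evaluate
from sdv.tabular import GaussianCopula

real_data = load_tabular_demo()
model = GaussianCopula()
model.fit(real_data)
synthetic_data = model.sample(len(real_data))

overall = evaluate(synthetic_data, real_data)                   # mean normalized score
details = evaluate(synthetic_data, real_data, aggregate=False)  # per-metric scores table
```
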
18 changes: 16 additions & 2 deletions sdv/metadata/dataset.py
@@ -10,6 +10,7 @@
import pandas as pd
from rdt import HyperTransformer, transformers

from sdv.constraints import Constraint
from sdv.metadata import visualization
from sdv.metadata.errors import MetadataError

@@ -871,7 +872,7 @@ def _get_field_details(self, data, fields):
return fields_metadata

def add_table(self, name, data=None, fields=None, fields_metadata=None,
primary_key=None, parent=None, foreign_key=None):
primary_key=None, parent=None, foreign_key=None, constraints=None):
"""Add a new table to this metadata.
``fields`` list can be a mixture of field names, which will be built automatically
@@ -902,7 +903,10 @@ def add_table(self, name, data=None, fields=None, fields_metadata=None,
parent (str):
Table name to refer a foreign key field. Defaults to ``None``.
foreign_key (str):
Foreing key field name to ``parent`` table primary key. Defaults to ``None``.
Foreign key field name to ``parent`` table primary key. Defaults to ``None``.
constraints (list[Constraint, dict]):
List of Constraint objects or dicts representing the constraints for the
given table.
Raises:
ValueError:
@@ -938,6 +942,16 @@ def add_table(self, name, data=None, fields=None, fields_metadata=None,

self._metadata['tables'][name] = table_metadata

if constraints:
meta_constraints = []
for constraint in constraints:
if isinstance(constraint, Constraint):
meta_constraints.append(constraint.to_dict())
else:
meta_constraints.append(constraint)

table_metadata['constraints'] = meta_constraints

try:
if primary_key:
self.set_primary_key(name, primary_key)
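
The new block above serializes `Constraint` objects with `to_dict()` before storing them under the table's `constraints` key; per the docstring, already-serialized dicts are passed through unchanged. A rough illustration (the exact dictionary contents depend on the constraint class):

```python
from sdv.constraints import UniqueCombinations

constraint = UniqueCombinations(columns=['device', 'os'])
constraint.to_dict()
# Roughly: {'constraint': 'sdv.constraints.tabular.UniqueCombinations',
#           'columns': ['device', 'os'], ...}
```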