
Add evaluation methods to synthesizer #1190

Closed
amontanez24 opened this issue Jan 25, 2023 · 2 comments
Labels
feature request Request for a new feature

amontanez24 (Contributor) commented Jan 25, 2023

Problem Description

As a user, it would be useful to evaluate the synthetic data generated against the original data.

Acceptance criteria

  • Add an evaluation module and two submodules within it: single_table and multi_table

  • Add the following methods to the evaluation.single_table module

    • run_diagnostic(real_data, synthetic_data, metadata, verbose)
    • evaluate_quality(real_data, synthetic_data, metadata, verbose)
    • get_column_plot(real_data, synthetic_data, metadata, column_name)
    • get_column_pair_plot(real_data, synthetic_data, metadata, column_names)

  • Add the following methods to the evaluation.multi_table module

    • run_diagnostic(real_data, synthetic_data, metadata, verbose) - Wrapper around the initialization and evaluation of the corresponding SDMetrics report class.
    • evaluate_quality(real_data, synthetic_data, metadata, verbose) - Wrapper around the initialization and evaluation of the corresponding SDMetrics report class.
    • get_column_plot(real_data, synthetic_data, metadata, table_name, column_name) - Wraps the same method as the single table case but requires the table name.
    • get_column_pair_plot(real_data, synthetic_data, metadata, table_name, column_names) - Wraps the same method as the single table case but requires the table name.
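The multi-table plotting methods are meant to select the named table and then delegate to their single-table counterparts. A minimal self-contained sketch of that delegation pattern (every name here is an illustrative stand-in, not the actual SDV implementation):

```python
# Illustrative sketch only: stand-ins for the real SDV evaluation functions.

def single_table_column_plot(real_df, synthetic_df, table_metadata, column_name):
    """Stand-in for evaluation.single_table.get_column_plot."""
    # The real function would build a plotly figure; here we return a string.
    return f'plot of {column_name} ({len(real_df)} real vs {len(synthetic_df)} synthetic rows)'

def get_column_plot(real_data, synthetic_data, metadata, table_name, column_name):
    """Multi-table wrapper: pick out one table, then reuse the single-table method."""
    return single_table_column_plot(
        real_data[table_name],       # real_data: dict of table_name -> table data
        synthetic_data[table_name],  # synthetic_data: same structure
        metadata[table_name],        # per-table metadata lookup (assumed shape)
        column_name,
    )
```

Usage, with plain lists standing in for DataFrames:

```python
fig = get_column_plot(
    real_data={'users': [1, 2, 3]},
    synthetic_data={'users': [4, 5]},
    metadata={'users': None},
    table_name='users',
    column_name='age',
)
# returns 'plot of age (3 real vs 2 synthetic rows)'
```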

Expected behavior

# Single table cases

quality_report = evaluate_quality(
  real_data=real_data, # DataFrame
  synthetic_data=synthetic_data, # DataFrame
  metadata=my_metadata, # SingleTableMetadata
  verbose=True
)
diagnostic_report = run_diagnostic(
  real_data=real_data, # DataFrame
  synthetic_data=synthetic_data, # DataFrame
  metadata=my_metadata, # SingleTableMetadata
  verbose=True
)
fig = get_column_plot(
  real_data=real_data, # DataFrame
  synthetic_data=synthetic_data, # DataFrame
  metadata=my_metadata, # SingleTableMetadata
  column_name='age'
)
fig = get_column_pair_plot(
  real_data=real_data, # DataFrame
  synthetic_data=synthetic_data, # DataFrame
  metadata=my_metadata, # SingleTableMetadata
  column_names=['age', 'weight']
)

# Multi-table cases
quality_report = evaluate_quality(
  real_data=real_data, # dictionary
  synthetic_data=synthetic_data, # dictionary
  metadata=my_metadata, # MultiTableMetadata
  verbose=True
)
diagnostic_report = run_diagnostic(
  real_data=real_data, # dictionary
  synthetic_data=synthetic_data, # dictionary
  metadata=my_metadata, # MultiTableMetadata
  verbose=True
)
# Plot the 1D marginal distribution
fig = get_column_plot(
  real_data=real_data, # dictionary
  synthetic_data=synthetic_data, # dictionary
  metadata=my_metadata, # MultiTableMetadata
  table_name='users',
  column_name='age'
)
# Plot the 2D bivariate distribution
fig = get_column_pair_plot(
  real_data=real_data, # dictionary
  synthetic_data=synthetic_data, # dictionary
  metadata=my_metadata, # MultiTableMetadata
  table_name='users',
  column_names=['age', 'weight']
)
npatki (Contributor) commented Feb 9, 2023

@amontanez24 @fealho I'm testing this out. It seems like evaluate_quality is actually returning the score.

The spec is to return the actual QualityReport object from SDMetrics. Shall we re-open this?
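For illustration, the fix npatki is requesting is for the wrapper to return the report object itself rather than its aggregate score. A self-contained sketch of the intended behavior, where QualityReport is a local stand-in for the SDMetrics class of the same name:

```python
class QualityReport:
    """Stand-in for sdmetrics.reports.single_table.QualityReport."""

    def generate(self, real_data, synthetic_data, metadata, verbose=True):
        # The real class computes column-shape and column-pair-trend scores;
        # a fixed placeholder keeps this sketch self-contained.
        self._score = 0.9

    def get_score(self):
        return self._score

def evaluate_quality(real_data, synthetic_data, metadata, verbose=True):
    report = QualityReport()
    report.generate(real_data, synthetic_data, metadata, verbose)
    return report  # per the spec: return the report object itself
    # (the initial implementation effectively returned report.get_score() instead)
```

Returning the report keeps the score accessible via report.get_score() while also exposing the report's richer breakdowns, which a bare float cannot provide.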

@npatki npatki reopened this Feb 9, 2023
fealho (Member) commented Feb 9, 2023

@npatki Oh, I thought it was supposed to be the score. Yes, this should be reopened and patched.
