Should we apply `NewRowSynthesis` by default? #188

npatki · 2023-01-20T21:29:27Z

Version: 0.8.0 (in development)

Problem Description

Currently, we are applying the SDMetrics NewRowSynthesis by default in the benchmark_single_table script. The motivation was to capture whether new synthetic data is being created at all -- or whether the rows are being re-used as in DataIdentity.
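To make the intent concrete, here is a rough sketch of the kind of check described above. It is plain Python, not the SDMetrics implementation: the function name `new_row_fraction` and the toy rows are hypothetical, and it only does exact matching, so anything else the real metric does is ignored.

```python
def new_row_fraction(real_rows, synthetic_rows):
    """Fraction of synthetic rows that do not exactly match any real row.

    Hypothetical helper for illustration only; not the SDMetrics API.
    """
    if not synthetic_rows:
        raise ValueError("synthetic_rows must not be empty")
    # Hash the real rows once so each synthetic row is a set lookup.
    real_set = {tuple(row) for row in real_rows}
    new_count = sum(1 for row in synthetic_rows if tuple(row) not in real_set)
    return new_count / len(synthetic_rows)


real = [(25, "NY"), (31, "CA"), (47, "TX")]

# A synthesizer that simply re-uses the real rows scores 0.0.
copied = list(real)
print(new_row_fraction(real, copied))  # 0.0

# One of these four rows is a copy, so three quarters are new.
fresh = [(25, "NY"), (52, "WA"), (38, "FL"), (29, "CA")]
print(new_row_fraction(real, fresh))  # 0.75
```

A score near 0 would flag a synthesizer that is just handing back the training rows, which is the case the default was meant to catch.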

But in practice, the NewRowSynthesis metric may not be robust enough. It can error out on datasets with a large number of columns, and it generally leads to longer benchmarking runs.

Expected behavior

We should reconsider how we apply the NewRowSynthesis metric by default:

- We could disable it. That is, set sdmetrics=None by default.
- We could fix the underlying issues with it in the SDMetrics library. Perhaps that can be achieved by subsetting or some other means.
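As one possible reading of "subsetting", the sketch below scores only a random sample of the synthetic rows, so the cost of the check is bounded by the sample size rather than the full table. This is an assumption about what the fix could look like, not how SDMetrics does it: the function name, the `sample_size` and `seed` parameters, and the toy data are all hypothetical, and matching is exact only.

```python
import random


def new_row_fraction_subsampled(real_rows, synthetic_rows, sample_size, seed=0):
    """Estimate the share of new synthetic rows from a random subset.

    Hypothetical helper for illustration only; not the SDMetrics API.
    """
    if not synthetic_rows or sample_size <= 0:
        raise ValueError("need at least one synthetic row and a positive sample_size")
    rows = list(synthetic_rows)
    if sample_size < len(rows):
        # Sample without replacement; a fixed seed keeps runs repeatable.
        rows = random.Random(seed).sample(rows, sample_size)
    real_set = {tuple(row) for row in real_rows}
    new_count = sum(1 for row in rows if tuple(row) not in real_set)
    return new_count / len(rows)


# Toy data: half of the synthetic rows are copies, half are new.
real = [(i, i % 7) for i in range(10_000)]
synthetic = real[:5_000] + [(i + 10_000, i % 7) for i in range(5_000)]

estimate = new_row_fraction_subsampled(real, synthetic, sample_size=1_000)
exact = new_row_fraction_subsampled(real, synthetic, sample_size=len(synthetic))
print(estimate, exact)
```

The trade-off is that the sampled score is an estimate, so it varies with the seed and sample size. That seems acceptable if the goal is only to detect a synthesizer that re-uses rows wholesale.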

npatki added the "feature request" label on Jan 20, 2023
