
Metrics for evaluation in evaluation package #295

Closed
Charles204 opened this issue Jan 9, 2021 · 3 comments
Labels
question General question about the software

Comments

Charles204 commented Jan 9, 2021

  • SDV version: 0.6.1
  • Python version: 3
  • Operating System: Google Colab

Description

Hi team,

Thank you very much for your hard work in giving us such a great package. I am very impressed and admire you all!

I have some questions about the metrics I found in the evaluation package, listed below. Please help me if you have any information about them.

  1. How many metrics does this package have? And which code will list all of them?
  2. Do you have any documentation describing the metrics in detail, for example how each metric evaluates the similarity of two datasets? If not, do you know where I can find it? Some of the metrics I found are listed below as examples:
  • LogisticRegression Detection
  • SVC Detection
  • GaussianMixture Log Likelihood
  • Inverted Kolmogorov-Smirnov D statistic (KSTest and KSTestExtend)
  • Continuous Kullback–Leibler Divergence

What I Did

I plan to use your evaluation metrics for my project, but I lack a full understanding of how they work, so any pointers would be very helpful. Thank you very much.


csala commented Jan 14, 2021

Hello @Charles204

All the usage documentation for the existing metrics can be found in the corresponding User Guides section of the documentation. This includes an initial description of what each metric does.

Further details about each one of them can be found in the corresponding API Reference section. Bear in mind that you can also read the corresponding class documentation by running help(TheMetricClass) in an interactive environment, like this:

In [1]: from sdv.metrics.tabular import KSTestExtended

In [2]: help(KSTestExtended)
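
For illustration (a minimal sketch beyond the original reply): assuming you already have two pandas DataFrames, real_data and synthetic_data, with the same columns, the metric classes expose a compute classmethod that returns the score directly:

In [3]: from sdv.metrics.tabular import KSTest

In [4]: KSTest.compute(real_data, synthetic_data)  # inverted KS D statistic in [0, 1]; higher means more similar marginal distributions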

Regarding finding all the metrics that exist, you can use the get_subclasses method from each one of the base modality metrics (SingleTable, MultiTable and TimeSeries), like this:

In [1]: from sdv.metrics.tabular import SingleTableMetric

In [2]: SingleTableMetric.get_subclasses()
Out[2]: 
{'BNLogLikelihood': sdmetrics.single_table.bayesian_network.BNLogLikelihood,
 'LogisticDetection': sdmetrics.single_table.detection.sklearn.LogisticDetection,
 'SVCDetection': sdmetrics.single_table.detection.sklearn.SVCDetection,
 'BinaryDecisionTreeClassifier': sdmetrics.single_table.efficacy.binary.BinaryDecisionTreeClassifier,
 'BinaryAdaBoostClassifier': sdmetrics.single_table.efficacy.binary.BinaryAdaBoostClassifier,
 'BinaryLogisticRegression': sdmetrics.single_table.efficacy.binary.BinaryLogisticRegression,
 'BinaryMLPClassifier': sdmetrics.single_table.efficacy.binary.BinaryMLPClassifier,
 'MulticlassDecisionTreeClassifier': sdmetrics.single_table.efficacy.multiclass.MulticlassDecisionTreeClassifier,
 'MulticlassMLPClassifier': sdmetrics.single_table.efficacy.multiclass.MulticlassMLPClassifier,
 'LinearRegression': sdmetrics.single_table.efficacy.regression.LinearRegression,
 'MLPRegressor': sdmetrics.single_table.efficacy.regression.MLPRegressor,
 'GMLogLikelihood': sdmetrics.single_table.gaussian_mixture.GMLogLikelihood,
 'CSTest': sdmetrics.single_table.multi_single_column.CSTest,
 'KSTest': sdmetrics.single_table.multi_single_column.KSTest,
 'KSTestExtended': sdmetrics.single_table.multi_single_column.KSTestExtended,
 'ContinuousKLDivergence': sdmetrics.single_table.multi_column_pairs.ContinuousKLDivergence,
 'DiscreteKLDivergence': sdmetrics.single_table.multi_column_pairs.DiscreteKLDivergence}
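
As a rough, hedged sketch (not part of the original reply), you can also combine get_subclasses with each class's docstring to print a one-line summary of every single-table metric:

from sdv.metrics.tabular import SingleTableMetric

# Print the first line of each metric's docstring as a quick overview.
for name, metric in SingleTableMetric.get_subclasses().items():
    doc = (metric.__doc__ or '').strip()
    summary = doc.splitlines()[0] if doc else 'No description available'
    print(f'{name}: {summary}')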

Finally, if you want to have a closer look at the code for each metric, as well as see the ones that are used internally to build the modality-specific metrics, you can browse the code directly in the SDMetrics repository.

@csala csala added the question General question about the software label Jan 14, 2021
@Amanhelloworld

Hi team,
Could you help me make the trained model consistent across multiple runs? For example, if I retrain the model, the generated synthetic data differs substantially, and when I use this synthetic data for other tasks there are large performance gaps in the results. So, is there any way to make the model consistent across multiple runs?


csala commented Sep 9, 2021

The original question has already been answered, so this can be closed.
The question in the comment above is answered in #299.
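
(For reference only, a generic sketch rather than the exact fix discussed in #299: PyTorch-based models such as CTGAN can usually be made more reproducible by fixing the global random seeds before fitting, although GPU training may still introduce some nondeterminism.)

import random
import numpy as np
import torch

# Fix the seeds of all random number generators involved before calling model.fit().
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)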

@csala csala closed this as completed Sep 9, 2021