
Metrics for evaluation in evaluation package #295

Closed
Charles204 opened this issue Jan 9, 2021 · 3 comments
Labels
question General question about the software

Comments

Charles204 commented Jan 9, 2021

  • SDV version: 0.6.1
  • Python version: 3
  • Operating System: Google Colab

Description

Hi team,

Thank you very much for your hard work in giving us such a great package. I am very impressed and admire you all!

I have some questions about the metrics I found in the evaluation package, listed below. Please help me if you have any information about them.

  1. How many metrics does this package have? And which code will list all of them?
  2. Do you have any documentation describing the metrics in detail, for example how each metric evaluates the similarity of two datasets? If not, do you know where I can find it? Some of the metrics I found are listed below as examples:
  • LogisticRegression Detection
  • SVC Detection
  • GaussianMixture Log Likelihood
  • Inverted Kolmogorov-Smirnov D statistic (KSTest and KSTestExtend)
  • Continuous Kullback–Leibler Divergence

What I Did

I plan to use your evaluation metrics for my project, but I lack a full understanding of how they work, so any pointers would be very helpful. Thank you very much.


csala commented Jan 14, 2021

Hello @Charles204

All the usage documentation for the existing metrics can be found in the corresponding User Guides section of the documentation. This includes an initial description of what each metric does.

Further details about each one of them can be found in the corresponding API Reference section. Bear in mind that you can also read the corresponding class documentation by running help(TheMetricClass) in an interactive environment, like this:

In [1]: from sdv.metrics.tabular import KSTestExtended

In [2]: help(KSTestExtended)
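
For illustration (a minimal sketch beyond the original reply): assuming you already have two pandas DataFrames, real_data and synthetic_data, with the same columns, the metric classes expose a compute classmethod that returns the score directly:

In [3]: from sdv.metrics.tabular import KSTest

In [4]: KSTest.compute(real_data, synthetic_data)  # inverted KS D statistic in [0, 1]; higher means more similar marginal distributions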

Regarding finding all the metrics that exist, you can use the get_subclasses method from each one of the base modality metrics (SingleTable, MultiTable and TimeSeries), like this:

In [1]: from sdv.metrics.tabular import SingleTableMetric

In [2]: SingleTableMetric.get_subclasses()
Out[2]: 
{'BNLogLikelihood': sdmetrics.single_table.bayesian_network.BNLogLikelihood,
 'LogisticDetection': sdmetrics.single_table.detection.sklearn.LogisticDetection,
 'SVCDetection': sdmetrics.single_table.detection.sklearn.SVCDetection,
 'BinaryDecisionTreeClassifier': sdmetrics.single_table.efficacy.binary.BinaryDecisionTreeClassifier,
 'BinaryAdaBoostClassifier': sdmetrics.single_table.efficacy.binary.BinaryAdaBoostClassifier,
 'BinaryLogisticRegression': sdmetrics.single_table.efficacy.binary.BinaryLogisticRegression,
 'BinaryMLPClassifier': sdmetrics.single_table.efficacy.binary.BinaryMLPClassifier,
 'MulticlassDecisionTreeClassifier': sdmetrics.single_table.efficacy.multiclass.MulticlassDecisionTreeClassifier,
 'MulticlassMLPClassifier': sdmetrics.single_table.efficacy.multiclass.MulticlassMLPClassifier,
 'LinearRegression': sdmetrics.single_table.efficacy.regression.LinearRegression,
 'MLPRegressor': sdmetrics.single_table.efficacy.regression.MLPRegressor,
 'GMLogLikelihood': sdmetrics.single_table.gaussian_mixture.GMLogLikelihood,
 'CSTest': sdmetrics.single_table.multi_single_column.CSTest,
 'KSTest': sdmetrics.single_table.multi_single_column.KSTest,
 'KSTestExtended': sdmetrics.single_table.multi_single_column.KSTestExtended,
 'ContinuousKLDivergence': sdmetrics.single_table.multi_column_pairs.ContinuousKLDivergence,
 'DiscreteKLDivergence': sdmetrics.single_table.multi_column_pairs.DiscreteKLDivergence}
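
As a rough, hedged sketch (not part of the original reply), you can also combine get_subclasses with each class's docstring to print a one-line summary of every single-table metric:

from sdv.metrics.tabular import SingleTableMetric

# Print the first line of each metric's docstring as a quick overview.
for name, metric in SingleTableMetric.get_subclasses().items():
    doc = (metric.__doc__ or '').strip()
    summary = doc.splitlines()[0] if doc else 'No description available'
    print(f'{name}: {summary}')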

Finally, if you want to have a closer look at the code for each metric, as well as see the ones that are used internally to build the modality-specific metrics, you can browse the code directly in the SDMetrics repository.

@csala csala added the question General question about the software label Jan 14, 2021
@Amanhelloworld

Hi team,
Could you help me make the trained model consistent across multiple runs? For example, if I retrain the model, the generated synthetic data differs substantially, and when I use this synthetic data for other tasks there are large performance gaps in the results. So, is there any way to make the model consistent across multiple runs?


csala commented Sep 9, 2021

The original question has already been answered, so this can be closed.
The question in the comment above is answered in #299.
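
(For reference only, a generic sketch rather than the exact fix discussed in #299: PyTorch-based models such as CTGAN can usually be made more reproducible by fixing the global random seeds before fitting, although GPU training may still introduce some nondeterminism.)

import random
import numpy as np
import torch

# Fix the seeds of all random number generators involved before calling model.fit().
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)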

@csala csala closed this as completed Sep 9, 2021