feat: Measure transformers for Data Balance Analysis #1218

Merged: 43 commits, merged on Oct 22, 2021

ms-kashyap (Contributor)

As part of the Data Balance Analysis feature, I'm introducing three transformers under the exploratory namespace in synapseml-core.

At a high level:

  • The DistributionMeasures transformer computes data imbalance measures based on a reference distribution.
  • The AggregateMeasures transformer computes a set of aggregated measures that represent how balanced or imbalanced the given dataframe is along the given sensitive features.
  • The ParityMeasures transformer computes a set of parity measures from the given dataframe and sensitive features.
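To make the first two bullets more concrete, here is a minimal, hypothetical sketch in plain Python (not the SynapseML transformer API) of what "distribution measures against a reference distribution" and "aggregated imbalance measures" can look like for a single sensitive column. The function names, the choice of a uniform reference over the observed categories, and the specific measures (KL divergence, Jensen-Shannon distance, total variation distance, infinity norm, Theil T and Theil L indices) are all assumptions made for illustration.

```python
import math
from collections import Counter


def _kl(p, q):
    """KL(p || q) in nats, skipping zero-probability terms of p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distribution_measures(values):
    """Hypothetical helper: compare the observed distribution of one
    sensitive feature with an ASSUMED uniform reference distribution
    over the categories that actually occur in `values`."""
    counts = Counter(values)
    n = sum(counts.values())
    k = len(counts)
    obs = [c / n for c in counts.values()]  # observed probability per category
    ref = [1.0 / k] * k                     # assumed uniform reference
    mid = [(p + q) / 2 for p, q in zip(obs, ref)]  # mixture used by Jensen-Shannon
    return {
        "kl_divergence": _kl(obs, ref),
        "js_distance": math.sqrt((_kl(obs, mid) + _kl(ref, mid)) / 2),
        "total_variation": 0.5 * sum(abs(p - q) for p, q in zip(obs, ref)),
        "inf_norm": max(abs(p - q) for p, q in zip(obs, ref)),
    }


def theil_indices(values):
    """Hypothetical helper: aggregate imbalance of the per-category counts.
    Both Theil T and Theil L are 0 when every observed category has the
    same count, and grow as the counts become more unequal."""
    counts = list(Counter(values).values())
    mu = sum(counts) / len(counts)  # mean count per category
    theil_t = sum((x / mu) * math.log(x / mu) for x in counts) / len(counts)
    theil_l = sum(math.log(mu / x) for x in counts) / len(counts)
    return {"theil_t": theil_t, "theil_l": theil_l}
```

For example, `distribution_measures(["f", "f", "f", "m"])` yields a total variation distance of 0.25 against the assumed 50/50 reference, while a perfectly balanced column such as `["a", "b", "a", "b"]` gives 0 for every measure. Comparing against a uniform reference is just one choice; any reference distribution over the same categories could be substituted.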

This work is based on the work of @ankit-oss.
Additionally, the parity measures were heavily influenced by the paper "Measuring Model Biases in the Absence of Ground Truth".
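Parity measures of this kind relate each value of a sensitive feature to a (binary) label. The sketch below is a minimal, hypothetical plain-Python illustration, not the ParityMeasures implementation: it assumes input rows of `(feature_value, label)` pairs and computes two association measures of the kind discussed in that paper, the per-value positive rate used for demographic parity (with pairwise gaps between values) and pointwise mutual information, PMI(x, y) = ln(p(x, y) / (p(x) p(y))). The function name and the output layout are illustrative assumptions.

```python
import math
from itertools import combinations


def parity_measures(rows, positive=1):
    """Hypothetical helper. `rows` is an iterable of (feature_value, label)
    pairs; `positive` is the label value treated as the positive outcome."""
    rows = list(rows)
    n = len(rows)
    p_y = sum(1 for _, y in rows if y == positive) / n  # p(y = positive)

    # Count total rows and positive rows for each feature value.
    by_x = {}
    for x, y in rows:
        total, pos = by_x.get(x, (0, 0))
        by_x[x] = (total + 1, pos + (y == positive))

    per_value = {}
    for x, (total, pos) in by_x.items():
        p_x = total / n    # p(x)
        p_xy = pos / n     # p(x, y = positive)
        dp = pos / total   # p(y = positive | x), the demographic parity rate
        pmi = math.log(p_xy / (p_x * p_y)) if p_xy > 0 else float("-inf")
        per_value[x] = {"dp": dp, "pmi": pmi}

    # Demographic parity gap for every pair of feature values.
    gaps = {
        (a, b): per_value[a]["dp"] - per_value[b]["dp"]
        for a, b in combinations(sorted(per_value), 2)
    }
    return per_value, gaps
```

With rows `[("f", 1), ("f", 1), ("f", 0), ("m", 1), ("m", 0), ("m", 0)]`, the positive rates are 2/3 for `"f"` and 1/3 for `"m"`, so the demographic parity gap for `("f", "m")` is 1/3, and PMI is positive for `"f"` (ln(4/3)) and negative for `"m"` (ln(2/3)). A gap of 0 and a PMI of 0 for every value would indicate no association between the feature and the label.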

ms-kashyap and others added 30 commits August 11, 2021 15:29
feat: ONNX model inference on Spark (microsoft#1152)
…dress PR to jasowang comments, pSensitive -> pFeature
…p-value, address memoryz PR comments, merge from master (new namespace)
mhamilton723 (Collaborator) left a comment

Awesome contribution and very clean code! Left a few nits.

On the whole, one thing I am thinking about is whether these should be Transformers or Evaluators. Perhaps we can discuss offline with everyone.

ms-kashyap (Contributor Author)

/azp run

azure-pipelines

Commenter does not have sufficient privileges for PR 1218 in repo microsoft/SynapseML

ms-kashyap (Contributor Author)

/azp run

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

ms-kashyap marked this pull request as draft on October 20, 2021 03:22
ms-kashyap marked this pull request as ready for review on October 21, 2021 03:07
ms-kashyap (Contributor Author)

/azp run

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

ms-kashyap (Contributor Author)

/azp run

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).
