Optional specification of instance name in CustomSQL analyzer metric. #569

Merged
tylermcdaniel0 merged 1 commit into master from customsql-analyzer-metric-instance
May 24, 2024

Conversation

tylermcdaniel0
Contributor

Issue #, if available: N/A

Description of changes:

  • Allow (but do not require) specification of metric instance name in CustomSQL analyzer.
  • This will allow users to more easily distinguish success metrics produced by multiple distinct CustomSQL statements within a VerificationSuite.
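
To illustrate why an optional instance name helps, here is a minimal, self-contained sketch. It is not Deequ's Scala API: it uses Python with `sqlite3` as a stand-in engine, and the names `Metric`, `custom_sql_metric`, and the `"*"` default instance are assumptions made for this example only. It shows two SQL statements whose metrics collide when both use the default instance, and stay distinct when each is given its own instance name.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    # Hypothetical metric record: a fixed metric name plus an instance label.
    name: str
    instance: str
    value: float


def custom_sql_metric(conn, statement, instance="*"):
    """Run a single-value SQL statement and wrap the result as a metric.

    `instance` is optional; the "*" default is an assumption of this sketch.
    """
    (value,) = conn.execute(statement).fetchone()
    return Metric("CustomSQL", instance, float(value))


def by_key(metrics):
    # Index metrics by (name, instance), the way a result lookup might.
    return {(m.name, m.instance): m.value for m in metrics}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, 25.0), (3, 5.0)],
)

# Without instance names, both metrics share the key ("CustomSQL", "*").
unnamed_by_key = by_key([
    custom_sql_metric(conn, "SELECT COUNT(*) FROM orders"),
    custom_sql_metric(conn, "SELECT SUM(amount) FROM orders"),
])

# With instance names, each statement's metric can be looked up separately.
named_by_key = by_key([
    custom_sql_metric(conn, "SELECT COUNT(*) FROM orders", instance="order_count"),
    custom_sql_metric(conn, "SELECT SUM(amount) FROM orders", instance="total_amount"),
])
```

In this sketch, `unnamed_by_key` ends up with a single entry because the second metric overwrites the first, while `named_by_key` keeps one entry per statement.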

@tylermcdaniel0 tylermcdaniel0 marked this pull request as draft May 24, 2024 13:03
@tylermcdaniel0 tylermcdaniel0 force-pushed the customsql-analyzer-metric-instance branch from 9cf2c6d to 963ceba Compare May 24, 2024 13:05
@tylermcdaniel0 tylermcdaniel0 marked this pull request as ready for review May 24, 2024 13:06
@tylermcdaniel0 tylermcdaniel0 force-pushed the customsql-analyzer-metric-instance branch from 963ceba to 0b143c8 Compare May 24, 2024 14:58
Contributor

@rdsharma26 rdsharma26 left a comment

LGTM.

@tylermcdaniel0 tylermcdaniel0 merged commit 116f63f into master May 24, 2024
1 check passed
eycho-am pushed a commit that referenced this pull request Jun 19, 2024
eycho-am pushed 4 commits to eycho-am/deequ that referenced this pull request Oct 9, 2024
mentekid pushed a commit that referenced this pull request Oct 9, 2024
* Configurable RetainCompletenessRule (#564)

* Configurable RetainCompletenessRule

* Add doc string

* Add default completeness const

* Optional specification of instance name in CustomSQL analyzer metric. (#569)

Co-authored-by: Tyler Mcdaniel <tymcd@amazon.com>

* Adding Wilson Score Confidence Interval Strategy (#567)

* Configurable RetainCompletenessRule

* Add doc string

* Add default completeness const

* Add ConfidenceIntervalStrategy

* Add Separate Wilson and Wald Interval Test

* Add License information, Fix formatting

* Add License information

* formatting fix

* Update documentation

* Make WaldInterval the default strategy for now

* Formatting import to per line

* Separate group import to per line import

* CustomAggregator (#572)

* Add support for EntityTypes dqdl rule

* Add support for Conditional Aggregation Analyzer

---------

Co-authored-by: Joshua Zexter <jzexter@amazon.com>

* fix typo (#574)

* Fix performance of building row-level results (#577)

* Generate row-level results with withColumns

Iteratively using withColumn (singular) causes performance
issues when iterating over a large sequence of columns.

* Add back UNIQUENESS_ID

* Replace 'withColumns' with 'select' (#582)

'withColumns' was introduced in Spark 3.3, so it won't
work for Deequ's <3.3 builds.

* Replace rdd with dataframe functions in Histogram analyzer (#586)

Co-authored-by: Shriya Vanvari <svanvari@amazon.com>

* Updated version in pom.xml to 2.0.8-spark-3.4

---------

Co-authored-by: zeotuan <48720253+zeotuan@users.noreply.github.com>
Co-authored-by: tylermcdaniel0 <144386264+tylermcdaniel0@users.noreply.github.com>
Co-authored-by: Tyler Mcdaniel <tymcd@amazon.com>
Co-authored-by: Joshua Zexter <67130377+joshuazexter@users.noreply.github.com>
Co-authored-by: Joshua Zexter <jzexter@amazon.com>
Co-authored-by: bojackli <478378663@qq.com>
Co-authored-by: Josh <5685731+marcantony@users.noreply.github.com>
Co-authored-by: Shriya Vanvari <vanvari.shriya@gmail.com>
Co-authored-by: Shriya Vanvari <svanvari@amazon.com>
mentekid pushed a commit that referenced this pull request Oct 9, 2024
* Configurable RetainCompletenessRule (#564)

* Configurable RetainCompletenessRule

* Add doc string

* Add default completeness const

* Optional specification of instance name in CustomSQL analyzer metric. (#569)

Co-authored-by: Tyler Mcdaniel <tymcd@amazon.com>

* Adding Wilson Score Confidence Interval Strategy (#567)

* Configurable RetainCompletenessRule

* Add doc string

* Add default completeness const

* Add ConfidenceIntervalStrategy

* Add Separate Wilson and Wald Interval Test

* Add License information, Fix formatting

* Add License information

* formatting fix

* Update documentation

* Make WaldInterval the default strategy for now

* Formatting import to per line

* Separate group import to per line import

* CustomAggregator (#572)

* Add support for EntityTypes dqdl rule

* Add support for Conditional Aggregation Analyzer

---------

Co-authored-by: Joshua Zexter <jzexter@amazon.com>

* fix typo (#574)

* Fix performance of building row-level results (#577)

* Generate row-level results with withColumns

Iteratively using withColumn (singular) causes performance
issues when iterating over a large sequence of columns.

* Add back UNIQUENESS_ID

* Replace 'withColumns' with 'select' (#582)

'withColumns' was introduced in Spark 3.3, so it won't
work for Deequ's <3.3 builds.

* Replace rdd with dataframe functions in Histogram analyzer (#586)

Co-authored-by: Shriya Vanvari <svanvari@amazon.com>

* Match Breeze version with spark 3.3 (#562)

* Updated version in pom.xml to 2.0.8-spark-3.3

---------

Co-authored-by: zeotuan <48720253+zeotuan@users.noreply.github.com>
Co-authored-by: tylermcdaniel0 <144386264+tylermcdaniel0@users.noreply.github.com>
Co-authored-by: Tyler Mcdaniel <tymcd@amazon.com>
Co-authored-by: Joshua Zexter <67130377+joshuazexter@users.noreply.github.com>
Co-authored-by: Joshua Zexter <jzexter@amazon.com>
Co-authored-by: bojackli <478378663@qq.com>
Co-authored-by: Josh <5685731+marcantony@users.noreply.github.com>
Co-authored-by: Shriya Vanvari <vanvari.shriya@gmail.com>
Co-authored-by: Shriya Vanvari <svanvari@amazon.com>
mentekid pushed a commit that referenced this pull request Oct 9, 2024
* Configurable RetainCompletenessRule (#564)

* Configurable RetainCompletenessRule

* Add doc string

* Add default completeness const

* Optional specification of instance name in CustomSQL analyzer metric. (#569)

Co-authored-by: Tyler Mcdaniel <tymcd@amazon.com>

* Adding Wilson Score Confidence Interval Strategy (#567)

* Configurable RetainCompletenessRule

* Add doc string

* Add default completeness const

* Add ConfidenceIntervalStrategy

* Add Separate Wilson and Wald Interval Test

* Add License information, Fix formatting

* Add License information

* formatting fix

* Update documentation

* Make WaldInterval the default strategy for now

* Formatting import to per line

* Separate group import to per line import

* CustomAggregator (#572)

* Add support for EntityTypes dqdl rule

* Add support for Conditional Aggregation Analyzer

---------

Co-authored-by: Joshua Zexter <jzexter@amazon.com>

* fix typo (#574)

* Fix performance of building row-level results (#577)

* Generate row-level results with withColumns

Iteratively using withColumn (singular) causes performance
issues when iterating over a large sequence of columns.

* Add back UNIQUENESS_ID

* Replace 'withColumns' with 'select' (#582)

'withColumns' was introduced in Spark 3.3, so it won't
work for Deequ's <3.3 builds.

* Replace rdd with dataframe functions in Histogram analyzer (#586)

Co-authored-by: Shriya Vanvari <svanvari@amazon.com>

* Updated version in pom.xml to 2.0.8-spark-3.2

---------

Co-authored-by: zeotuan <48720253+zeotuan@users.noreply.github.com>
Co-authored-by: tylermcdaniel0 <144386264+tylermcdaniel0@users.noreply.github.com>
Co-authored-by: Tyler Mcdaniel <tymcd@amazon.com>
Co-authored-by: Joshua Zexter <67130377+joshuazexter@users.noreply.github.com>
Co-authored-by: Joshua Zexter <jzexter@amazon.com>
Co-authored-by: bojackli <478378663@qq.com>
Co-authored-by: Josh <5685731+marcantony@users.noreply.github.com>
Co-authored-by: Shriya Vanvari <vanvari.shriya@gmail.com>
Co-authored-by: Shriya Vanvari <svanvari@amazon.com>
mentekid pushed a commit that referenced this pull request Oct 9, 2024
* Configurable RetainCompletenessRule (#564)

* Configurable RetainCompletenessRule

* Add doc string

* Add default completeness const

* Optional specification of instance name in CustomSQL analyzer metric. (#569)

Co-authored-by: Tyler Mcdaniel <tymcd@amazon.com>

* Adding Wilson Score Confidence Interval Strategy (#567)

* Configurable RetainCompletenessRule

* Add doc string

* Add default completeness const

* Add ConfidenceIntervalStrategy

* Add Separate Wilson and Wald Interval Test

* Add License information, Fix formatting

* Add License information

* formatting fix

* Update documentation

* Make WaldInterval the default strategy for now

* Formatting import to per line

* Separate group import to per line import

* CustomAggregator (#572)

* Add support for EntityTypes dqdl rule

* Add support for Conditional Aggregation Analyzer

---------

Co-authored-by: Joshua Zexter <jzexter@amazon.com>

* fix typo (#574)

* Fix performance of building row-level results (#577)

* Generate row-level results with withColumns

Iteratively using withColumn (singular) causes performance
issues when iterating over a large sequence of columns.

* Add back UNIQUENESS_ID

* Replace 'withColumns' with 'select' (#582)

'withColumns' was introduced in Spark 3.3, so it won't
work for Deequ's <3.3 builds.

* Replace rdd with dataframe functions in Histogram analyzer (#586)

Co-authored-by: Shriya Vanvari <svanvari@amazon.com>

* Updated version in pom.xml to 2.0.8-spark-3.1

---------

Co-authored-by: zeotuan <48720253+zeotuan@users.noreply.github.com>
Co-authored-by: tylermcdaniel0 <144386264+tylermcdaniel0@users.noreply.github.com>
Co-authored-by: Tyler Mcdaniel <tymcd@amazon.com>
Co-authored-by: Joshua Zexter <67130377+joshuazexter@users.noreply.github.com>
Co-authored-by: Joshua Zexter <jzexter@amazon.com>
Co-authored-by: bojackli <478378663@qq.com>
Co-authored-by: Josh <5685731+marcantony@users.noreply.github.com>
Co-authored-by: Shriya Vanvari <vanvari.shriya@gmail.com>
Co-authored-by: Shriya Vanvari <svanvari@amazon.com>
arsenalgunnershubert777 pushed a commit to arsenalgunnershubert777/deequ that referenced this pull request Nov 8, 2024
2 participants