Features and enhancements done #299
base: master
Conversation
Features:
> Support Spark's DateType and TimestampType
> min and max analysis on date time fields
> New constraints for Spark's DateType and TimestampType
> BigDecimal State and Metric support in the Min, Max, Sum and Mean analyzers
> 'isContainedIn' is made available for all Numeric types and Char
> Analyzer - DateTimeDistribution for analyzing distribution metrics (count, ratio) over fixed time intervals
Thanks! Could you look at the CI failures and fix them?
Hi, is this feature now available?
Sorry for the late reply. I'll work on it.
This is a valuable feature you've added, @Yash0215. Awesome. Looking forward to using it.
Can I assist in any way, @yash021? I'm looking to utilise this functionality asap.
Please recommend if any changes are needed.
So the two enhancements implemented in this PR are essentially orthogonal, right? If so, I'd recommend we split them into separate PRs and focus on each.
Yes, that would make a lot of sense.
Hello @Yash0215 @sscdotopen, were there further developments on this work? If not, I can volunteer to take it forward.
FWIW, I forked this project and have been actively developing it since. I'm also thinking of announcing it as an active fork of this project somewhere (maybe as an issue for this repo) in the near future.
Hi, thanks so much for introducing all these changes. Unfortunately, we currently don't have availability to give this a proper review. We will keep this PR in the backlog for now. If you have the opportunity to submit a couple of smaller PRs, that would be great. It's hard to find the time to do big reviews, and a few smaller PRs could help us understand the main ideas and make progress on this.
@Yash0215 Please get back to us on this if you get the chance. We are considering closing this PR soon.
Any update on this, to improve deequ's handling of timestamps and dates?
This PR is quite big, with multiple unrelated changes, which makes review hard. Chunking this into multiple smaller PRs would be a good start. I am interested in timestamp/date support.
New Features:
1. A Date Time Distribution analyzer for analyzing the distribution of records within fixed time intervals, based on a 'DateType' or 'TimestampType' feature.
files changed/created:
DateTimeDistribution.scala
DateTimeAggregation.scala
DeequFunctions.scala
...
2. Six new constraints added, covering more use cases for DateTime quality checks.
files changed/created:
Check.scala
Constraint.scala
3. Constraint 'isContainedIn' now supports more Scala Numeric types.
files changed/created:
Check.scala
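To illustrate what the Date Time Distribution analyzer in item 1 computes, here is a minimal sketch in plain Scala without Spark. It is not the code from this PR: `IntervalStats`, `DateTimeDistributionSketch` and `distribution` are hypothetical names, and it assumes the distribution is a per-interval count plus the ratio of that count to the total number of records.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical result type: number of records in an interval and their share of all records.
final case class IntervalStats(count: Long, ratio: Double)

object DateTimeDistributionSketch {
  // Groups timestamps into fixed intervals by truncating each one to the given unit
  // (for example ChronoUnit.HOURS), then computes count and ratio per interval.
  def distribution(timestamps: Seq[Instant], unit: ChronoUnit): Map[Instant, IntervalStats] = {
    val total = timestamps.size.toDouble
    timestamps
      .groupBy(_.truncatedTo(unit))
      .map { case (intervalStart, rows) =>
        (intervalStart, IntervalStats(rows.size.toLong, rows.size / total))
      }
  }
}
```

With four timestamps of which two fall into the same hour, that hour gets a count of 2 and a ratio of 0.5, and each remaining hour gets a count of 1 and a ratio of 0.25.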
Enhancements:
Issue: Timestamp support #47
A new State and Metric are implemented for this enhancement, since the previous analyzers only supported the Double metric and the standard analyzer. A new abstract analyzer for timestamp analysis is also implemented.
files changed/created:
MinimumDateTime.scala, MaximumDateTime.scala (for new analyzer implementation)
Analyzer.scala
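The reason a dedicated state is needed for timestamps can be sketched as follows: a minimum over timestamps must be kept as a timestamp and merged across partitions, rather than squeezed into a Double. This is an illustration only. `MinTimestampState`, `merge` and `MinTimestampSketch` are hypothetical names, and the actual classes in MinimumDateTime.scala and MaximumDateTime.scala may look different.

```scala
import java.time.Instant

// Hypothetical mergeable state holding the smallest timestamp seen so far.
final case class MinTimestampState(min: Instant) {
  // Combining two partial states keeps the earlier of the two timestamps.
  def merge(other: MinTimestampState): MinTimestampState =
    if (other.min.isBefore(min)) other else this
}

object MinTimestampSketch {
  // Computes one partial state per non-empty partition, then merges the partial states.
  def minimum(partitions: Seq[Seq[Instant]]): Option[Instant] =
    partitions
      .flatMap(p => p.reduceOption((a, b) => if (b.isBefore(a)) b else a))
      .map(t => MinTimestampState(t))
      .reduceOption((a, b) => a.merge(b))
      .map(_.min)
}
```

A maximum works the same way with `isAfter` in place of `isBefore`. Empty partitions contribute no state, so an entirely empty input yields `None`.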
Issue: Analyzer for Precision and Scale of BigDecimals #46
A new State and Metric are implemented, along with new analyzers that provide the precision and scale of Spark's 'DecimalType'.
files changed/created:
Minimum.scala, Maximum.scala, Sum.scala, Mean.scala (for new analyzer implementation)
Analyzer.scala
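As a rough illustration of why BigDecimal-based states matter for Sum and Mean, the sketch below aggregates decimals exactly in plain Scala. It does not use deequ or Spark, and `DecimalAggregationSketch` and `sumAndMean` are hypothetical names, not part of this PR.

```scala
object DecimalAggregationSketch {
  // Returns (sum, mean) computed with BigDecimal arithmetic, or None for empty input.
  def sumAndMean(values: Seq[BigDecimal]): Option[(BigDecimal, BigDecimal)] =
    if (values.isEmpty) None
    else {
      val total = values.sum
      Some((total, total / values.size))
    }
}
```

Summing the decimals 0.1, 0.2 and 0.3 this way gives exactly 0.6, whereas the same sum over Double values picks up a binary rounding error. Scala's `BigDecimal` also exposes `precision` and `scale`, the two properties named in issue #46.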
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.