Features and enhancements done #299

Open
wants to merge 14 commits into
base: master


Yash0215

New Features:

  1. A Date Time Distribution analyzer that analyzes the distribution of records within fixed time intervals, based on a 'DateType' or 'TimestampType' feature.
    files changed/created:
    DateTimeDistribution.scala
    DateTimeAggregation.scala
    DeequFunctions.scala
    ...

  2. 6 new Constraints added, covering more use cases for DateTime quality checks.
    files changed/created:
    Check.scala
    Constraint.scala

  3. Constraint 'isContainedIn' now supports more Scala Numeric Types.
    files changed/created:
    Check.scala
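
To illustrate what "distribution of records within fixed time intervals" can mean, here is a minimal sketch in plain Scala, without Spark. It is not the code from DateTimeDistribution.scala; the names `BucketStats`, `DateTimeDistributionSketch`, `bucketStart` and `distribution` are hypothetical, and aligning intervals to the Unix epoch is an assumption made for the example.

```scala
import java.time.{Duration, Instant}

// Hypothetical result type: record count and share of all records for one interval.
final case class BucketStats(count: Long, ratio: Double)

object DateTimeDistributionSketch {
  // Map a timestamp to the start of its fixed-width interval.
  // Assumption: intervals are aligned to the Unix epoch.
  def bucketStart(ts: Instant, interval: Duration): Instant = {
    require(!interval.isZero && !interval.isNegative, "interval must be positive")
    val width = interval.toMillis
    Instant.ofEpochMilli(Math.floorDiv(ts.toEpochMilli, width) * width)
  }

  // Group timestamps by interval start and compute count and ratio per interval.
  // An empty input yields an empty map.
  def distribution(timestamps: Seq[Instant], interval: Duration): Map[Instant, BucketStats] = {
    val total = timestamps.size.toDouble
    timestamps
      .groupBy(bucketStart(_, interval))
      .map { case (start, rows) => start -> BucketStats(rows.size.toLong, rows.size / total) }
  }
}
```

With a one-hour interval, records at 12:05, 12:46 and 13:10 UTC on the same day fall into two buckets: the 12:00 bucket holds two records (ratio 2/3) and the 13:00 bucket holds one (ratio 1/3).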

Enhancements:

  1. Issue: Timestamp support #47
    A new State and Metric are implemented for this enhancement, since the previous analyzers only supported the Double Metric and the Standard analyzer. A new abstract analyzer for timestamp analysis is also implemented.
    files changed/created:
    MinimumDateTime.scala, MaximumDateTime.scala (for new analyzer implementation)
    Analyzer.scala

  2. Analyzer for Precision and Scale of BigDecimals #46
    A new State and Metric are implemented, along with new analyzers that provide the precision and scale of Spark's 'DecimalType'.
    files changed/created:
    Minimum.scala, Maximum.scala, Sum.scala, Mean.scala (for new analyzer implementation)
    Analyzer.scala
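
As a rough illustration of why non-Double states are needed, the sketch below shows a mergeable minimum-timestamp state and a precision/scale lookup in plain Scala. It is not taken from the files listed above; `MinTimestampState`, `PrecisionScale` and `NonDoubleStateSketch` are hypothetical names, and only `java.time.Instant` and `java.math.BigDecimal` from the JDK are used.

```scala
import java.math.{BigDecimal => JBigDecimal}
import java.time.Instant

// Hypothetical state holding the smallest timestamp seen so far.
// Being mergeable lets partial results from different partitions be combined.
final case class MinTimestampState(min: Instant) {
  def merge(other: MinTimestampState): MinTimestampState =
    if (other.min.isBefore(min)) other else this
}

// Hypothetical metric value: a pair of integers that a single Double cannot represent.
final case class PrecisionScale(precision: Int, scale: Int)

object NonDoubleStateSketch {
  // Minimum timestamp as a reduction over mergeable states; None for empty input.
  def minTimestamp(values: Seq[Instant]): Option[Instant] =
    values.map(MinTimestampState(_)).reduceOption(_ merge _).map(_.min)

  // precision: number of digits in the unscaled value;
  // scale: number of digits to the right of the decimal point.
  def precisionAndScale(value: JBigDecimal): PrecisionScale =
    PrecisionScale(value.precision(), value.scale())
}
```

For example, `NonDoubleStateSketch.precisionAndScale(new java.math.BigDecimal("123.45"))` gives `PrecisionScale(5, 2)`, and neither that pair nor an `Instant` fits naturally into a Double metric.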

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

Yash0215 and others added 9 commits September 23, 2020 12:46
Features:
> Support Spark's DateType and TimestampType
> min and max analysis on date time fields
> New constraints for Spark's DateType and Timestamp Types
> BigDecimal State and Metric Support in Min, Max, Sum and Mean Analyzers
> 'isContained' is made available for all Numeric Types and Char
> Analyzer - DateTimeDistribution
  for analyzing distributions metrics (count, ratio) over some time intervals
@sscdotopen
Contributor

Thanks! Could you look at the CI failures and fix them?

@lucene

lucene commented Oct 15, 2020

Hi, is this feature now available?

@Yash0215
Author

> Thanks! Could you look at the CI failures and fix them?

Sorry for the late reply. I'll work on it.

@lucene

lucene commented Oct 15, 2020

This is a valuable feature you've added, @Yash0215. Awesome. Looking forward to using it.

@lucene

lucene commented Oct 15, 2020

Can I assist in any way, @Yash0215? I'm looking to utilise this functionality asap.

@Yash0215
Author

Please let me know if any changes are needed.

@aviatesk
Contributor

So the two enhancements implemented in this PR are essentially orthogonal, right? If so, I'd recommend we split them into separate PRs and focus on each.

@sscdotopen
Contributor

Yes, that would make a lot of sense.

@rounakdatta

Hello @Yash0215 @sscdotopen, were there further developments on this work? If not, I can volunteer to take it forward.

@aviatesk
Contributor

FWIW, I forked this project and have been actively developing it since.
See this changelog for the enhancements/bugfixes.

I'm also thinking of announcing it as an active fork of this project somewhere (maybe as an issue for this repo) in the near future.

@twollnik
Contributor

Hi, thanks so much for introducing all these changes. Unfortunately, we currently don't have the availability to give this a proper review. We will keep this PR in the backlog for now. If you have the opportunity to submit a couple of smaller PRs, that would be great. It's hard to find the time for big reviews, and a few smaller PRs could help us understand the main ideas and make progress on this.

@twollnik
Contributor

@Yash0215 Please get back to us on this if you get the chance. We are considering closing this PR soon.

@RunnX

RunnX commented Jun 14, 2022

@twollnik @Yash0215 - Will this PR be merged to improve deequ to handle timestamp/date support?

@shehzad-qureshi shehzad-qureshi added the enhancement New feature or request label Jan 31, 2023
@shehzad-qureshi shehzad-qureshi added the help wanted Extra attention is needed label Jan 31, 2023
@suadhika

Any update on this, to improve deequ's handling of timestamps/dates?

@zeotuan
Contributor

zeotuan commented Apr 18, 2024

This PR is quite big, with multiple unrelated changes, which makes review hard. Chunking it into multiple smaller PRs would be a good start. I am interested in timestamp/date support.

Labels
enhancement New feature or request help wanted Extra attention is needed