[Bug][Dependency][Blocked]Support Scala Deequ >= 2.0.4 #169

Closed
chenliu0831 opened this issue Oct 26, 2023 · 5 comments

@chenliu0831
Contributor

chenliu0831 commented Oct 26, 2023

Describe the bug

Since Deequ > 2.0.3, many interface changes have added an optional parameter on the Scala side. This looks fine to Scala users, but it breaks Java/Python callers, because they must match the compiled method signature exactly (in other words, Scala must keep the old signature as an additional overload to stay backward compatible for Python/Java users). See the following issues:

Workaround:
Use Deequ 2.0.3, or avoid using any of the known broken APIs.

Broken APIs

  • Compliance
  • Histogram
  • MaxLength/MinLength
  • hasMaxLength/hasMinLength
  • hasMutualInformation
  • Check.satisfies
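
The workaround can be enforced with a small guard before Spark is started. This is a minimal sketch, assuming the Deequ Maven coordinate has the shape `com.amazon.deequ:deequ:<version>-spark-<spark version>`; `deequ_version`, `is_known_safe`, and `LAST_SAFE_DEEQU` are hypothetical names for illustration, not part of PyDeequ.

```python
import re

# Last Deequ release reported in this issue as working with PyDeequ.
LAST_SAFE_DEEQU = (2, 0, 3)


def deequ_version(coord: str) -> tuple:
    """Extract (major, minor, patch) from a Deequ Maven coordinate."""
    # The version is the third colon-separated field, e.g. "2.0.3-spark-3.3".
    version_field = coord.split(":")[2]
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version_field)
    if match is None:
        raise ValueError(f"cannot parse Deequ version from {coord!r}")
    return tuple(int(part) for part in match.groups())


def is_known_safe(coord: str) -> bool:
    """True if the coordinate is at or below the last release reported as working."""
    return deequ_version(coord) <= LAST_SAFE_DEEQU
```

A job that brings its own jar could call `is_known_safe` on the coordinate it is about to load and fail fast, or at least avoid the APIs listed above, when the check returns `False`.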

Solution:

We need to define an overload in Deequ whenever we extend the interface with optional parameters, so that the old signature stays callable. This might not be the best solution. We will discuss alternative solutions with the Deequ folks.
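
The breakage and the overload fix can be illustrated with a toy model. This is only a sketch: `ToyJvmClass` is a hypothetical stand-in that looks methods up by name and argument count, whereas a real JVM also matches parameter types, and the argument lists below are illustrative rather than Deequ's real signatures.

```python
class ToyJvmClass:
    """Toy stand-in for a compiled class: methods are found by (name, arity) only."""

    def __init__(self):
        self._methods = {}

    def define(self, name, arity, impl):
        # Register one compiled method under its exact argument count.
        self._methods[(name, arity)] = impl

    def call(self, name, *args):
        # Non-Scala callers cannot rely on default arguments: the lookup must match exactly.
        try:
            impl = self._methods[(name, len(args))]
        except KeyError:
            raise LookupError(
                f"no method {name} taking {len(args)} argument(s)"
            ) from None
        return impl(*args)


def has_max_length(column, assertion, extra=None):
    # 'extra' plays the role of the newly added optional parameter (hypothetical).
    return f"hasMaxLength({column}, extra={extra})"


# Old release: the compiled method takes two arguments.
old_release = ToyJvmClass()
old_release.define("hasMaxLength", 2, lambda c, a: has_max_length(c, a))

# New release: the optional parameter makes the compiled method take three.
new_release = ToyJvmClass()
new_release.define("hasMaxLength", 3, has_max_length)

# Proposed fix: keep a two-argument overload that delegates to the new method.
fixed_release = ToyJvmClass()
fixed_release.define("hasMaxLength", 3, has_max_length)
fixed_release.define("hasMaxLength", 2, lambda c, a: has_max_length(c, a, None))
```

In this model a two-argument call succeeds on `old_release` and `fixed_release` but raises `LookupError` on `new_release`, which mirrors the `NoSuchMethodError` reports from Python users.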

chenliu0831 self-assigned this Oct 26, 2023
chenliu0831 changed the title from "[Bug][Dependency]Support Deequ 2.0.4" to "[Bug][Dependency]Support Scala Deequ > 2.0.4" Oct 26, 2023
chenliu0831 pinned this issue Oct 26, 2023
chenliu0831 changed the title from "[Bug][Dependency]Support Scala Deequ > 2.0.4" to "[Bug][Dependency]Support Scala Deequ >= 2.0.4" Oct 26, 2023
@stolikparanoik

stolikparanoik commented Nov 6, 2023

SPARK_TO_DEEQU_COORD_MAPPING = {
    "3.3": "com.amazon.deequ:deequ:2.0.3-spark-3.3",
    "3.2": "com.amazon.deequ:deequ:2.0.1-spark-3.2",
    "3.1": "com.amazon.deequ:deequ:2.0.0-spark-3.1",
    "3.0": "com.amazon.deequ:deequ:1.2.2-spark-3.0",
    "2.4": "com.amazon.deequ:deequ:1.1.0_spark-2.4-scala-2.11",
}
Do I understand correctly that if SPARK_VERSION is 3.3, the library uses Deequ 2.0.3 and all the functions should work fine?
If so, I'm a little confused, since that is what the current code already does. Are there other changes that need to be made?
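
For reference, a lookup over a mapping like the one quoted above could work as follows. This is a sketch under assumptions: `deequ_coord_for` is a hypothetical helper, not PyDeequ's actual code, and it assumes the Spark version string starts with `major.minor`.

```python
SPARK_TO_DEEQU_COORD_MAPPING = {
    "3.3": "com.amazon.deequ:deequ:2.0.3-spark-3.3",
    "3.2": "com.amazon.deequ:deequ:2.0.1-spark-3.2",
    "3.1": "com.amazon.deequ:deequ:2.0.0-spark-3.1",
    "3.0": "com.amazon.deequ:deequ:1.2.2-spark-3.0",
    "2.4": "com.amazon.deequ:deequ:1.1.0_spark-2.4-scala-2.11",
}


def deequ_coord_for(spark_version: str) -> str:
    """Return the Deequ coordinate mapped to a Spark version such as '3.3.1'."""
    # Only major.minor selects the coordinate; the patch level is ignored.
    major_minor = ".".join(spark_version.split(".")[:2])
    try:
        return SPARK_TO_DEEQU_COORD_MAPPING[major_minor]
    except KeyError:
        raise ValueError(f"no Deequ coordinate mapped for Spark {spark_version}") from None
```

With this mapping, Spark 3.3 resolves to Deequ 2.0.3, so the default coordinate stays on the last release reported as working.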

@chenliu0831
Contributor Author

chenliu0831 commented Nov 7, 2023

@stolikparanoik Yes, it works as long as you do not bring your own Deequ jars. In production, many people do bring their own jars (if you look at the issues reported, they all use Deequ > 2.0.3).

@devjoshi58

devjoshi58 commented Nov 9, 2023

I hit the issue below; the fix is mentioned at the bottom.

Hello Team,

I am testing pydeequ-1.0.1 with Spark 3.3.

The Analyzer and Verification checks work, but the constraint suggestion run fails with the error below:
raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling o99.run.
: com.amazon.deequ.analyzers.runners.MetricCalculationRuntimeException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 7 in stage 19.0 failed 1 times, most recent failure: Lost task 7.0 in stage 19.0 (TID 62) (10.0.0.79 executor driver): java.lang.NoSuchMethodError: '

This was fixed when I changed the pydeequ version to 1.1.0 and pyspark to 3.3.

chenliu0831 changed the title from "[Bug][Dependency]Support Scala Deequ >= 2.0.4" to "[Bug][Dependency][Blocked]Support Scala Deequ >= 2.0.4" Nov 30, 2023
@chenliu0831
Contributor Author

We have discussed this with the Deequ team and will be working on a longer-term solution, including a support plan for older Spark versions. There are no ETAs yet, but the good news is that the Deequ maintainers (deequ-dev) have now been brought into this repo as well. I will look into whether we can have a safe short-term solution in PyDeequ this weekend.

@chenliu0831
Contributor Author

Resolved by #196.


3 participants