chore: upgrade Spark to 3.4, and Deequ to 2.0.5 #168

Closed
wants to merge 1 commit into from

chenliu0831
Contributor

Issue #, if available: #151

Description of changes:

Upgrade Spark to 3.4 and Deequ to 2.0.5

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@chenliu0831
Contributor Author

chenliu0831 commented Oct 26, 2023

The test failures occur because the new version of Deequ introduced some new optional parameters (e.g. analyzerOptions). Today the Python side cannot use the default parameter values defined on the Scala side when calling through Py4J, so the call throws an error.

If the interface has to be an exact match, the code will bifurcate, since older versions of Deequ don't have this parameter. For example, the change below fixes the issue for Deequ 2.0.5 but breaks Deequ < 2.0.5:

-        self._Check = self._Check.hasMaxLength(column, assertion_func, hint)
+        analyzer_options = self._jvm.scala.Option.apply(None)
+        self._Check = self._Check.hasMaxLength(column, assertion_func, hint, analyzer_options)
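
To illustrate the bifurcation described above, here is a minimal sketch of a version-gated argument list. The helper names (`parse_deequ_version`, `has_max_length_args`) and the way the Deequ version string is obtained are hypothetical and not part of PyDeequ; the 2.0.5 threshold is taken from this comment and is an assumption about where the parameter first appears.

```python
import re

# Assumption: analyzerOptions first appears in Deequ 2.0.5, as described in this PR.
ANALYZER_OPTIONS_MIN_VERSION = (2, 0, 5)


def parse_deequ_version(version: str) -> tuple:
    """Turn a string such as '2.0.5-spark-3.4' into (2, 0, 5); any suffix is ignored."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Deequ version: {version!r}")
    return tuple(int(part) for part in match.groups())


def has_max_length_args(deequ_version, column, assertion_func, hint, empty_option):
    """Build the positional arguments for the JVM-side hasMaxLength call (hypothetical helper)."""
    args = [column, assertion_func, hint]
    if parse_deequ_version(deequ_version) >= ANALYZER_OPTIONS_MIN_VERSION:
        # Newer Deequ: pass the optional parameter explicitly, because the
        # Python caller cannot rely on the Scala default value.
        args.append(empty_option)
    return args
```

With something like this, the call site would become `self._Check.hasMaxLength(*has_max_length_args(version, column, assertion_func, hint, analyzer_options))`, at the cost of repeating the same gate for every method that gained a parameter.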

Test failures:

E  py4j.protocol.Py4JError: An error occurred while calling o86.hasMaxLength. Trace:
E py4j.Py4JException: Method hasMaxLength([class java.lang.String, class com.sun.proxy.$Proxy35, class scala.Some, class scala.None$]) does not exist
E at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:321)
E at py4j.reflection.ReflectionEngine.getMethod(ReflectionEngine.java:329)
E at py4j.Gateway.invoke(Gateway.java:274)
E at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
E at py4j.commands.CallCommand.execute(CallCommand.java:79)
E at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
E at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
E at java.base/java.lang.Thread.run(Thread.java:829)
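
An alternative to branching on the version is to try the newer signature first and fall back to the older one when the lookup fails, as it does in the trace above. The sketch below is only an illustration: `call_with_optional_args`, `OldCheck` and `NewCheck` are hypothetical names, and the stub classes stand in for the JVM objects so the example runs without Spark. Against a real gateway the error type to catch would be `py4j.protocol.Py4JError`.

```python
def call_with_optional_args(method, required_args, optional_args, not_found_error=Exception):
    """Call method with the optional trailing arguments, retrying without them on failure.

    Caveat: the fallback also triggers if the call fails for another reason that
    raises the same error type, so real failures can be masked.
    """
    try:
        return method(*required_args, *optional_args)
    except not_found_error:
        return method(*required_args)


class OldCheck:
    """Stand-in for a Check from an older Deequ: three-argument method only."""

    def hasMaxLength(self, column, assertion, hint):
        return ("old", column)


class NewCheck:
    """Stand-in for a Check from a newer Deequ: extra analyzer options argument."""

    def hasMaxLength(self, column, assertion, hint, analyzer_options):
        return ("new", column)


# The stubs raise TypeError on an arity mismatch, so that is the error caught here.
old_result = call_with_optional_args(
    OldCheck().hasMaxLength, ("name", None, None), (None,), not_found_error=TypeError
)
new_result = call_with_optional_args(
    NewCheck().hasMaxLength, ("name", None, None), (None,), not_found_error=TypeError
)
```

This keeps a single call site per method, but it costs one failed round trip on older versions, which is part of why a fix on the Scala side may be cleaner.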

Edit: it seems hard to call Scala defaults even from Java... we might have to define multiple methods on the Scala side without those default arguments.

@chenliu0831
Contributor Author

We have been busy with re:Invent; expect some progress in Dec.

@repcaks

repcaks commented Dec 5, 2023

Hi, is there a potential date for supporting Spark 3.4? :) Is it more likely December/January, or even later?

@machadoluiz

machadoluiz commented Dec 8, 2023

Hello, @chenliu0831! Is there an expected date for supporting this Spark version? Or maybe 3.5?

@katiesandford

Hi is there any update on this please?

@anqini

anqini commented Dec 14, 2023

Hi all, I created a new pull request to accommodate Spark 3.4 and Deequ versions later than 2.0.3. You are welcome to take a look:
#178

@chenliu0831
Contributor Author

@anqini thanks so much for looking into this and submitting the PR. Unfortunately, we cannot drop support for older Spark/Deequ versions yet. I will take a closer look at #178.

All - we have discussed this with the Deequ team, and we will be working on a longer-term solution, including a support plan for older Spark versions. There are no ETAs yet (some plan in Jan), but the good news is that we merged the maintainer groups from both repos. This weekend I will look into whether we can have a safe short-term solution in PyDeequ only.

@dudumottavasconcelos

Hi, @chenliu0831! Any news on this upgrade?

@MatheusXCH

Hello all! Any news on the version upgrade?

@katiesandford

Hi. Is there any update on this please?

@chenliu0831
Contributor Author

Closing this for now; see my comments in #192 (comment). We can provide updates there.

7 participants