-
Notifications
You must be signed in to change notification settings - Fork 152
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
updated ClassEnquirer.java to use py4j python package and this will solve py4j error when running pyspark script #548
Conversation
…olve py4j error when running pyspark script
It's my understanding that py4j and jep are incompatible since they both expect to initialize python and the JVM in different ways so I don't understand the use case where this helps. Does this change enable you to use py4j and jep in the same process? |
yes. I was able to successfully create a pyspark session using Jep and run it in the same process. without this change you can still run it using custom ClassEnquirer:
but i think this is a bit of labor and adding py4j to RESTRICTED_PKG_NAMES solves this |
Thank you for providing an example and helping me understand the background for this change. I do not think you need a custom class enquirer in your example. The NamingConventionClassEnquirer only includes a predefined list of top level packages and since py4j is not in that list it would not be included. You should be able to run your example using the NamingConventionClassEnquirer without any extension. In that case it would only require one line of extra code to set the ClassEnquirer to solve the py4j import error. I am currently on the fence as to whether I think this change should be merged to Jep. This is not the first time pyspark has come up and it would be really nice to have that use case work with the default settings. But I am always hesitant to add code to Jep for specific use cases that we cannot test and will not officially support. The jep maintainers do not use pyspark or py4j and we would not test this compatibility in any future releases. Changes that support use cases we don't test make it hard to fix or refactor code because we won't know if we break something. I'll keep thinking over it and also see if @ndjensen has an opinion. |
I am not opposed to this changeset since it helps some of our users, but I don't want to get in the habit of hardcoding modules to be excluded. This brings back to attention that we really need a ClassEnquirer that allows you to pass it a set of modules to exclude as Java packages, even if there are Java packages with the same package name on the classpath. Or put another way, a ClassEnquirer where you can explicitly declare certain packages as python packages, so they are not attempted to be imported from Java, regardless of what's on the classpath. I would like to write this new ClassEnquirer for Jep 4.2.1. I think we should keep this pull request open a bit longer and see if I can get to that new ClassEnquirer, cause it would make this change somewhat obsolete. But if you want to merge this change anyway, I am not opposed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is worth merging this change so that spark users can have a better experience using jep.
I added #553 to make it easier to force the imports to the Python packages and not the Java packages when using ClassList. Please let me know what you think. |
can we merge this? @ndjensen @bsteffensmeier |
Merged, thanks. This will be included in the 4.2.1 release which will come out in the next month. |
this will solve following error when running pyspark script from Jep: