I am trying to run sample Python code inside Scala code using Jep. In my Python code I simply create a SparkSession object via "SparkSession.builder.appName('name').master('local[1]').getOrCreate()" and execute this Python code through Jep using a SubInterpreter. I have also added pyspark as a shared module in the JepConfig used to create the SubInterpreter instance. My Scala code looks like the one below.
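(The original snippet was not preserved in this ticket; a minimal sketch of the setup just described, with hypothetical names, might look like this:)

```scala
import jep.{JepConfig, SubInterpreter}

object RunPySparkViaJep {
  def main(args: Array[String]): Unit = {
    // Register pyspark as a shared module so it is imported in the main
    // interpreter and shared, rather than re-imported per sub-interpreter.
    val config = new JepConfig().addSharedModules("pyspark")

    val interp = new SubInterpreter(config)
    try {
      interp.exec("from pyspark.sql import SparkSession")
      interp.exec("spark = SparkSession.builder.appName('name').master('local[1]').getOrCreate()")
    } finally {
      interp.close()
    }
  }
}
```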
I am passing the necessary environment variables, and jep.jar is on the classpath. But when I run the above Scala code, I still get a segmentation fault:
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x000000012d633bc9, pid=92789, tid=0x0000000000001603
#
# JRE version: OpenJDK Runtime Environment (8.0_275-b01) (build 1.8.0_275-b01)
# Java VM: OpenJDK 64-Bit Server VM (25.275-b01 mixed mode bsd-amd64 compressed oops)
# Problematic frame:
# C [Python+0x7bbc9] PyModule_GetState+0x9
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
# /Users/bhupeshgoel/Documents/codebase/prophecy/hs_err_pid92789.log
#
# If you would like to submit a bug report, please visit:
# https://github.com/AdoptOpenJDK/openjdk-support/issues
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
A detailed error report file (hs_err_pid92789.log) is also attached to this ticket.
I wanted to know whether pyspark is supported with Jep, especially when running pyspark code inside Scala/Java code. I was able to create a SparkSession instance in a Jep interactive session:
Bhupeshs-MacBook-Pro:~ bhupeshgoel$ jep
>>> import pyspark
>>> from pyspark.sql import *
>>> from pyspark.sql.functions import *
>>> spark = SparkSession.builder.appName('name').master("local[1]").getOrCreate()
21/04/28 13:03:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
>>> lit(1)
Column<b'1'>
Other environment details:
OS Platform, Distribution, and Version: MacOS Catalina v10.15.7
Python Distribution and Version: python3.9
Java Distribution and Version: OpenJDK 1.8
Jep Version: 3.9.1
Python packages used (e.g. numpy, pandas, tensorflow): pyspark v3.1.1
It is unusual that it works in the jep interactive session but not in your application. The biggest difference is that the interactive session uses a SharedInterpreter. Have you tried using a SharedInterpreter instead of a SubInterpreter?
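(For reference, a minimal sketch of that swap, assuming the same Python statements as above, could be:)

```scala
import jep.SharedInterpreter

// SharedInterpreter uses the single main CPython interpreter for every Jep
// instance, so native extensions that do not support CPython
// sub-interpreters behave as they do in the jep command-line session.
val interp = new SharedInterpreter()
try {
  interp.exec("from pyspark.sql import SparkSession")
  interp.exec("spark = SparkSession.builder.appName('name').master('local[1]').getOrCreate()")
} finally {
  interp.close()
}
```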
When I simply switch to a SharedInterpreter, I get the error below. I haven't changed any other environment variables; the same environment setup was used with the SubInterpreter.
jep.JepException: <class 'ModuleNotFoundError'>: No module named 'py4j.protocol'
at /usr/local/Cellar/apache-spark/3.0.1/libexec/python/pyspark/context.<module>(context.py:27)
at /usr/local/Cellar/apache-spark/3.0.1/libexec/python/pyspark/__init__.<module>(__init__.py:51)
at <string>.<module>(<string>:5)
at jep.Jep.exec(Native Method)
at jep.Jep.exec(Jep.java:478)
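(One common cause of this error is that py4j, which Spark bundles under its python/lib directory, is not on the embedded interpreter's sys.path. A sketch of one way to add it, assuming the Homebrew Spark layout from the trace above and a hypothetical py4j zip name that you would adjust to your local installation, might be:)

```scala
import jep.{JepConfig, SharedInterpreter}

// Paths below are assumptions based on the stack trace; check the actual
// zip name under $SPARK_HOME/python/lib on your machine.
val sparkPython = "/usr/local/Cellar/apache-spark/3.0.1/libexec/python"
val py4jZip     = s"$sparkPython/lib/py4j-0.10.9-src.zip"

// The shared config must be set before the first SharedInterpreter is created.
SharedInterpreter.setConfig(new JepConfig().addIncludePaths(sparkPython, py4jZip))

val interp = new SharedInterpreter()
try {
  interp.exec("import py4j.protocol") // should now resolve
} finally {
  interp.close()
}
```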