Is it possible to initialize the spark object from an existing SparkSession? The use case: my work environment requires a customized SparkSession that is wrapped up with complicated corporate credentials and setup. Running init_spark() from the raydp example won't work because it isn't aware of them. I can create a SparkSession object using the customized wrapper, but I don't know how to pass it over to raydp.
The raydp example using standard Spark:
import ray
import raydp
# connect to ray cluster
ray.init(address='auto')
# create a Spark cluster with specified resource requirements
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB')
# normal data processing with Spark
df = spark.createDataFrame([('look',), ('spark',), ('tutorial',), ('spark',), ('look',), ('python',)], ['word'])
df.show()
word_count = df.groupBy('word').count()
word_count.show()
# stop the spark cluster
raydp.stop_spark()
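If the corporate setup can be expressed as plain Spark properties, one partial workaround might be to forward them through init_spark's configs dict (if I read the raydp README right, init_spark accepts extra Spark conf this way). The property keys below are made-up placeholders for whatever the corporate wrapper actually sets:

import ray
import raydp

ray.init(address='auto')

# hypothetical: replicate the corporate wrapper's settings as Spark conf entries
# (both keys/values here are placeholders, not real corporate properties)
corp_conf = {
    'spark.hadoop.fs.s3a.aws.credentials.provider': 'com.corp.CustomProvider',
    'spark.corp.auth.token.path': '/etc/corp/token',
}

spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB',
                         configs=corp_conf)

This only helps if everything the wrapper does is reducible to Spark conf, which may not be the case for more involved credential flows.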
Proposed raydp using existing SparkSession:
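For illustration only, roughly what I have in mind is below; spark_session is not an existing init_spark parameter, just a sketch of the proposed API, and create_corporate_session / corp_spark are stand-ins for our internal wrapper:

import ray
import raydp
from corp_spark import create_corporate_session  # hypothetical internal wrapper

ray.init(address='auto')

# SparkSession built by the corporate wrapper, credentials already applied
existing_spark = create_corporate_session()

# proposed: let raydp adopt the pre-built session instead of creating its own
# (spark_session is a hypothetical parameter, not part of the current raydp API)
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=2,
                         executor_cores=2,
                         executor_memory='4GB',
                         spark_session=existing_spark)

df = spark.createDataFrame([('look',), ('spark',)], ['word'])
df.show()

raydp.stop_spark()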