Getting Java gateway process exited before sending its port number during init_spark() #383
Hi @swapkh91,
Sorry for the inconvenience. We only tested Java 8, but Java 11 should be fine.
@kira-lin I tested it with Ray 2.6.2 and I get the same error. I'll explain how I'm trying to connect; maybe there is some issue in the process. The Ray cluster is on GKE, and I then connect using
I checked the logs through the dashboard.
Why is it showing the jar file path of my laptop? The file is present there though, I checked.
Oops, this seems to be a bug. We'll try to fix it. For now, you can wrap `init_spark` and whatever you want to do with Spark in a remote actor; that should work. Thanks for identifying this bug.
@kira-lin got it, I'll try that. Also, I noticed that raydp has a dependency
Hey @swapkh91, I am also getting the same error. Did you find a solution for this?
Hi @raiprabh,
You can try this solution. We don't have enough bandwidth to work on this project right now, so you are welcome to submit a PR to fix this if you have a solution, @swapkh91. We just need to use the path on the remote machines.
@kira-lin, is there any update on this issue?
I also get this error when running the following code:

```python
if __name__ == "__main__":
    import ray
    import raydp

    ray.init(address="ray://localhost:10001")

    spark = ray.remote(
        raydp.init_spark("NYCTAXI data processing",
                         num_executors=2,
                         executor_cores=1,
                         executor_memory="500M",
                         configs={"spark.shuffle.service.enabled": "true"})
    )
    data = ray.remote(
        spark.read.format("csv")
             .option("header", "true")
             .option("inferSchema", "true")
             .load(NYC_TRAIN_CSV)
    )
```

Seems that wrapping the functions into `ray.remote` this way does not help.
The following worked for me:

```python
import time
import ray
import raydp
import pandas as pd

@ray.remote
class PySparkDriver:
    def __init__(self):
        self.spark = raydp.init_spark("RayDP Example",
                                      num_executors=2,
                                      executor_cores=1,
                                      executor_memory="1GB")

    def foo(self):
        return self.spark.range(1000).repartition(10).count()

if __name__ == "__main__":
    ray.init(address="ray://localhost:10001")
    driver = PySparkDriver.remote()
    print(ray.get(driver.foo.remote()))
```
I'm trying a test using raydp. I have set up a Ray cluster on GKE using the dockerfile below.

I have port-forwarded the GKE pod and I'm able to connect to it using

```python
ray.init(address="ray://localhost:10001")
```

When I try to connect raydp through `init_spark()`, I get the following error:

```
Exception: Java gateway process exited before sending its port number
```

Full stacktrace:

Libraries on my laptop:
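For anyone debugging this locally: "Java gateway process exited before sending its port number" is PySpark's generic failure to launch a JVM on the machine where `init_spark()` actually runs. A minimal, stdlib-only diagnostic sketch (the variable names are illustrative, not part of RayDP) for checking the two most common causes, a missing `java` binary and an unset or wrong `JAVA_HOME`:

```python
# Check the usual causes of "Java gateway process exited before sending
# its port number": PySpark cannot find or start a local JVM.
import os
import shutil
import subprocess

java = shutil.which("java")
print("java on PATH :", java or "<not found>")
print("JAVA_HOME    :", os.environ.get("JAVA_HOME", "<not set>"))

if java:
    # `java -version` prints its output to stderr by convention
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    print(result.stderr.strip())
```

Note that when `init_spark()` is called on the client rather than inside a remote actor, this check must pass on the laptop, which matches the symptom in this thread: the gateway was being launched locally with the laptop's jar path.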