
Issues with running on Ray Client #352

Open

wctmanager opened this issue Jun 5, 2023 · 3 comments

Comments

@wctmanager

The current documentation on using RayDP with Ray Client only says: "RayDP works the same way when using ray client. However, spark driver would be on the local machine." It would be very helpful to have at least one example of how it should be configured and used, because with Ray v2.1.0 and RayDP v1.5.0 (using Azure Kubernetes Service as the backend), something as straightforward as:
ray.init(address="ray://raycluster-kuberay-head-svc.default.svc.cluster.local:10001")
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=1,
                         executor_cores=1,
                         executor_memory='500M')
df = spark.createDataFrame([('look',), ('spark',), ('tutorial',), ('spark',), ('look', ), ('python', )], ['word'])

Initializes Ray Client, RayDP/PySpark and creates the dataset without errors, but then

df.show()

produces an endless stream of

[Stage 0:> (0 + 0) / 1]
2023-06-04 22:18:13,383 WARN TaskSchedulerImpl [task-starvation-timer]: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2023-06-04 22:18:28,382 WARN TaskSchedulerImpl [task-starvation-timer]: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2023-06-04 22:18:43,382 WARN TaskSchedulerImpl [task-starvation-timer]: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

However, an approach that wraps "init_spark" in a Ray actor works fine.
Any advice or example of how to run remotely with Ray Client would be highly appreciated.
Thank you.
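For reference, the actor-based workaround mentioned above can be sketched roughly as follows. This is a minimal, hedged sketch, not a verified implementation: it assumes a reachable Ray cluster with RayDP installed, and the actor name `SparkDriver` and the connection address are illustrative.

```python
import ray
import raydp

# Sketch of the workaround: run the Spark driver inside the Ray cluster
# by wrapping init_spark in an actor, so the driver and executors share
# the cluster network instead of crossing the Ray Client boundary.
@ray.remote
class SparkDriver:
    def __init__(self):
        # The Spark driver now lives in this actor's process,
        # which runs inside the Ray cluster.
        self.spark = raydp.init_spark(app_name="RayDP Example",
                                      num_executors=1,
                                      executor_cores=1,
                                      executor_memory="500M")

    def run(self):
        df = self.spark.createDataFrame(
            [("look",), ("spark",), ("tutorial",)], ["word"])
        return df.count()

    def stop(self):
        raydp.stop_spark()

# Connect via Ray Client from the local machine (address is illustrative).
ray.init(address="ray://raycluster-kuberay-head-svc.default.svc.cluster.local:10001")
driver = SparkDriver.remote()
print(ray.get(driver.run.remote()))  # Spark work happens inside the cluster
ray.get(driver.stop.remote())
```

The key design point is that only small actor calls cross the Ray Client connection; all driver-executor traffic stays inside the cluster network.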

@kira-lin (Collaborator) commented Jun 6, 2023

Hi,
I was not able to reproduce this issue in my environment. Maybe it's due to the network. As we said in the documentation, in Ray client mode the Spark executors run in the Ray cluster, but the Spark driver runs on the local machine where the script is executed. Can that Spark driver connect to those executors? Can you inspect the java-worker-*.log files in /tmp/ray/session_latest/logs/?

#299 This issue might be related. Are you using a Mac as that local machine?

@wctmanager (Author) commented Jun 8, 2023

Thanks. Right, it is indeed a networking issue (between the local Spark driver and the remote executors). The question is rather: what are the network requirements to make driver-executor communication work (open ports, something else)? Thank you. P.S. In this particular case both the driver and the executors are in the same Kubernetes cluster, but in different pods and namespaces.

@kira-lin (Collaborator) commented Jun 9, 2023

I see. The driver node should have access to all ports on the executor nodes. I think that is enough.
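When opening all ports is not possible, standard Spark properties allow the driver-side listening ports to be pinned to fixed values, so only those need to be reachable from the executor pods. A hedged config sketch follows: `spark.driver.host`, `spark.driver.port`, and `spark.driver.blockManager.port` are standard Spark configuration keys, but the hostname and port values shown are illustrative assumptions, not values from this issue.

```python
import raydp

# Illustrative config fragment: pin the driver's listening ports so the
# executor pods only need these specific ports reachable, and advertise
# an address that is routable from inside the cluster.
spark = raydp.init_spark(
    app_name="RayDP Example",
    num_executors=1,
    executor_cores=1,
    executor_memory="500M",
    configs={
        # Hypothetical driver address, routable from the executor pods:
        "spark.driver.host": "driver-pod.default.svc.cluster.local",
        "spark.driver.port": "40000",
        "spark.driver.blockManager.port": "40001",
    },
)
```

With the ports fixed, the corresponding Kubernetes NetworkPolicy (or firewall rule) only needs to allow those two ports from the executor namespace to the driver.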
