Current documentation on using RayDP with Ray Client only says: "RayDP works the same way when using ray client. However, spark driver would be on the local machine." It would be very helpful to have at least one example of how it should be configured and used, because with Ray v2.1.0 and RayDP v1.5.0 (using Azure Kubernetes Service as the backend) something as straightforward as:
ray.init(address="ray://raycluster-kuberay-head-svc.default.svc.cluster.local:10001")
spark = raydp.init_spark(app_name='RayDP Example',
                         num_executors=1,
                         executor_cores=1,
                         executor_memory='500M')
df = spark.createDataFrame([('look',), ('spark',), ('tutorial',), ('spark',), ('look',), ('python',)], ['word'])
initializes Ray Client and RayDP/PySpark and creates the DataFrame without errors, but then
df.show()
creates an endless stream of
[Stage 0:> (0 + 0) / 1]
2023-06-04 22:18:13,383 WARN TaskSchedulerImpl [task-starvation-timer]: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2023-06-04 22:18:28,382 WARN TaskSchedulerImpl [task-starvation-timer]: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
2023-06-04 22:18:43,382 WARN TaskSchedulerImpl [task-starvation-timer]: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources
However, wrapping init_spark in a Ray actor works fine.
Any advice or an example of how to run remotely with Ray Client would be highly appreciated.
Thank you.
Hi,
I was not able to reproduce this issue in my environment. Maybe it's due to the network. As the documentation says, in Ray client mode the Spark executors run in the Ray cluster, but the Spark driver runs on the local machine where the script is executed. Can that Spark driver connect to those executors? Can you inspect the java-worker-*.log files in /tmp/ray/session_latest/logs/?
#299 might be related. Are you using a Mac as that local machine?
Thanks. Right, it is indeed a networking issue (between the local Spark driver and the remote executor). The question is rather: what are the network requirements to make the driver-executor connection work (open ports, anything else)? Thank you. P.S. In this particular case both driver and executor are in the same k8s cluster, but in different pods and namespaces.