
[spark] Fix DefaultDatabricksRayOnSparkStartHook.on_spark_job_created #42178

Merged

Conversation

WeichenXu123 (Contributor) commented Jan 4, 2024

Why are these changes needed?

Fix DefaultDatabricksRayOnSparkStartHook.on_spark_job_created: invoke the hook before the blocking job_rdd.mapPartitions(ray_cluster_job_mapper).collect() call, which runs in a background thread and does not return until the cluster or the Ray worker node is terminated, so a hook call placed after it effectively never runs while the cluster is alive.

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

```
@@ -1615,11 +1615,11 @@ def ray_cluster_job_mapper(_):
)
job_rdd = job_rdd.withResources(resource_profile)

job_rdd.mapPartitions(ray_cluster_job_mapper).collect()

hook_entry = _create_hook_entry(is_global=(ray_temp_dir is None))
hook_entry.on_spark_job_created(spark_job_group_id)
```

WeichenXu123 (Contributor, Author) commented Jan 4, 2024

Move this line in front of job_rdd.mapPartitions(ray_cluster_job_mapper).collect(), because this function runs inside a background thread and the collect() call blocks forever until the cluster or the Ray worker node is terminated.
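
For readers outside the PR, here is a minimal self-contained sketch of the ordering issue; DemoHook, run_ray_node_job, and the shutdown_event stand-in are illustrative only, not the actual ray.util.spark code. Because the whole function runs in a background thread and the blocking call only returns when the node terminates, the hook call has to come before it:

```python
import threading
import time

class DemoHook:
    def on_spark_job_created(self, spark_job_group_id):
        print(f"hook fired for Spark job group {spark_job_group_id}")

def run_ray_node_job(hook_entry, spark_job_group_id, shutdown_event):
    # The fix: notify the hook first, while this background thread can still do work.
    hook_entry.on_spark_job_created(spark_job_group_id)
    # Stand-in for job_rdd.mapPartitions(ray_cluster_job_mapper).collect(), which
    # blocks until the cluster or the Ray worker node is terminated.
    shutdown_event.wait()

shutdown = threading.Event()
worker = threading.Thread(
    target=run_ray_node_job,
    args=(DemoHook(), "ray-cluster-job-group-0", shutdown),
    daemon=True,
)
worker.start()
time.sleep(0.1)  # the hook has already fired even though the "job" is still running
shutdown.set()   # simulate terminating the Ray worker node
worker.join()
```

In the real code the blocking call is the Spark job that backs the Ray node, so a hook call placed after it would only run once the node was already shutting down.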

WeichenXu123 (Contributor, Author):

CC @jjyao

jjyao (Collaborator) commented Jan 4, 2024

```
//python/ray/tests:spark/test_databricks_hook                            FAILED in 3 out of 3 in 53.9s
```

jjyao (Collaborator) commented Jan 8, 2024

There are conflicts.

WeichenXu123 requested a review from jjyao on January 8, 2024, 09:00.
jjyao merged commit 2244e89 into ray-project:master on Jan 8, 2024.
9 checks passed
vickytsang pushed a commit to ROCm/ray that referenced this pull request on Jan 12, 2024.