
[data] [streaming] Fixes to autoscaling actor pool streaming op #32023

Merged: 9 commits merged into ray-project:master on Jan 30, 2023

Conversation

@ericl (Contributor) commented on Jan 28, 2023

Signed-off-by: Eric Liang <ekhliang@gmail.com>

Why are these changes needed?

Fixes:

  • Properly wire max tasks per actor to pool
  • Account for internal queue size in scheduling algorithm
  • Small improvements to progress bar UX

TODO:

  • Improve unit tests
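The second fix above (accounting for the internal queue size in the scheduling algorithm) can be sketched in isolation. This is a minimal, hypothetical illustration only: the names `PoolState` and `should_scale_up` and the 0.8 utilization threshold are assumptions for exposition, not Ray's actual implementation.

```python
# Hypothetical sketch: an autoscaling decision that counts work sitting in the
# operator's internal queue, not just tasks already in flight. All names and
# the 0.8 threshold are illustrative assumptions, not Ray's code.
from dataclasses import dataclass


@dataclass
class PoolState:
    num_actors: int
    max_tasks_in_flight_per_actor: int  # wired through from the user config
    tasks_in_flight: int
    internal_queue_size: int            # bundles queued but not yet submitted


def should_scale_up(pool: PoolState, max_actors: int) -> bool:
    """Scale up when pending work (in flight + queued) would saturate the pool."""
    if pool.num_actors >= max_actors:
        return False
    capacity = pool.num_actors * pool.max_tasks_in_flight_per_actor
    # The fix being sketched: include the internal queue in "pending" work,
    # so queued bundles also drive the scale-up decision.
    pending = pool.tasks_in_flight + pool.internal_queue_size
    return pending >= 0.8 * capacity


# With 2 actors x 4 slots = 8 slots: 3 tasks in flight alone would not
# trigger scale-up, but 3 in flight plus 5 queued does.
busy = PoolState(num_actors=2, max_tasks_in_flight_per_actor=4,
                 tasks_in_flight=3, internal_queue_size=5)
print(should_scale_up(busy, max_actors=4))  # True
```

Ignoring the queue (the pre-fix behavior) would leave `pending` at 3 in the example above and never scale up, even while work piles up behind the pool.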

@ericl ericl changed the title [data] [streaming] Max tasks in flight arg not passed to autoscaling policy [WIP] [data] [streaming] Fixes to autoscaling actor pool streaming op Jan 28, 2023
@ericl ericl changed the title [WIP] [data] [streaming] Fixes to autoscaling actor pool streaming op [data] [streaming] Fixes to autoscaling actor pool streaming op Jan 28, 2023
@ericl ericl added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jan 28, 2023
@clarkzinzow (Contributor) left a comment:
LGTM overall, just a few questions about the tests

@@ -68,7 +52,7 @@ def test_build_streaming_topology(ray_start_10_cpus_shared):
     assert list(topo) == [o1, o2, o3]


-def test_disallow_non_unique_operators(ray_start_10_cpus_shared):
+def test_disallow_non_unique_operators():
@clarkzinzow (Contributor) commented on this diff:

Why is removing the ray_start_10_cpus_shared fixture necessary? Now the first test will implicitly start a cluster whose number of CPUs/workers depends on the machine it's running on, and that cluster will be implicitly reused for the rest of the tests in the module. If possible, we should use fixtures that set the exact number of CPUs and explicitly manage the lifecycle of the test clusters, to keep these tests deterministic across machines and refactorings.

@ericl (Contributor Author) replied:

I split this test file into pure unit vs integration tests. Hence, they shouldn't depend on Ray and we shouldn't need a Ray fixture.

In general, it seems strange to use a fixture we don't need. We should either fix the fixture or split the tests into separate files.

@clarkzinzow (Contributor) replied:

Ah, I see the intention! But these tests will still need to start a Ray cluster because of the ray.put calls for the input ref bundles and for putting the MapOperator transform function into the object store, right? There just won't be any tasks launched.

@ericl (Contributor Author) replied:

Ah, that's true. Maybe that's what was causing the mysterious pipeline hangs before? I saw those occasionally too, but they went away after the test split.

In any case, hopefully the puts will go away with the new logical backend.

python/ray/data/tests/test_streaming_integration.py (outdated):
with pytest.raises(ray.exceptions.RayTaskError):
    ray.data.range(6, parallelism=6).map(
        barrier3, compute=ray.data.ActorPoolStrategy(1, 2)
    ).take_all()
@clarkzinzow (Contributor) commented:
Nice test!
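The ActorPoolStrategy(1, 2) in the snippet above bounds the autoscaling pool between 1 and 2 actors. A minimal sketch of what such min/max clamping implies, with the class name and logic purely illustrative assumptions rather than Ray's implementation:

```python
# Illustrative sketch of min/max pool bounds, as in ActorPoolStrategy(1, 2).
# The BoundedActorPool class and its clamping logic are assumptions for
# exposition only, not Ray's actual code.
class BoundedActorPool:
    def __init__(self, min_size: int, max_size: int):
        assert 1 <= min_size <= max_size
        self.min_size = min_size
        self.max_size = max_size
        self.size = min_size  # start at the lower bound

    def resize(self, desired: int) -> int:
        """Clamp the autoscaler's desired size into [min_size, max_size]."""
        self.size = max(self.min_size, min(desired, self.max_size))
        return self.size


pool = BoundedActorPool(1, 2)
print(pool.resize(5))  # clamped to the max: 2
print(pool.resize(0))  # clamped to the min: 1
```

Under these bounds the test above can never get a third actor, which is what forces the third barrier task to fail rather than wait for more capacity.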

@clarkzinzow (Contributor) left a comment:
Not going to block on the testing nits

@ericl (Contributor Author) left a comment:
Updated.

@ericl ericl removed the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jan 30, 2023
@ericl ericl merged commit 96440cf into ray-project:master Jan 30, 2023
@zhe-thoughts zhe-thoughts linked an issue Jan 30, 2023 that may be closed by this pull request
clarng pushed a commit to clarng/ray that referenced this pull request Jan 31, 2023
…project#32023)

Fixes:
- Properly wire max tasks per actor to pool
- Account for internal queue size in scheduling algorithm
- Small improvements to progress bar UX
edoakes pushed a commit to edoakes/ray that referenced this pull request Mar 22, 2023

Signed-off-by: Edward Oakes <ed.nmi.oakes@gmail.com>
Successfully merging this pull request may close these issues.

[Ray 2.3 release] air_benchmark_xgboost_cpu_10 fails
5 participants