Less cluster memory #338
Conversation
Failure on the win stability test: https://github.com/coiled/platform/issues/91
Failure on ubu 3.9 upstream is #336.

I have no idea how this change could introduce regressions. The one test I touched that is flagged as a regression is

I am surprised that this even kicks in. Given the parametrization, I'm wondering how the script correlates the new changes with the old ones. Any idea @ian-r-rose?
def upgrade() -> None:
    op.execute(
        """update test_run
        set name = name || '[100% cluster memory]'
@ian-r-rose I'm not entirely sure what `name` and `originalname` are. Is it correct for me to just modify `name`?
`originalname` is the name of the actual function, whereas `name` is the full name with any parametrizations tacked on (this is pytest terminology). So the structure of this migration looks correct to me.
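For illustration, here is a hypothetical parametrized test (not from this PR) showing the distinction pytest makes between the two attributes:

```python
import pytest


# Hypothetical example: `originalname` is the bare function name, while
# `name` carries the parametrization id appended in square brackets.
@pytest.mark.parametrize("memory_multiplier", [1.0], ids=["100% cluster memory"])
def test_anom_mean(memory_multiplier):
    assert memory_multiplier > 0


# For the collected item, pytest reports:
#   item.originalname == "test_anom_mean"
#   item.name         == "test_anom_mean[100% cluster memory]"
```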
conftest.py
Outdated
def _cluster_memory(client: distributed.Client) -> int:
    "Total memory available on the cluster, in bytes"
    return int(
        sum(w["memory_limit"] for w in client.scheduler_info()["workers"].values())
    )
Seems unfortunate to delete the utility function such that it's inaccessible to tests that might want to do something different with the cluster memory measurement.
Right, I'll just rename it to `get_cluster_memory`.
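For reference, a minimal sketch of what the renamed helper might look like (this is just the function shown above under a new name; the exact placement in conftest.py may differ):

```python
import distributed


def get_cluster_memory(client: distributed.Client) -> int:
    """Total memory available on the cluster, in bytes."""
    return int(
        sum(w["memory_limit"] for w in client.scheduler_info()["workers"].values())
    )
```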
It looks like you've worked this out, but for posterity, the correlation with older runs is based on matching the
conftest.py
Outdated
@pytest.fixture(
    params=[
        pytest.param(0.3, id="30% cluster memory"),
30% is just a guess. I'm looking for stuff that "comfortably fits into memory". I figured that if a test actually generates that much data, we could hold two full copies before starting to spill.
This will be individual for every test, though, since most tests still multiply the cluster memory by a certain factor, etc.
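To illustrate, a fixture along these lines might look roughly as follows; the 1.0 parametrization, the `small_client` fixture name, and the approach of scaling the reported memory before handing it to tests are assumptions based on this discussion, not necessarily the PR's exact code:

```python
import distributed
import pytest


@pytest.fixture(
    params=[
        pytest.param(0.3, id="30% cluster memory"),
        pytest.param(1.0, id="100% cluster memory"),  # assumed second case
    ]
)
def memory_multiplier(request) -> float:
    """Fraction of the real cluster memory exposed to the tests."""
    return request.param


@pytest.fixture
def cluster_memory(small_client: distributed.Client, memory_multiplier: float) -> int:
    """Scaled total cluster memory, in bytes (`small_client` is an assumed fixture)."""
    total = sum(
        w["memory_limit"]
        for w in small_client.scheduler_info()["workers"].values()
    )
    return int(memory_multiplier * total)
```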
@ian-r-rose (or anybody else): if you are happy with this, please go ahead and merge, since I'm likely already out before CI finishes, but I think we should get this in before the weekend.
I've only taken a quick look at this -- but I like the idea of quickly being able to say "run this test with small, large, etc. datasets compared to the cluster memory" 👍
> I like the idea of quickly being able to say "run this test with small, large, etc. datasets compared to the cluster memory"
I like this idea as well, but are we sure it's valuable to do automatically in every case? I appreciate setting something up like this so we can use it conveniently in the future, but does it add a valuable enough signal right now to be worth increasing test runtime this much? I could see merging this, but on main only having a single parametrization of `memory_factor`.
tests/benchmarks/test_array.py
Outdated
# From https://github.com/dask/distributed/issues/2602#issuecomment-535009454
if cluster_memory == 1.0:
Why would it ever be 1.0? The actual cluster memory would have to be 1 byte for that to happen. I assume you want to detect the not-30% case, but because you only have access to the (faked) size in bytes, not the factor, that's tricky to do. (See above for discussion.)
I think this originates from an earlier iteration. Will need to fix that.
  data = da.random.random(
      scaled_array_shape(target_nbytes, ("x", "10MiB")),
      chunks=(1, parse_bytes("10MiB") // 8),
  )
- print_size_info(memory, target_nbytes, data)
+ print_size_info(cluster_memory, target_nbytes, data)
nit: I don't love implementing this by lying to the test about the amount of cluster memory. It makes this logging of size info wrong, for one. It also means that tests that do want to adapt to cluster memory, but don't make sense to run on a variety of parameterizations (maybe they're meant to test spilling, or they know they won't work on larger datasets [see comment below]), are difficult to write.
Instead, why not have `memory_factor` be the fixture, and multiply within the test when appropriate? A little more code, but also more explicit and readable.
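A rough sketch of that alternative, with illustrative names (`memory_factor`, `get_cluster_memory`, `small_client`, and the test body are assumptions, not the PR's code):

```python
import dask.array as da
import pytest
from dask.utils import parse_bytes

# get_cluster_memory and scaled_array_shape are assumed to come from the
# repo's test utilities / conftest.


@pytest.fixture(params=[pytest.param(0.3, id="30% cluster memory")])
def memory_factor(request) -> float:
    """Fraction of cluster memory a test should target."""
    return request.param


def test_anom_mean(small_client, memory_factor):
    # The test sees the real cluster memory and applies the factor explicitly,
    # so size logging stays accurate and tests can opt out of the scaling.
    cluster_memory = get_cluster_memory(small_client)
    target_nbytes = int(cluster_memory * memory_factor)
    data = da.random.random(
        scaled_array_shape(target_nbytes, ("x", "10MiB")),
        chunks=(1, parse_bytes("10MiB") // 8),
    )
```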
+1
My only question here is whether it makes sense to use the same data sizes for all of the array tests.
Was there a specific reason the designated multiples were chosen for each test? I think @gjoseph92 may have better context. See #410 as an alternative.
Absolutely. These multiples are based on actual workloads reported by actual users relative to the size of their actual cluster. As an example,
So adding a multiple to every test's data size might kinda make sense, but we shouldn't just change how all the data sizes are already set up relative to cluster memory; just apply the multiple at the end.
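In other words, something along these lines (a hypothetical helper; the names and numbers are illustrative):

```python
def target_nbytes(cluster_memory: int, workload_multiple: float, memory_factor: float) -> int:
    """Keep each test's user-derived multiple and apply the new factor last."""
    return int(cluster_memory * workload_multiple * memory_factor)


# A test that historically targeted 2x cluster memory, at the 30% parametrization:
# target_nbytes(cluster_memory=100 * 2**30, workload_multiple=2.0, memory_factor=0.3)
```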
This will run the array workloads that accept cluster memory as an input parameter, parametrized for a low-memory and a high-memory situation.
The tests themselves sometimes still multiply this value, but I didn't want to modify test logic. Further down the road, it will surely be an interesting exercise to increase this value to >1.0 as well, but for now we're mostly concerned with getting the low-memory cases right.
For future benchmark development, I believe the pattern of adjusting the data size used as a function of cluster memory is good practice that allows us to scale these tests very easily.