Less cluster memory #338
Conversation
Failure on the win stability test: https://github.com/coiled/platform/issues/91
Failure on ubu 3.9 upstream is #336.

I have no idea how this change could introduce regressions. The one test I touched that is flagged as a regression is

I am surprised that this even kicks in. Given the parametrization, I'm wondering how the script correlates the new changes with the old ones. Any idea @ian-r-rose?
def upgrade() -> None:
    op.execute(
        """update test_run
        set name = name || '[100% cluster memory]'
@ian-r-rose I'm not entirely sure what `name` and `originalname` are. Is it correct for me to just modify `name`?
`originalname` is the name of the actual function, whereas `name` is the full name with any parametrizations tacked on (this is pytest terminology). So the structure of this migration looks correct to me.
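For illustration, here is a hypothetical parametrized test (not from this PR) showing the distinction pytest makes between the two attributes:

```python
import pytest


# Hypothetical example: `originalname` is the bare function name, while
# `name` carries the parametrization id appended in square brackets.
@pytest.mark.parametrize("memory_multiplier", [1.0], ids=["100% cluster memory"])
def test_anom_mean(memory_multiplier):
    assert memory_multiplier > 0


# For the collected item, pytest reports:
#   item.originalname == "test_anom_mean"
#   item.name         == "test_anom_mean[100% cluster memory]"
```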
conftest.py
Outdated
def _cluster_memory(client: distributed.Client) -> int:
    "Total memory available on the cluster, in bytes"
    return int(
        sum(w["memory_limit"] for w in client.scheduler_info()["workers"].values())
    )
Seems unfortunate to delete the utility function such that it's inaccessible to tests that might want to do something different with the cluster memory measurement.
Right, I'll just rename it to `get_cluster_memory`.
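For reference, a minimal sketch of what the renamed helper might look like (this is just the function shown above under a new name; the exact placement in conftest.py may differ):

```python
import distributed


def get_cluster_memory(client: distributed.Client) -> int:
    """Total memory available on the cluster, in bytes."""
    return int(
        sum(w["memory_limit"] for w in client.scheduler_info()["workers"].values())
    )
```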
It looks like you've worked this out, but for posterity, the correlation with older runs is based on matching the
conftest.py
Outdated
@pytest.fixture(
    params=[
        pytest.param(0.3, id="30% cluster memory"),
30% is just a guess. I'm looking for stuff that "comfortably fits into memory". I figured that if a test actually generates that much data, we could hold two full copies before starting to spill.
This will be individual for every test, though, since most tests still multiply the cluster memory by a certain factor, etc.
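To illustrate, a fixture along these lines might look roughly as follows; the 1.0 parametrization, the `small_client` fixture name, and the approach of scaling the reported memory before handing it to tests are assumptions based on this discussion, not necessarily the PR's exact code:

```python
import distributed
import pytest


@pytest.fixture(
    params=[
        pytest.param(0.3, id="30% cluster memory"),
        pytest.param(1.0, id="100% cluster memory"),  # assumed second case
    ]
)
def memory_multiplier(request) -> float:
    """Fraction of the real cluster memory exposed to the tests."""
    return request.param


@pytest.fixture
def cluster_memory(small_client: distributed.Client, memory_multiplier: float) -> int:
    """Scaled total cluster memory, in bytes (`small_client` is an assumed fixture)."""
    total = sum(
        w["memory_limit"]
        for w in small_client.scheduler_info()["workers"].values()
    )
    return int(memory_multiplier * total)
```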
@ian-r-rose (or anybody else): if you are happy with this, please go ahead and merge, since I'm likely already out before CI finishes, but I think we should get this in before the weekend.
I've only taken a quick look at this -- but I like the idea of quickly being able to say "run this test with small, large, etc. datasets compared to the cluster memory" 👍
> I like the idea of quickly being able to say "run this test with small, large, etc. datasets compared to the cluster memory"
I like this idea as well, but are we sure it's valuable to do automatically in every case? I appreciate setting something up like this so we can use it conveniently in the future, but does it add a valuable enough signal right now to be worth increasing test runtime this much? I could see merging this, but on main only having a single parametrization of `memory_factor`.
tests/benchmarks/test_array.py
Outdated
# From https://github.com/dask/distributed/issues/2602#issuecomment-535009454
if cluster_memory == 1.0:
Why would it ever be 1.0? The actual cluster memory would have to be 1 byte for that to happen. I assume you want to detect the not-30% case, but because you only have access to the (faked) size in bytes, not the factor, that's tricky to do. (See above for discussion.)
I think this originates from an earlier iteration. Will need to fix that.
  data = da.random.random(
      scaled_array_shape(target_nbytes, ("x", "10MiB")),
      chunks=(1, parse_bytes("10MiB") // 8),
  )
- print_size_info(memory, target_nbytes, data)
+ print_size_info(cluster_memory, target_nbytes, data)
nit: I don't love implementing this by lying to the test about the amount of cluster memory. It makes this logging of size info wrong, for one. It also means that tests that do want to adapt to cluster memory, but don't make sense to run on a variety of parameterizations (maybe they're meant to test spilling, or they know they won't work on larger datasets [see comment below]), are difficult to write.
Instead, why not have `memory_factor` be the fixture, and multiply within the test when appropriate? A little more code, but also more explicit and readable.
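A rough sketch of that alternative, with illustrative names (`memory_factor`, `get_cluster_memory`, `small_client`, and the test body are assumptions, not the PR's code):

```python
import dask.array as da
import pytest
from dask.utils import parse_bytes

# get_cluster_memory and scaled_array_shape are assumed to come from the
# repo's test utilities / conftest.


@pytest.fixture(params=[pytest.param(0.3, id="30% cluster memory")])
def memory_factor(request) -> float:
    """Fraction of cluster memory a test should target."""
    return request.param


def test_anom_mean(small_client, memory_factor):
    # The test sees the real cluster memory and applies the factor explicitly,
    # so size logging stays accurate and tests can opt out of the scaling.
    cluster_memory = get_cluster_memory(small_client)
    target_nbytes = int(cluster_memory * memory_factor)
    data = da.random.random(
        scaled_array_shape(target_nbytes, ("x", "10MiB")),
        chunks=(1, parse_bytes("10MiB") // 8),
    )
```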
+1
My only question here is whether it makes sense to use the same data sizes for all of the array tests.
Was there a specific reason the designated multiples were chosen for each test? I think @gjoseph92 may have better context. See #410 as an alternative.
Absolutely. These multiples are based on actual workloads reported by actual users relative to the size of their actual cluster. As an example,
So adding a multiple to every test's data size might kinda make sense, but we shouldn't just change how all the data sizes are already set up relative to cluster memory; just apply the multiple at the end.
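In other words, something along these lines (a hypothetical helper; the names and numbers are illustrative):

```python
def target_nbytes(cluster_memory: int, workload_multiple: float, memory_factor: float) -> int:
    """Keep each test's user-derived multiple and apply the new factor last."""
    return int(cluster_memory * workload_multiple * memory_factor)


# A test that historically targeted 2x cluster memory, at the 30% parametrization:
# target_nbytes(cluster_memory=100 * 2**30, workload_multiple=2.0, memory_factor=0.3)
```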
This will run the array workloads that accept cluster memory as an input parameter, parametrized for a low-memory and a high-memory situation.
The tests themselves sometimes still multiply this value, but I didn't want to modify test logic. Further down the road, it will surely be an interesting exercise to increase this value to >1.0 as well, but for now we're mostly concerned with getting the low-memory cases right.
For future benchmark development, I believe the pattern of adjusting the data size used as a function of cluster memory is good practice that allows us to scale these tests very easily.