Integration tests: spill/unspill #136
Comments
This was mostly done by Naty and Hendrik. Things were fine.
Currently the default is 100 GiB EBS, but if there's a local NVMe drive we'll attach to that instead. There isn't currently a way to adjust the size of the EBS volume; I'd be interested to know if there's desire/need for that. This means that currently the best way to get a large disk (and a fast disk for disk-intensive workloads) is to use an instance type with local NVMe, for example something from one of the NVMe-backed instance families.
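For reference, a minimal sketch of requesting NVMe-backed workers when creating a cluster. The `worker_vm_types` keyword and the instance family named below are assumptions about Coiled's Python API and may differ between versions; check the Coiled docs before relying on them:

```python
import coiled
from dask.distributed import Client

# Sketch: ask for an instance family with local NVMe so spilled data lands on
# fast local disk rather than the default 100 GiB EBS volume.
# `worker_vm_types` and "m5d.2xlarge" are assumed examples, not a recommendation.
cluster = coiled.Cluster(
    n_workers=10,
    worker_vm_types=["m5d.2xlarge"],
)
client = Client(cluster)
```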
@ncclementi where did this work end up?
We tried to persist a lot of data with @hendrikmakait and we were able to do so; things didn't crash, and in the process we discovered something that led to dask/distributed#6280. We experimented with this, but we did not write a test. We were not quite sure what the test would look like, as the EBS kept expanding.
As in, the amount written to disk was expanding? Or the disk size was expanding? (That would make me very puzzled.)
@ntabris I think I made a bad choice of words. We just saw that it kept spilling and it seemed to never end, but I don't recall how close we got to 100 GB. @hendrikmakait, do you remember?
I would actually like to have a test as part of the coiled runtime, not only to confirm that this is not an issue right now but also that it never becomes one.
Is the code that led to this still available? Just because it didn't crash doesn't mean it isn't valuable.
I think this sounds reasonable; the part I am struggling with is how to design a test for this. What is the expected behavior, and what counts as an issue? At the moment we did something like da.random.random((1_000_000, 100_000), chunks=(10000, 1000)), which is an array of approximately 750 GB, and tried to persist it on a default cluster. But there was no proper test design around it.
I do not have the exact code that we ran at the moment; we were experimenting in an IPython session. But Guido was able to reproduce this and created a test for it, which is on the PR: https://github.com/dask/distributed/pull/6280/files#diff-96777781dd54f26ed9441afb42909cf6f5393d6ef0b2b2a2e7e8dc46f074df93
Yup, I think that would work fine. You would probably persist, wait, and then call sum() or something else that forces the data to be read back. That would be a good thing to test and time.
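A sketch of what that experiment looks like, using the array size from the comment above; the cluster setup itself is assumed (any distributed client pointed at a real cluster, e.g. a Coiled one):

```python
import dask.array as da
from dask.distributed import Client, wait

client = Client()  # assumed: point this at a real cluster with enough disk

# Roughly 800 GB (~750 GiB) of random float64 data, far more than a small
# cluster holds in RAM, split into ~80 MB chunks.
x = da.random.random((1_000_000, 100_000), chunks=(10_000, 1_000))

x = x.persist()             # load the data; workers spill what does not fit in memory
wait(x)                     # block until everything is either in memory or on disk
result = x.sum().compute()  # force the spilled chunks to be read back
```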
While working on this, dask/distributed#6783 was identified, which makes it hard to gauge disk space usage on workers.
Xref: dask/distributed#6835 makes it easier to evaluate disk I/O.
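Until that lands, one rough way to eyeball disk I/O is to sample OS-level counters on each worker and diff them around a workload; a minimal sketch, assuming psutil is installed in the worker environment (this measures the whole machine, not just Dask's spill traffic):

```python
import psutil
from dask.distributed import Client

client = Client()  # assumed: already connected to the cluster under test

def disk_io_snapshot():
    # Cumulative bytes read/written on this worker's machine since boot.
    io = psutil.disk_io_counters()
    return {"read_bytes": io.read_bytes, "write_bytes": io.write_bytes}

before = client.run(disk_io_snapshot)
# ... run the spill-heavy workload here ...
after = client.run(disk_io_snapshot)

# Per-worker deltas give a crude estimate of spill/unspill traffic
delta = {w: {k: after[w][k] - before[w][k] for k in after[w]} for w in after}
print(delta)
```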
It depends, as we will see below.
Since https://docs.coiled.io/user_guide/cloud_changelog.html#june-2022, the answer to this question is mostly NO:
For example, the default
For the user, this means that the task eventually fails enough times and they receive a bunch of the following messages:
Note: This is not helpful at all and
We can use
I'd like dask/distributed#6835 merged first before diving into this, since we cannot copy values of the
In terms of hard numbers, we reach ~125 MiB on both exclusive reads and exclusive writes.
One caveat I found is that if we keep referencing the large array and then calculate a sum on it, we see ~62 MiB of read and write at the same time. It looks like we are unspilling a chunk to calculate a sum on it, then spilling it back to disk because we need that memory for another chunk. Given the immutability of the chunks, we may want to consider a lazier policy that keeps spilled data on disk until it actually has to be removed, i.e. until the worker wants to get rid of it both in memory and on disk.
After taking the first shot at this, it looks like scaling
To clarify: I mean that we do not have enough storage by default to store 10x more data than we have in RAM, which is the initial question of this issue. The discussion around default disk sizes can be found here: https://github.com/coiled/oss-engineering/issues/123. Following the discussion on that issue and given how easy it is to adjust disk size with
That's fine. I think any multiplier >1 is fine, assuming we can configure this on the Coiled side. The idea of this issue is to apply a workload that requires more memory than is available on the cluster but can finish successfully if data is stored to disk. Whether this is 1.5x, 2x or 10x is not that important.
Also an interesting find. If chunks are small enough, this should always be able to finish. There is an edge case where disk is full, memory is full, and the entire cluster pauses. Beyond this edge case, the computation should always finish, and we should definitely never see an OOM exception, provided the chunks are small enough.
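A sketch of how such a test could pick its sizing relative to whatever cluster it runs on; the 2x multiplier is an arbitrary choice for illustration, any multiplier > 1 forces spilling:

```python
import dask.array as da
from dask.distributed import Client, wait

client = Client()  # assumed: connected to the cluster under test

# Total memory across all workers, as seen by the scheduler
info = client.scheduler_info()
cluster_memory = sum(w["memory_limit"] for w in info["workers"].values())

# Build a square float64 array roughly 2x the cluster's memory
target_nbytes = 2 * cluster_memory
n = int((target_nbytes / 8) ** 0.5)
x = da.random.random((n, n), chunks="128 MiB")

x = x.persist()                          # spills whatever does not fit in RAM
wait(x)
assert float(x.sum().compute()) > 0      # forces spilled chunks to be read back
```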
@fjetter: I'm currently taking a look at what's going on in the |
This was caused by a version mismatch with my custom software environment that used the latest
One reason why it takes a lot of time is that workers keep straggling when they reach the disk size limit, and we wait a while before deciding to reassign those tasks to other workers (and removing said worker).
One thing that might be helpful for the failing case is monitoring of disk usage for each worker. We currently monitor how much data is spilled, but we do not track/display
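In the meantime, a quick way to poll per-worker disk usage from the client; a minimal sketch assuming workers spill to their local filesystem (the "/" default below is an assumption, point it at the actual spill directory):

```python
import shutil
from dask.distributed import Client

client = Client()  # assumed: connected to the cluster under test

def disk_usage(path="/"):
    # Free/used space on the filesystem holding `path`, in GiB
    usage = shutil.disk_usage(path)
    return {
        "total_gib": usage.total / 2**30,
        "used_gib": usage.used / 2**30,
        "free_gib": usage.free / 2**30,
    }

# Runs on every worker and returns {worker_address: result}
print(client.run(disk_usage))
```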
Persist a lot of data on disk
What happens when we load 10x more data than we have RAM?
- Does Coiled have enough storage by default? How do we modify this?
- How are we doing against theoretical performance? Is Dask's spill-to-disk efficient?
Pseudocode
x = da.random.random(...).persist()  # load the data
wait(x)
x.sum().compute()  # force us to read the data again
[EDIT by @crusaderky ]
In addition to the trivial use case above, we would also like to have
https://github.com/dask/distributed/blob/f7f650154fea29978906c65dd0225415da56ed11/distributed/tests/test_active_memory_manager.py#L1079-L1085
scaled up to production size. This will stress the important use case of spilled tasks that are taken out of the spill file and back into memory not to be computed, but to be transferred to another worker.
This stress test should find a sizing that is spilling/unspilling heavily but is still completing successfully.
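One possible shape for such a stress test (not the linked test itself, just a rough sketch): persist more data than fits in memory, then force worker-to-worker transfers of the spilled keys, e.g. via client.rebalance(); the array size below is only illustrative:

```python
import dask.array as da
from dask.distributed import Client, wait

client = Client()  # assumed: connected to the cluster under test

# Persist enough data that a sizeable fraction of it ends up spilled to disk
x = da.random.random((200_000, 100_000), chunks="128 MiB").persist()
wait(x)

# Force spilled keys to move between workers: the sender must read them back
# from its spill file and the receiver may spill them again, without ever
# computing anything on them.
client.rebalance()
```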
Related: