SpillBuffer.spilled_total appears to return incorrect results #6783
Comments
A couple of comments here, it looks a bit weird to me that the disk size calculated as distributed/distributed/spill.py Lines 30 to 31 in 55cc1a5
The size of spill_size feels fine to me. Just looking at the way we serialize things, the overhead for this array appears to be 232 bytes ( |
It looks like the calculation of the memory size is wrong: distributed/distributed/spill.py, line 290 in 55cc1a5, uses len, but we should probably use nbytes on memoryview objects (https://docs.python.org/3/library/stdtypes.html#memoryview.nbytes). I'll file a PR for this.
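To illustrate the len vs. nbytes difference, here is a minimal standalone sketch using only the standard library. It does not use distributed's code: the `frame_nbytes` helper and the `array`-based buffer are hypothetical stand-ins for a serialized frame. For a memoryview over 8-byte items, len counts items while nbytes counts bytes, which would be consistent with the roughly 8x gap reported in this issue.

```python
from array import array

# Hypothetical stand-in for a serialized frame: one million float64 values,
# i.e. about 8 MB of raw data (not taken from distributed itself).
values = array("d", [0.0]) * 1_000_000
frame = memoryview(values)

# len() on a memoryview counts items along the first dimension, not bytes.
items = len(frame)

# nbytes is the size of the underlying data in bytes.
size_in_bytes = frame.nbytes


def frame_nbytes(frame):
    """Return the byte size of a frame (hypothetical helper for illustration)."""
    if isinstance(frame, memoryview):
        return frame.nbytes
    # For bytes and bytearray, len() already equals the byte count.
    return len(frame)


print(items, size_in_bytes, frame_nbytes(frame))
```

For bytes or bytearray frames the two measures agree, so the discrepancy would only show up for memoryviews whose item size is larger than one byte.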
That line was suggested by Guido when we were working on this. I'm sure he'll have an idea of what could be happening here. I know he is on PTO, but he might be a good person to review this. He will be back next week. |
I was not aware that |
When spilling data to disk, the SpillBuffer appears to return the incorrect size of the data on disk. For example, when spilling an ~8 MB random matrix onto disk, the data file created by the SpillBuffer is also ~8 MB in size, yet the SpillBuffer only returns ~1 MB.
Reproducer:
fails with