specifying a partition size breaks down with larger datasets #40

Open
phobson opened this issue Dec 1, 2022 · 0 comments

phobson commented Dec 1, 2022

In #39, one of the tests I added uses a 12-month dataset instead of a 1-month dataset. When we try to fetch the data in 2 MiB partitions, we're generally successful with the smaller dataset. With the larger dataset, though, the partition sizes are consistently off in both directions: some come in well under the requested 2 MiB and others well over it.

Copy/pasted from here:

N.B. -- the check that we perform is comparing actual partition sizes to 2x the requested partition size.

(Pdb) from dask.utils import format_bytes
(Pdb) partition_sizes.map(format_bytes).to_frame("result").assign(expected="2 MiB")
        result expected
0     1.60 MiB    2 MiB
1     1.71 MiB    2 MiB
2     2.18 MiB    2 MiB
3     3.51 MiB    2 MiB
4     1.60 MiB    2 MiB
5     1.71 MiB    2 MiB
6     2.18 MiB    2 MiB
7     4.36 MiB    2 MiB
8     1.39 MiB    2 MiB
9   875.77 kiB    2 MiB
10    1.71 MiB    2 MiB
11    2.18 MiB    2 MiB
12    3.72 MiB    2 MiB
13    1.60 MiB    2 MiB
14    1.70 MiB    2 MiB
15    2.18 MiB    2 MiB
16    1.69 MiB    2 MiB
17    1.28 MiB    2 MiB
18    1.70 MiB    2 MiB
19    2.18 MiB    2 MiB
20    4.37 MiB    2 MiB
21    1.82 MiB    2 MiB
22    1.70 MiB    2 MiB
23    2.18 MiB    2 MiB
24    3.30 MiB    2 MiB
25    1.60 MiB    2 MiB
26    1.71 MiB    2 MiB
27    2.18 MiB    2 MiB
28    4.37 MiB    2 MiB
29    2.79 MiB    2 MiB
30    1.60 MiB    2 MiB
31    1.71 MiB    2 MiB
32    2.18 MiB    2 MiB
33    4.37 MiB    2 MiB
34    1.29 MiB    2 MiB
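
To make the comparison against 2x the requested size concrete, here is a minimal, standard-library-only sketch that re-applies it to the sizes printed above. It is not the project's test code: `to_bytes` and `oversized` are hypothetical helpers, and treating the check as an upper bound (actual size must not exceed 2x the requested size) is an assumption. The sizes are transcribed from the formatted output, so they are rounded.

```python
# Binary unit multipliers for the two units that appear in the output above.
UNITS = {"kiB": 2**10, "MiB": 2**20}


def to_bytes(text):
    """Convert a formatted size such as '1.60 MiB' to a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def oversized(sizes, requested="2 MiB", factor=2):
    """Return the indices of partitions larger than factor * requested.

    Assumption: the check is an upper bound only; undersized partitions
    are not flagged here.
    """
    limit = factor * to_bytes(requested)
    return [i for i, size in enumerate(sizes) if to_bytes(size) > limit]


# Partition sizes transcribed from the pdb output, in partition order.
reported = [
    "1.60 MiB", "1.71 MiB", "2.18 MiB", "3.51 MiB", "1.60 MiB",
    "1.71 MiB", "2.18 MiB", "4.36 MiB", "1.39 MiB", "875.77 kiB",
    "1.71 MiB", "2.18 MiB", "3.72 MiB", "1.60 MiB", "1.70 MiB",
    "2.18 MiB", "1.69 MiB", "1.28 MiB", "1.70 MiB", "2.18 MiB",
    "4.37 MiB", "1.82 MiB", "1.70 MiB", "2.18 MiB", "3.30 MiB",
    "1.60 MiB", "1.71 MiB", "2.18 MiB", "4.37 MiB", "2.79 MiB",
    "1.60 MiB", "1.71 MiB", "2.18 MiB", "4.37 MiB", "1.29 MiB",
]

print(oversized(reported))  # indices of partitions above 4 MiB
```

Under that assumption, partitions 7, 20, 28, and 33 exceed the 4 MiB limit, while partitions such as 9 (875.77 kiB) fall well short of the requested 2 MiB.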

I'll start a PR to investigate this further.
