
Add in-batch coalescing #1959

Closed · Tracked by #1929
westonpace opened this issue Feb 15, 2024 · 2 comments
@westonpace (Contributor)

The I/O scheduler receives batches of IOPS. For example, a take operation on a value-encoded page will yield a single batch with one IOP per index. We should coalesce these requests. We could sort the IOPS, but I think it might be cheaper to expect schedulers to generate ordered sequences of IOPS.

The theory of coalescing is basically that filesystems have a "cost per byte" and a "cost per IOP". If two requests come in with sizes S1 and S2, and there are D bytes between the end of S1 and the start of S2, then we should coalesce when D < cost_iop / cost_byte.

This means coalescing requires a single configuration parameter, which we can call something like "coalesce aggressiveness": the ratio cost_iop / cost_byte, in bytes. We should investigate a reasonable value for this parameter on the various filesystems and use that as a default (I'm not sure we should expose this to users at all, at least not for now).
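As a rough sketch of the rule above (not the Lance implementation — the function name and the `(offset, size)` tuple representation are illustrative), coalescing over an already-sorted batch is a single linear pass: merge a request into its predecessor whenever the gap D between them is below the aggressiveness threshold.

```python
def coalesce(requests, aggressiveness):
    """Merge adjacent reads when the gap between them is smaller than
    aggressiveness = cost_iop / cost_byte (in bytes).

    `requests` is a list of (offset, size) tuples sorted by offset.
    """
    merged = []
    for offset, size in requests:
        if merged:
            prev = merged[-1]
            # D: bytes between the end of the previous read and this one.
            gap = offset - (prev[0] + prev[1])
            if gap < aggressiveness:
                # Extend the previous read to cover the gap and this request.
                prev[1] = (offset + size) - prev[0]
                continue
        merged.append([offset, size])
    return [tuple(r) for r in merged]
```

For example, with an aggressiveness of 64 bytes, two 100-byte reads separated by a 50-byte hole merge into one 250-byte read, while a read 10 KB away stays separate.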

@westonpace (Contributor, Author)

It's possible the above definition is too simple; it may be possible to coalesce "too much". E.g. Arrow has both a "coalesce if D is less than this many bytes" parameter and a "don't coalesce if the combined IOP would be greater than Y bytes" parameter.

However, for in-batch coalescing, I don't think this will be a problem. Batches correspond to pages and pages should not be overly large. So even if we end up coalescing the entire batch into a single request it should not be too large.
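The Arrow-style two-parameter variant mentioned above can be sketched by adding an upper bound to the gap check (again a hedged illustration — the names `max_gap` and `max_request` are hypothetical, not Arrow's or Lance's actual parameter names):

```python
def coalesce_capped(requests, max_gap, max_request):
    """Gap-based coalescing with an upper bound: never grow a combined
    read beyond max_request bytes, even if the gap is small enough.

    `requests` is a list of (offset, size) tuples sorted by offset.
    """
    merged = []
    for offset, size in requests:
        if merged:
            prev = merged[-1]
            gap = offset - (prev[0] + prev[1])
            combined = (offset + size) - prev[0]
            # Merge only if the hole is small AND the result stays bounded.
            if gap < max_gap and combined <= max_request:
                prev[1] = combined
                continue
        merged.append([offset, size])
    return [tuple(r) for r in merged]
```

With a 250-byte cap, three 100-byte reads spaced 10 bytes apart coalesce into one 210-byte read plus one separate read, rather than one oversized 320-byte read.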

@westonpace westonpace self-assigned this Feb 19, 2024
raunaks13 added a commit that referenced this issue Jul 29, 2024
Should fix #2629; addresses #1959
1. Previously, when randomly accessing rows, each row request was
scheduled separately, which increased overhead, especially on large
datasets. This PR coalesces take scheduling when requests are within
`block_size` distance of each other. The block size is determined
based on the system.
2. The binary scheduler was also scheduling the decoding of each index
individually. This updates the binary scheduler so that it schedules all
offsets at once; these are then processed to determine which bytes to
decode, as before.
3. A script we can use to compare v1 vs v2 performance is added as
`test_random_access.py`.

Specifically, on the lineitem dataset (same file from the issue above):
- v1 query time: `0.12s`
- v2 query time (before): `2.8s`.
- v2 query time (after (1)): `0.54s`. 
- v2 query time (after (1) and (2)): `0.02s`.
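The `block_size`-based take coalescing described in point (1) can be sketched as follows (a simplified model, not the actual Lance code — `group_take_indices`, the fixed `row_size`, and the index-to-byte-range mapping are all illustrative assumptions for a value-encoded page):

```python
def group_take_indices(indices, row_size, block_size):
    """Group sorted row indices into coalesced byte-range reads when
    neighbouring rows are within `block_size` bytes of each other.

    Returns a list of (offset, size) byte ranges to schedule.
    """
    groups = []
    for idx in indices:
        start = idx * row_size
        if groups and start - (groups[-1][0] + groups[-1][1]) < block_size:
            # Close enough to the previous range: extend it.
            groups[-1][1] = (start + row_size) - groups[-1][0]
        else:
            groups.append([start, row_size])
    return [tuple(g) for g in groups]
```

E.g. with 8-byte rows and a 32-byte block size, taking rows 0 and 1 produces one 16-byte read, while row 100 (800 bytes away) gets its own request.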
@westonpace (Contributor, Author)

Closed by #2636
