
Add in-batch coalescing #1959

Closed · Tracked by #1929
westonpace opened this issue Feb 15, 2024 · 2 comments
@westonpace (Contributor)

The I/O scheduler receives batches of IOPS. For example, a take operation on a value-encoded page will yield a single batch with one IOP per index. We should coalesce these requests. We could sort the IOPS, but I think it might be cheaper to expect schedulers to generate ordered sequences of IOPS.

The theory of coalescing is basically that filesystems have a "cost per byte" and a "cost per IOP". If two requests come in with sizes S1 and S2, and there are D bytes between the end of S1 and the start of S2, then we should coalesce when D < cost_iop / cost_byte.

This means coalescing requires a single configuration parameter, which we can call something like "coalesce aggressiveness": the ratio cost_iop / cost_byte, in bytes. We should investigate a reasonable value for this parameter on the various filesystems and use that as a default (I'm not sure we should expose this to users at all, at least not for now).
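As a rough sketch of the rule above (not the Lance implementation — the function name and the `(offset, size)` tuple representation are illustrative), coalescing over an already-sorted batch is a single linear pass: merge a request into its predecessor whenever the gap D between them is below the aggressiveness threshold.

```python
def coalesce(requests, aggressiveness):
    """Merge adjacent reads when the gap between them is smaller than
    aggressiveness = cost_iop / cost_byte (in bytes).

    `requests` is a list of (offset, size) tuples sorted by offset.
    """
    merged = []
    for offset, size in requests:
        if merged:
            prev = merged[-1]
            # D: bytes between the end of the previous read and this one.
            gap = offset - (prev[0] + prev[1])
            if gap < aggressiveness:
                # Extend the previous read to cover the gap and this request.
                prev[1] = (offset + size) - prev[0]
                continue
        merged.append([offset, size])
    return [tuple(r) for r in merged]
```

For example, with an aggressiveness of 64 bytes, two 100-byte reads separated by a 50-byte hole merge into one 250-byte read, while a read 10 KB away stays separate.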

@westonpace (Contributor, Author)

It's possible the above definition is too simple; it may be possible to coalesce "too much". E.g. Arrow has both a "coalesce if D is less than this many bytes" parameter and a "don't coalesce if the combined IOP would be greater than Y bytes" parameter.

However, for in-batch coalescing, I don't think this will be a problem. Batches correspond to pages and pages should not be overly large. So even if we end up coalescing the entire batch into a single request it should not be too large.
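The Arrow-style two-parameter variant mentioned above can be sketched by adding an upper bound to the gap check (again a hedged illustration — the names `max_gap` and `max_request` are hypothetical, not Arrow's or Lance's actual parameter names):

```python
def coalesce_capped(requests, max_gap, max_request):
    """Gap-based coalescing with an upper bound: never grow a combined
    read beyond max_request bytes, even if the gap is small enough.

    `requests` is a list of (offset, size) tuples sorted by offset.
    """
    merged = []
    for offset, size in requests:
        if merged:
            prev = merged[-1]
            gap = offset - (prev[0] + prev[1])
            combined = (offset + size) - prev[0]
            # Merge only if the hole is small AND the result stays bounded.
            if gap < max_gap and combined <= max_request:
                prev[1] = combined
                continue
        merged.append([offset, size])
    return [tuple(r) for r in merged]
```

With a 250-byte cap, three 100-byte reads spaced 10 bytes apart coalesce into one 210-byte read plus one separate read, rather than one oversized 320-byte read.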

@westonpace westonpace self-assigned this Feb 19, 2024
raunaks13 added a commit that referenced this issue Jul 29, 2024
Should fix #2629; addresses #1959
1. Previously, when randomly accessing rows, each row request was
scheduled separately, which increased overhead, especially on large
datasets. This PR coalesces take scheduling when requests are within
`block_size` distance of each other. The block size is determined
based on the system.
2. The binary scheduler was also scheduling the decoding of each index
individually. This updates the binary scheduler so that it schedules all
offsets at once; these are then processed to determine which bytes to
decode, as before.
3. A script we can use to compare v1 vs v2 performance is added as
`test_random_access.py`.

Specifically, on the lineitem dataset (same file from the issue above):
- v1 query time: `0.12s`
- v2 query time (before): `2.8s`.
- v2 query time (after (1)): `0.54s`. 
- v2 query time (after (1) and (2)): `0.02s`.
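The `block_size`-based take coalescing described in point (1) can be sketched as follows (a simplified model, not the actual Lance code — `group_take_indices`, the fixed `row_size`, and the index-to-byte-range mapping are all illustrative assumptions for a value-encoded page):

```python
def group_take_indices(indices, row_size, block_size):
    """Group sorted row indices into coalesced byte-range reads when
    neighbouring rows are within `block_size` bytes of each other.

    Returns a list of (offset, size) byte ranges to schedule.
    """
    groups = []
    for idx in indices:
        start = idx * row_size
        if groups and start - (groups[-1][0] + groups[-1][1]) < block_size:
            # Close enough to the previous range: extend it.
            groups[-1][1] = (start + row_size) - groups[-1][0]
        else:
            groups.append([start, row_size])
    return [tuple(g) for g in groups]
```

E.g. with 8-byte rows and a 32-byte block size, taking rows 0 and 1 produces one 16-byte read, while row 100 (800 bytes away) gets its own request.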
@westonpace (Contributor, Author)

Closed by #2636
