Client-side chunks 3: micro-batching (#6440)
This is a fork of the old `DataTable` batcher, and works very similarly.

Like before, this batcher will micro-batch using both space and time
thresholds.
There are two main differences:
- This batcher maintains a dataframe per entity, as opposed to the old
one, which worked globally.
- Once a threshold is reached, this batcher further splits the incoming
batch in order to fulfill these invariants:
  ```rust
  /// In particular, a [`Chunk`] cannot:
  /// * contain data for more than one entity path
  /// * contain rows with different sets of timelines
  /// * use more than one datatype for a given component
  /// * contain more rows than a pre-configured threshold if one or more
  ///   timelines are unsorted
  ```
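To make the splitting step concrete, here is a minimal sketch (not the actual Rerun implementation; `Row` and `split_into_chunks` are hypothetical stand-ins) of how an incoming batch can be partitioned so that every resulting chunk shares a single entity path and a single exact set of timelines, mirroring the first two invariants above:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a pending row; the real `PendingRow`
/// carries actual component data, not just names.
#[derive(Debug, Clone)]
struct Row {
    entity_path: String,
    /// Sorted list of timeline names this row has data on.
    timelines: Vec<String>,
}

/// Group rows so that each resulting batch has exactly one entity path
/// and one set of timelines. Rows that differ in either end up in
/// separate chunks.
fn split_into_chunks(rows: Vec<Row>) -> Vec<Vec<Row>> {
    let mut chunks: BTreeMap<(String, Vec<String>), Vec<Row>> = BTreeMap::new();
    for row in rows {
        let key = (row.entity_path.clone(), row.timelines.clone());
        chunks.entry(key).or_default().push(row);
    }
    chunks.into_values().collect()
}
```

The real splitter must additionally key on per-component datatypes and cap the row count of unsorted chunks, but the grouping idea is the same.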

Most of the code is the same; the most interesting piece is
`PendingRow::many_into_chunks`, along with the newly added tests.
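The "space and time thresholds" mentioned above can be sketched as a simple flush policy (hypothetical names; this is an illustration of the micro-batching idea, not the batcher's actual API): flush when the buffered payload exceeds a byte budget, or when the oldest buffered row has waited longer than a latency budget.

```rust
use std::time::{Duration, Instant};

/// Hypothetical flush policy combining a space threshold (bytes) and a
/// time threshold (latency of the oldest buffered row).
struct FlushPolicy {
    max_bytes: u64,
    max_latency: Duration,
}

impl FlushPolicy {
    /// Returns true if the buffered data should be flushed now.
    fn should_flush(&self, buffered_bytes: u64, oldest: Option<Instant>, now: Instant) -> bool {
        // Space threshold: too many bytes accumulated.
        if buffered_bytes >= self.max_bytes {
            return true;
        }
        // Time threshold: the oldest row has been waiting too long.
        match oldest {
            Some(t) => now.duration_since(t) >= self.max_latency,
            None => false,
        }
    }
}
```

Either condition alone is enough to trigger a flush, which keeps both worst-case latency and worst-case memory usage bounded.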

- Fixes #4431

---

Part of a PR series to implement our new chunk-based data model on the
client-side (SDKs):
- #6437
- #6438
- #6439
- #6440
- #6441
teh-cmc authored May 31, 2024
1 parent b4b7ec4 commit fde4a87
Showing 3 changed files with 1,611 additions and 1 deletion.
