
@luciaquirke (Collaborator) commented Sep 22, 2025

Shuffling the batches empirically stabilizes the static index build ETA, which can otherwise creep up over time to >4x the original estimate.
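The following toy simulation shows why batch order can make the ETA creep. It is a sketch that rests on two hypothetical assumptions not taken from this PR: per-batch cost rises along the allocation order, for example because batches of longer documents come last, and the progress bar projects total runtime as the mean time per batch so far multiplied by the number of batches. The names `projected_total` and `costs` are illustrative only.

```python
import random


def projected_total(costs: list[float], done: int) -> float:
    """Project total runtime the way a simple progress bar does:
    mean time per batch so far, multiplied by the number of batches."""
    return sum(costs[:done]) / done * len(costs)


# Hypothetical workload (assumption): per-batch cost rises with allocation order.
costs = [float(i) for i in range(1, 101)]
true_total = sum(costs)  # 5050.0

in_order = costs
shuffled = costs[:]
random.Random(42).shuffle(shuffled)  # fixed seed, so the order is reproducible

# In order, the projection creeps upward: 550, 2550, 4550 against a true 5050.
for done in (10, 50, 90):
    print(done,
          round(projected_total(in_order, done)),
          round(projected_total(shuffled, done)),
          true_total)
```

In the unshuffled order, the early projection is roughly a tenth of the true total and keeps climbing as the slower batches arrive. Shuffling makes the batches seen so far a representative sample, so the early estimate is already in the right range. The size of the drift comes from this toy workload and does not reproduce the >4x figure above.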



- def allocate_batches(doc_lengths: list[int], N: int) -> list[list[int]]:
+ def allocate_batches(doc_lengths: list[int], N: int, seed: int = 42) -> list[list[int]]:
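The diff shows only the new `seed` parameter. The body below is a hypothetical sketch of how a seeded shuffle could slot into such a function, not the code merged in this PR. The packing rule is an assumption: `N` is treated as a padded token budget per batch, so `max(length in batch) * len(batch) <= N`, and any document longer than `N` gets a batch of its own. Only the final step, shuffling the batch order with a fixed seed, reflects the PR description.

```python
import random


def allocate_batches(doc_lengths: list[int], N: int, seed: int = 42) -> list[list[int]]:
    """Hypothetical sketch: pack document indices into batches, then shuffle batch order.

    Assumption: N is a padded token budget, max(length in batch) * len(batch) <= N.
    A document longer than N is placed in a batch of its own.
    """
    # Visit documents from longest to shortest so the first doc in each batch
    # fixes the padded length for that batch.
    order = sorted(range(len(doc_lengths)), key=lambda i: doc_lengths[i], reverse=True)

    batches: list[list[int]] = []
    current: list[int] = []
    current_max = 0
    for i in order:
        new_max = max(current_max, doc_lengths[i])
        # Close the current batch if adding this doc would exceed the budget.
        if current and new_max * (len(current) + 1) > N:
            batches.append(current)
            current, current_max = [], 0
            new_max = doc_lengths[i]
        current.append(i)
        current_max = new_max
    if current:
        batches.append(current)

    # Packing leaves batches grouped by length (long first). Shuffling with a
    # fixed seed mixes long and short batches but gives the same order every run.
    random.Random(seed).shuffle(batches)
    return batches


lengths = [512, 30, 700, 45, 120, 2048, 64, 90]
print(allocate_batches(lengths, N=1024))
```

Using a local `random.Random(seed)` instead of calling `random.seed` keeps the global RNG untouched, and because the seed is fixed, two calls with the same inputs return identical batch orders. That determinism matters if runs are resumed or compared.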
@luciaquirke (Collaborator, Author) Sep 22, 2025


Should we leave setting the seed to the end user? Can't think of any reason why someone would want non-deterministic shuffling here.

Member


Let's just always seed it

@norabelrose norabelrose merged commit d89823c into main Sep 22, 2025
1 check passed

3 participants