@chebbyChefNEQ chebbyChefNEQ commented Apr 11, 2025

We previously disabled the async dataset for index training because it required setting the global multiprocessing context to spawn.

This PR gives the async dataset class an internal multiprocessing context that is always spawn. It also enables async data loading for vector index training.

The async dataset moves heavy compute to a subprocess, which reduces the time spent waiting on batch loading for the GPU.
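
To illustrate the idea, here is a minimal sketch of a class that holds its own spawn context instead of changing the process-wide start method. `AsyncBatches`, `_produce`, and `squares` are hypothetical names made up for this example and are not taken from the PR's diff; the only real API relied on is the standard library's `multiprocessing.get_context`.

```python
import multiprocessing as mp


def _produce(make_batches, queue):
    """Runs in the child process: compute batches and send them to the parent."""
    for batch in make_batches():
        queue.put(batch)
    queue.put(None)  # sentinel: no more batches


class AsyncBatches:
    """Hypothetical stand-in for an async dataset (not the class in this PR)."""

    def __init__(self, make_batches, prefetch=4):
        # Private context: always "spawn", independent of the global default.
        # No call to mp.set_start_method(), so other code is unaffected.
        self._ctx = mp.get_context("spawn")
        # make_batches must be picklable (e.g. a module-level function),
        # because spawn sends the target's arguments to a fresh interpreter.
        self._make_batches = make_batches
        self._prefetch = prefetch
        self._queue = None
        self._proc = None

    def __enter__(self):
        # Queue and Process both come from the private context.
        self._queue = self._ctx.Queue(maxsize=self._prefetch)
        self._proc = self._ctx.Process(
            target=_produce,
            args=(self._make_batches, self._queue),
            daemon=True,
        )
        self._proc.start()
        return self

    def __iter__(self):
        # Yield batches until the child signals completion.
        while (batch := self._queue.get()) is not None:
            yield batch

    def __exit__(self, *exc):
        # Stop the child if the consumer exited early, then reap it.
        if self._proc.is_alive():
            self._proc.terminate()
        self._proc.join()
        return False


def squares():
    """Hypothetical batch source standing in for the heavy compute."""
    return (i * i for i in range(3))
```

Because spawn re-imports the main module in the child, code that enters the context manager (for example `with AsyncBatches(squares) as ds:` followed by iterating `ds`) should sit behind an `if __name__ == "__main__":` guard when run as a script.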

Copilot AI left a comment

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

@github-actions github-actions bot added the enhancement and python labels Apr 11, 2025
filter=filt,
)
) as torch_ds:
loader = torch.utils.data.DataLoader(
Contributor Author

Everything below here is just an indentation change.

@github-actions

Thank you for your contribution. This PR has been inactive for a while, so we're closing it to free up bandwidth. Feel free to reopen it if you still find it useful.

@github-actions github-actions bot closed this Nov 16, 2025
@Xuanwo Xuanwo deleted the rmeng/async-ds branch December 5, 2025 14:21
Labels

enhancement, python, Stale

2 participants