ExperimentDataPipe should record the iterated joinids #1182

Open
ebezzi opened this issue Jun 4, 2024 · 0 comments
Labels
P1 Priority 1 - Improvement with wide impact, fix within 1 week pytorch tileDB work

Comments

ebezzi (Member) commented Jun 4, 2024

When ExperimentDataPipe iterates through cells, it does not record the obs_joinids it yields. This doesn't matter for training, but it is necessary when the same datapipe is used for a forward pass (e.g. when generating embeddings), because each output row has to be matched back to its cell. The existing _obs_joinids field can be used as a workaround, but:

  1. It requires shuffling to be off.
  2. It doesn't work with multiple workers, since they don't process cells in order.
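
One possible shape for a fix, as a minimal sketch rather than the actual ExperimentDataPipe implementation: have the iterator yield the joinids alongside each batch, so the pairing survives both shuffling and multi-worker partitioning. Everything below is hypothetical and stdlib-only: the function `iter_batches_with_joinids`, its parameters, the strided worker partitioning, and the fake feature rows are assumptions for illustration and do not reflect the library's API.

```python
import random
from typing import Dict, Iterator, List, Sequence, Tuple


def iter_batches_with_joinids(
    obs_joinids: Sequence[int],
    batch_size: int,
    shuffle: bool = False,
    seed: int = 0,
    worker_id: int = 0,
    num_workers: int = 1,
) -> Iterator[Tuple[List[int], List[List[float]]]]:
    """Hypothetical iterator: yield (joinids, features) so rows stay tied to their joinid."""
    ids = list(obs_joinids)
    if shuffle:
        # Same seed in every worker, so all workers agree on the global order.
        random.Random(seed).shuffle(ids)
    # Assumed partitioning scheme: each worker takes a strided slice of the ids.
    ids = ids[worker_id::num_workers]
    for start in range(0, len(ids), batch_size):
        batch_ids = ids[start:start + batch_size]
        # Stand-in for reading the X rows of these cells.
        features = [[float(j), float(j) * 2.0] for j in batch_ids]
        yield batch_ids, features


# Usage: simulate two workers with shuffling on, and key each result by joinid.
embeddings: Dict[int, float] = {}
for worker_id in range(2):
    for batch_ids, features in iter_batches_with_joinids(
        range(10), batch_size=3, shuffle=True, worker_id=worker_id, num_workers=2
    ):
        for joinid, row in zip(batch_ids, features):
            embeddings[joinid] = sum(row)  # stand-in for a model forward pass
```

Because every result is keyed by its joinid, the order in which batches arrive no longer matters, which is the property the `_obs_joinids` workaround lacks.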
ebezzi added the tech label Jun 4, 2024
pablo-gar added P1 Priority 1 - Improvement with wide impact, fix within 1 week pytorch and removed tech labels Jun 4, 2024