too many open files from dataloader #379

Open
jpata opened this issue Dec 12, 2024 · 0 comments
jpata commented Dec 12, 2024

I'm seeing some issues with tfds: it does not seem to close open files properly. The switch to the split datasets in #350 perhaps made the problem more apparent.

The cause is that random access to the concatenated datasets prevents files from being closed.
With shuffling disabled here: https://github.com/jpata/particleflow/blob/main/mlpf/model/PFDataset.py#L259, the open-file usage is somewhat lower.
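As a stopgap (not a fix for the underlying tfds behavior), the process's soft file-descriptor limit can be raised to the hard limit before constructing the dataloaders. This is a minimal sketch; the function name `raise_open_file_limit` is hypothetical and not part of the particleflow codebase:

```python
import resource


def raise_open_file_limit():
    """Raise this process's soft open-file limit up to the hard limit.

    Mitigates "too many open files" when random access over many
    dataset shards keeps a large number of files open at once.
    Returns the (old_soft, hard) limits for logging.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit can be raised by an unprivileged process,
    # but only as far as the hard limit.
    resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return soft, hard


if __name__ == "__main__":
    old_soft, hard = raise_open_file_limit()
    print(f"soft open-file limit raised from {old_soft} to {hard}")
```

This only buys headroom; if the descriptor leak grows with dataset size, the limit will eventually be hit again regardless.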

@jpata jpata added the hard label Dec 12, 2024
@jpata jpata changed the title migrate from tfds array record datasets to native pytorch parquet datasets too many open files from dataloader Dec 18, 2024
@jpata jpata added bug and removed hard labels Dec 18, 2024