Issues: Lightning-AI/litData
Issues list
Support for Pathlib in litdata.map
enhancement
New feature or request
#581
opened May 5, 2025 by
SkafteNicki
There is still a risk of race condition when indexing an s3 bucket with multi node (Parquet datasets)
#578
opened May 2, 2025 by
bhimrazy
litdata.optimize() function returns without raising error, prior to all processes finishing work
bug
#575
opened Apr 29, 2025 by
JacobARose
Add example for hugging face dataset optimization and benchmarks reports
documentation
Improvements or additions to documentation
#558
opened Apr 17, 2025 by
bhimrazy
Refactor Data Processor to Use a Shared Queue Across Workers
enhancement
New feature or request
#556
opened Apr 16, 2025 by
bhimrazy
How to correctly mix multiple StreamingDataset to create new data?
#554
opened Apr 13, 2025 by
philgzl
OutOfBoundsError when streaming parquet files with low_memory=True
bug
#553
opened Apr 13, 2025 by
kyoungrok0517
Duplicate UserWarning Logs for lightning-sdk Version Check
bug
Something isn't working
help wanted
Extra attention is needed
#527
opened Mar 24, 2025 by
deependujha
Cycle option for StreamingDataLoader
enhancement
New feature or request
waiting on author
Waiting for user input or feedback.
#524
opened Mar 24, 2025 by
Aceticia
Local cache dir not fully clearing in DDP multi-node training.
bug
Something isn't working
help wanted
Extra attention is needed
#512
opened Mar 12, 2025 by
JackUrb
Add batch sampler for StreamDataloader to enable flexible training strategy.
enhancement
New feature or request
#509
opened Mar 11, 2025 by
xinsir6
How to optimize dataset for pretraining from HuggingFace
bug
Something isn't working
question
Further information is requested
#482
opened Feb 21, 2025 by
TheLukaDragar
Add pytest fixture to limit max time a test can take
bug
Something isn't working
help wanted
Extra attention is needed
#475
opened Feb 17, 2025 by
deependujha
CI error: All chunks should've been deleted keeps coming back
bug
Something isn't working
help wanted
Extra attention is needed
won't fix
#437
opened Dec 20, 2024 by
deependujha
Restart training with new data, mid-epoch
enhancement
New feature or request
won't fix
#436
opened Dec 17, 2024 by
schopra8
Question: Is there a list for publicly available s3 links of datasets of litdata.StreamingDataset format?
question
Further information is requested
won't fix
#430
opened Dec 2, 2024 by
2catycm
Clear Examples of use with different dataset types and code changes.
enhancement
New feature or request
won't fix
#409
opened Nov 4, 2024 by
Woodr7
incorrect dataloader length when drop_last=False
bug
Something isn't working
help wanted
Extra attention is needed
won't fix
#402
opened Oct 28, 2024 by
grez72
Improve CombinedStreamingDataset to handle multiple subdatasets efficiently
enhancement
New feature or request
#386
opened Oct 2, 2024 by
bhimrazy
The config isn't consistent between chunks
bug
Something isn't working
help wanted
Extra attention is needed
waiting on author
Waiting for user input or feedback.
#370
opened Sep 17, 2024 by
gluonfield
How can I shut down automatically distributing data when using StreamingDataset?
enhancement
New feature or request
question
Further information is requested
won't fix
#368
opened Sep 12, 2024 by
ygtxr1997
StreamingDataset causes NCCL timeout when using multiple nodes
bug
Something isn't working
help wanted
Extra attention is needed
#340
opened Aug 26, 2024 by
hubenjm
StreamingDataset intermittently fails due to lack of index.json
bug
Something isn't working
help wanted
Extra attention is needed
won't fix
#337
opened Aug 20, 2024 by
plra
Use different batch sizes in CombinedStreamingDataset
enhancement
New feature or request
help wanted
Extra attention is needed
won't fix
#327
opened Aug 10, 2024 by
schopra8
Add support for multi sample item in optimize and yielding from the __getitem__ of the StreamingDataset
enhancement
New feature or request
help wanted
Extra attention is needed
won't fix
#317
opened Aug 8, 2024 by
tchaton