Issues: modelscope/data-juicer
Issues list
Some operators cause the process to hang during processing
question
Further information is requested
#560
opened Jan 22, 2025 by SkyAndFly
3 tasks done
Is there a specific download link for the data classifier?
question
Further information is requested
#558
opened Jan 21, 2025 by obj12
2 of 3 tasks
How to do sentence_dedup
enhancement
New feature or request
#556
opened Jan 20, 2025 by ftgreat
1 of 2 tasks
When will version 2.0 be released
question
Further information is requested
#548
opened Jan 14, 2025 by javapythonphp
3 tasks done
[Bug]: Fail to run ray_bts_minhash_deduplicator
bug
Something isn't working
#547
opened Jan 14, 2025 by javapythonphp
3 tasks done
Hash configuration information for the dedup performance test of DataJuicer 2.0
question
Further information is requested
#546
opened Jan 14, 2025 by cist
3 tasks done
[Bug]: ds.JSONDatasource
bug
Something isn't working
#539
opened Jan 10, 2025 by ariexBear
3 tasks done
Support other LLMs & APIs for the OP generate_qa_from_text_mapper
dj:op
issues/PRs about some specific OPs
enhancement
New feature or request
#535
opened Jan 9, 2025 by yxdyc
2 tasks done
[BUG]: inappropriate arguments for map_batches in ray mode
bug
Something isn't working
dj:dist
issues/PRs about distributed data processing
#533
opened Jan 8, 2025 by HYLcool
Can the cleaning statistics be viewed after creating the config file and performing the cleaning?
question
Further information is requested
#499
opened Nov 27, 2024 by Tendo33
3 tasks done
Guidance on Monitoring Task Execution with Ray Executor in Data Juicer
dj:dist
issues/PRs about distributed data processing
question
Further information is requested
#496
opened Nov 24, 2024 by Fatima-0SA
3 tasks done
Anyone tried DJ on multimodal datasets of more than 20M samples?
question
Further information is requested
#482
opened Nov 11, 2024 by serser
3 tasks done
Update of Jupyter Notebooks
bug
Something isn't working
documentation
Improvements or additions to documentation
#476
opened Nov 6, 2024 by HYLcool
[Bug]: perplexity_filter operator runs out of memory (OOM)
bug
Something isn't working
#474
opened Nov 5, 2024 by weiaicunzai
3 tasks done
[Feat]: Unified LLM Calling Management
enhancement
New feature or request
#451
opened Oct 16, 2024 by drcege
2 tasks done
[Feat]: Automatic Version Matching During Installation
enhancement
New feature or request
#450
opened Oct 16, 2024 by drcege
2 tasks done
[Feat]: Enhance Unit Test Coverage for Python and CUDA Compatibility
enhancement
New feature or request
#449
opened Oct 16, 2024 by drcege
2 tasks done
Require fps filter and mapper for videos
dj:op
issues/PRs about some specific OPs
enhancement
New feature or request
#433
opened Sep 23, 2024 by BeachWang
[Feat] Support explicit FusedOP that allows for the configuration and application of multiple operators in smaller, manageable batches
dj:op
issues/PRs about some specific OPs
enhancement
New feature or request
#413
opened Sep 2, 2024 by yxdyc
2 tasks done
Guidance for OP with multiple data fields to be processed
enhancement
New feature or request
#411
opened Sep 2, 2024 by yxdyc
2 tasks done
[Feat]: Add Ray actor support
dj:dist
issues/PRs about distributed data processing
enhancement
New feature or request
stale-issue
#371
opened Jul 29, 2024 by drcege
support panda's student captioner model in our captioning mapper
dj:multimodal
issues/PRs about multimodal data processing
dj:op
issues/PRs about some specific OPs
enhancement
New feature or request
stale-issue
#251
opened Mar 14, 2024 by yxdyc