[FEATURE REQUEST] Add multi-threading/processing to the Dataset Validator #52

Open
chavinlo opened this issue Nov 29, 2022 · 5 comments

@chavinlo

The dataset validator is awfully slow.

I'm willing to implement this myself if someone can give me some direction.

@chinoll

chinoll commented Nov 30, 2022

Did you install xformers? huggingface/diffusers#1343

@chavinlo
Author

chavinlo commented Dec 1, 2022

> Did you install xformers? huggingface/diffusers#1343

Yes, but I don't think xformers has anything to do with the validation process...

@cafeai
Collaborator

cafeai commented Dec 1, 2022

You can skip validation, which is what I would suggest. Internally, I'm personally using a preprocessing application written in Rust. This probably isn't something you want to do in Python.

@lopho

lopho commented Dec 5, 2022

Small self-plug for a fully parallel preprocessor written in Python:
https://github.com/lopho/parallel_dataprocessor

@lopho

lopho commented Dec 5, 2022

#60 has parallel validation and migration
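
For readers wondering what parallel validation might look like in general, here is a minimal sketch using only the Python standard library. It is not the code from #60 or from the project's validator: `validate_file` and `validate_dataset` are hypothetical names, and the check itself (file exists, is non-empty, and is readable) is an assumed stand-in for whatever the real validator checks.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def validate_file(path):
    """Hypothetical check: the file exists, is non-empty, and is readable."""
    try:
        p = Path(path)
        if p.stat().st_size == 0:
            return path, False
        with p.open("rb") as f:
            f.read(16)  # touch the first bytes to confirm the file can be read
        return path, True
    except OSError:
        # Missing files, permission errors, directories, etc. count as invalid.
        return path, False


def validate_dataset(paths, max_workers=8):
    """Validate files concurrently and return (valid, invalid) path lists."""
    valid, invalid = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input paths.
        for path, ok in pool.map(validate_file, list(paths)):
            (valid if ok else invalid).append(path)
    return valid, invalid
```

Threads suit I/O-bound checks like this one. If the per-file check is CPU-bound (for example, fully decoding each image), `concurrent.futures.ProcessPoolExecutor` exposes the same `map` interface and could be swapped in, provided the check function is defined at module level so it can be pickled.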

This was referenced Dec 5, 2022
4 participants