
parallelizing file including #9

Open
lun-4 opened this issue Apr 3, 2022 · 1 comment

@lun-4
Owner

lun-4 commented Apr 3, 2022

we can quickly get the list of files in a folder, but at the moment we process that entire list sequentially, which is inefficient: we're always blocked on I/O and can't even pin a single core to 100% usage.

  • is it possible to bump up the chunk size (at the moment it's 1KB) and get speedier includes while keeping the sequential flow? (see the buffered-read sketch after this list)
    • NOTE: make sure to wipe filesystem caches between test runs: `echo 3 | sudo tee /proc/sys/vm/drop_caches`
  • move away from the sequential flow into a job-queue flow (see the worker-pool sketch after this list)
    • spawn worker threads that take files off the queue and hash them
      • can the hashing work be split even further, given that blake3 itself can be parallelized?
    • as files get hashed, submit them for tag processing
      • tag inferrers can declare whether they're thread-safe; if one isn't, have a single thread drain the tag-processing queue sequentially
    • hopefully this gives us high-octane speedy includes
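
On the chunk-size question, a minimal sketch of what a larger-buffer sequential read could look like (written in Rust purely for illustration; the 256 KiB size, the blake3 crate usage, and the function name are placeholders to benchmark and adapt):

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

fn hash_file_sequential(path: &Path) -> std::io::Result<blake3::Hash> {
    // 1 KiB reads mean a syscall roughly every kilobyte; 256 KiB is a plausible
    // first size to try (needs real benchmarking with cold caches, see the
    // drop_caches note above)
    const CHUNK_SIZE: usize = 256 * 1024;

    let mut file = File::open(path)?;
    let mut hasher = blake3::Hasher::new();
    let mut buf = vec![0u8; CHUNK_SIZE];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize())
}
```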
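And a rough sketch of the job-queue flow, again in Rust just to make the shape concrete: a shared queue of paths, a few hashing workers, and a single sequential tag-processing consumer for inferrers that aren't thread-safe. The thread count, channel layout, and the `hash_file_sequential` helper (from the sketch above) are assumptions, not the project's actual API:

```rust
use std::path::PathBuf;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    // job queue of paths in, (path, hash) results out
    let (job_tx, job_rx) = mpsc::channel::<PathBuf>();
    let (tag_tx, tag_rx) = mpsc::channel::<(PathBuf, blake3::Hash)>();
    let job_rx = Arc::new(Mutex::new(job_rx));

    // hashing workers: take a file off the queue, hash it, pass it on
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let tag_tx = tag_tx.clone();
            thread::spawn(move || loop {
                // take the next path; the lock is released before hashing starts
                let next = job_rx.lock().unwrap().recv();
                let path = match next {
                    Ok(p) => p,
                    Err(_) => break, // queue closed, no more work
                };
                // hash_file_sequential is the helper from the previous sketch;
                // blake3 can also parallelize a single large file internally
                // (Hasher::update_rayon, behind the crate's `rayon` feature)
                if let Ok(hash) = hash_file_sequential(&path) {
                    let _ = tag_tx.send((path, hash));
                }
            })
        })
        .collect();
    drop(tag_tx); // only the worker clones remain

    // single sequential consumer, in case some tag inferrers aren't thread-safe
    let tagger = thread::spawn(move || {
        for (path, hash) in tag_rx {
            println!("would run tag inferrers on {} ({})", path.display(), hash);
        }
    });

    // submit every file in the folder being included
    for entry in std::fs::read_dir(".").unwrap().flatten() {
        let _ = job_tx.send(entry.path());
    }
    drop(job_tx); // closing the queue lets the workers exit

    for w in workers {
        let _ = w.join();
    }
    let _ = tagger.join();
}
```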
@haze

haze commented Apr 3, 2022

edit: ok, use a thread pool for hashing and io_uring for reading & submitting file contents (I was told this was similar to what lithdew does in rheia)
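
For reference, a minimal single-read skeleton with the io-uring crate (Rust) looks roughly like the below; an actual include loop would presumably keep the submission queue full with reads for many files at once and hand completed buffers to the hashing pool, all of which is omitted here. The path and buffer size are placeholders:

```rust
use io_uring::{opcode, types, IoUring};
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;

    let file = File::open("some-included-file")?; // placeholder path
    let mut buf = vec![0u8; 256 * 1024];

    // queue one read; a real include loop would batch many of these
    let read_op = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), buf.len() as _)
        .build()
        .user_data(0x42);

    // Safety: the buffer must stay alive until the kernel completes the read
    unsafe {
        ring.submission().push(&read_op).expect("submission queue full");
    }

    ring.submit_and_wait(1)?;

    let cqe = ring.completion().next().expect("no completion");
    assert!(cqe.result() >= 0, "read failed: {}", cqe.result());
    let n = cqe.result() as usize;
    // buf[..n] would now be handed off to a hashing worker
    println!("read {} bytes", n);
    Ok(())
}
```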
