
parallelizing file including #9

Open
lun-4 opened this issue Apr 3, 2022 · 1 comment

@lun-4
Owner

lun-4 commented Apr 3, 2022

we can quickly get the list of files in a folder, but at the moment we process that entire list sequentially, which is inefficient: we're always blocked on I/O and can't even pin a single core to 100% usage.

  • is it possible to bump up the chunk size (at the moment it's 1KB) and get speedier includes while keeping the sequential flow? (see the buffered-read sketch after this list)
    • NOTE: make sure to wipe filesystem caches between test runs: `echo 3 | sudo tee /proc/sys/vm/drop_caches`
  • move away from the sequential flow into a job-queue flow (see the worker-pool sketch after this list)
    • spawn worker threads that take files off the queue and hash them
      • can the hashing work be split even further, given that blake3 itself can be parallelized?
    • as files get hashed, submit them for tag processing
      • tag inferrers can declare whether they're thread-safe; if one isn't, have a single thread drain the tag-processing queue sequentially
    • hopefully this gives us high-octane speedy includes
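
On the chunk-size question, a minimal sketch of what a larger-buffer sequential read could look like (written in Rust purely for illustration; the 256 KiB size, the blake3 crate usage, and the function name are placeholders to benchmark and adapt):

```rust
use std::fs::File;
use std::io::Read;
use std::path::Path;

fn hash_file_sequential(path: &Path) -> std::io::Result<blake3::Hash> {
    // 1 KiB reads mean a syscall roughly every kilobyte; 256 KiB is a plausible
    // first size to try (needs real benchmarking with cold caches, see the
    // drop_caches note above)
    const CHUNK_SIZE: usize = 256 * 1024;

    let mut file = File::open(path)?;
    let mut hasher = blake3::Hasher::new();
    let mut buf = vec![0u8; CHUNK_SIZE];
    loop {
        let n = file.read(&mut buf)?;
        if n == 0 {
            break;
        }
        hasher.update(&buf[..n]);
    }
    Ok(hasher.finalize())
}
```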
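And a rough sketch of the job-queue flow, again in Rust just to make the shape concrete: a shared queue of paths, a few hashing workers, and a single sequential tag-processing consumer for inferrers that aren't thread-safe. The thread count, channel layout, and the `hash_file_sequential` helper (from the sketch above) are assumptions, not the project's actual API:

```rust
use std::path::PathBuf;
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

fn main() {
    // job queue of paths in, (path, hash) results out
    let (job_tx, job_rx) = mpsc::channel::<PathBuf>();
    let (tag_tx, tag_rx) = mpsc::channel::<(PathBuf, blake3::Hash)>();
    let job_rx = Arc::new(Mutex::new(job_rx));

    // hashing workers: take a file off the queue, hash it, pass it on
    let workers: Vec<_> = (0..4)
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let tag_tx = tag_tx.clone();
            thread::spawn(move || loop {
                // take the next path; the lock is released before hashing starts
                let next = job_rx.lock().unwrap().recv();
                let path = match next {
                    Ok(p) => p,
                    Err(_) => break, // queue closed, no more work
                };
                // hash_file_sequential is the helper from the previous sketch;
                // blake3 can also parallelize a single large file internally
                // (Hasher::update_rayon, behind the crate's `rayon` feature)
                if let Ok(hash) = hash_file_sequential(&path) {
                    let _ = tag_tx.send((path, hash));
                }
            })
        })
        .collect();
    drop(tag_tx); // only the worker clones remain

    // single sequential consumer, in case some tag inferrers aren't thread-safe
    let tagger = thread::spawn(move || {
        for (path, hash) in tag_rx {
            println!("would run tag inferrers on {} ({})", path.display(), hash);
        }
    });

    // submit every file in the folder being included
    for entry in std::fs::read_dir(".").unwrap().flatten() {
        let _ = job_tx.send(entry.path());
    }
    drop(job_tx); // closing the queue lets the workers exit

    for w in workers {
        let _ = w.join();
    }
    let _ = tagger.join();
}
```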
@haze

haze commented Apr 3, 2022

edit: ok, use a thread pool for hashing and io_uring for reading & submitting file contents (I was told this was similar to what lithdew does in rheia)
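
For reference, a minimal single-read skeleton with the io-uring crate (Rust) looks roughly like the below; an actual include loop would presumably keep the submission queue full with reads for many files at once and hand completed buffers to the hashing pool, all of which is omitted here. The path and buffer size are placeholders:

```rust
use io_uring::{opcode, types, IoUring};
use std::fs::File;
use std::os::unix::io::AsRawFd;

fn main() -> std::io::Result<()> {
    let mut ring = IoUring::new(8)?;

    let file = File::open("some-included-file")?; // placeholder path
    let mut buf = vec![0u8; 256 * 1024];

    // queue one read; a real include loop would batch many of these
    let read_op = opcode::Read::new(types::Fd(file.as_raw_fd()), buf.as_mut_ptr(), buf.len() as _)
        .build()
        .user_data(0x42);

    // Safety: the buffer must stay alive until the kernel completes the read
    unsafe {
        ring.submission().push(&read_op).expect("submission queue full");
    }

    ring.submit_and_wait(1)?;

    let cqe = ring.completion().next().expect("no completion");
    assert!(cqe.result() >= 0, "read failed: {}", cqe.result());
    let n = cqe.result() as usize;
    // buf[..n] would now be handed off to a hashing worker
    println!("read {} bytes", n);
    Ok(())
}
```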
