
Optimize processing small files #1958

Closed
lfcnassif opened this issue Oct 28, 2023 · 3 comments · Fixed by #1957

lfcnassif commented Oct 28, 2023

When indexTempOnSSD = true, TempFileTask creates temp files for most files smaller than 1 GB, except for subitems (already stored in the case data storage) and carved files whose parent already has a temp file. This avoids decompressing the same file multiple times from E01/Ex01 evidence and also caches data from other image formats located on network shares.

For small files, we can cache the content in memory, avoiding unneeded writes to and reads from the temp directory for items that can be processed without a temp file.

What file size limit would be reasonable to keep in memory while an item is being processed (after it is taken from the queue)? We already use a buffer of up to 8 MB in the Item.getBufferedInputStream() method, which would use up to 400 MB of memory on a 50-thread machine.
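A minimal sketch of the idea, just for illustration (class, method, and threshold names here are hypothetical, not the actual TempFileTask/Item API): keep the bytes of small items on the heap and only stream from disk above a size threshold.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper sketching the in-memory cache for small items.
public class SmallFileCache {

    // Hypothetical threshold; the issue mentions an existing 8 MB buffer limit.
    private static final long MAX_IN_MEMORY_SIZE = 8L * 1024 * 1024;

    private byte[] cachedBytes; // populated only for small items

    public InputStream openStream(Path source, long knownLength) throws IOException {
        if (knownLength >= 0 && knownLength <= MAX_IN_MEMORY_SIZE) {
            if (cachedBytes == null) {
                // Read once; later consumers reuse the heap copy,
                // avoiding repeated temp-file writes and reads.
                cachedBytes = Files.readAllBytes(source);
            }
            return new ByteArrayInputStream(cachedBytes);
        }
        // Large item: fall back to streaming from the original or temp file.
        return Files.newInputStream(source);
    }
}
```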

lfcnassif self-assigned this Oct 28, 2023
lfcnassif (Member Author) commented:

Caching small subitems in memory would also avoid decompressing them multiple times from the internal case storage, where they are kept compressed. In the past I also tested creating uncompressed temp files for them, but I couldn't conclude whether it improved processing speed, since creating temp files has a cost of its own. Keeping them in memory, however, should speed things up a bit.
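For illustration, a hedged sketch of the subitem case, assuming a deflate-style compression for the case storage (the real storage format and API may differ): inflate the blob once, keep the bytes on the heap, and serve later reads from memory instead of decompressing again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical helper: decompress a subitem's blob from the case storage once
// and reuse the bytes for every task that reads the item afterwards.
public class SubitemMemoryCache {

    private byte[] uncompressed;

    public synchronized InputStream open(InputStream compressedFromStorage) throws IOException {
        if (uncompressed == null) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            try (InputStream in = new InflaterInputStream(compressedFromStorage)) {
                in.transferTo(out); // inflate only once
            }
            uncompressed = out.toByteArray();
        }
        return new ByteArrayInputStream(uncompressed);
    }
}
```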

lfcnassif (Member Author) commented:

So far, I haven't seen clear differences with this approach, tested with a huge UFDR, one small E01, and one medium-size E01. I'll repeat the tests using a non-SSD temp disk and maybe more evidence files...

lfcnassif added a commit that referenced this issue Oct 31, 2023

lfcnassif commented Oct 31, 2023

Conclusions after many tests on a few evidence files (3 E01s and 2 UFDRs):

  • For E01 processing using a non-SSD disk for temp, I got up to a 33% speed-up using the default profile;
  • For UFDR processing using a non-SSD disk for temp, I got up to a 50% speed-up (for a UFDR with a small WhatsApp database; expanding WhatsApp messages is the bottleneck for the other UFDRs). I think that's because UFDRs are decompressed with a Java library while E01s use the faster native zlib library;
  • For E01 processing using a common SSD disk for temp, I got a minor speed-up, up to 10%, when there was any at all;
  • For UFDR processing using a common SSD disk for temp, I got up to a 12% speed-up, when there was any at all;
  • For E01 and UFDR processing using an NVMe disk for temp, I got no noticeable/conclusive speed-up.

A few thoughts:

  • If users forget to set indexTempOnSSD = true, temp files for compressed files won't be created, and this change should help a lot;
  • If users set indexTempOnSSD = true by mistake, keeping small files on the heap and writing less to the temp disk is better; I tested this with one UFDR and processing was 13% faster with the memory cache;
  • If users forget to exclude the temp disk from antivirus scanning, I think writing fewer files to temp should be much faster (not tested);
  • Using a memory cache for small files means less writing to the temp SSD, which may extend its lifespan a bit.

So, I'll merge the proposed change, put together with #1224.

Currently the memory buffer limit is 8 MB; we may decrease it if someone thinks it is too large, so please let me know.
