Is your feature request related to a problem? Please describe.
Currently we end up reading the same file from disk many times, once for each task that needs it. This slows performance considerably, and the cost of the repeated reads grows with file size.
Describe the solution you'd like
Rework the code base to read each file once and pass around a single in-memory file object (e.g. io.BytesIO).
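A minimal sketch of that idea, under stated assumptions: `load_once`, `detect_type`, and `count_bytes` are hypothetical names used for illustration only, not existing Surfactant functions. The file is read from disk once, and every task then works on the same `io.BytesIO` buffer.

```python
import io
from pathlib import Path


def load_once(path):
    """Read the file from disk a single time and wrap it in an in-memory buffer."""
    return io.BytesIO(Path(path).read_bytes())


def detect_type(buf):
    """Hypothetical task: classify the file by its leading magic bytes."""
    buf.seek(0)  # the buffer is shared, so each consumer rewinds before reading
    magic = buf.read(4)
    if magic == b"\x7fELF":
        return "ELF"
    if magic[:2] == b"MZ":
        return "PE"
    return "unknown"


def count_bytes(buf):
    """Hypothetical task: report the size without touching the disk again."""
    return buf.getbuffer().nbytes
```

Because the buffer carries a read position, each consumer should call `seek(0)` (or otherwise not rely on the current position) before reading. The trade-off is that the whole file is held in memory, which may not suit very large inputs.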
Describe alternatives you've considered
There are other potential means to increase performance (threading, multiprocessing, async, etc.)
One thing that could be a bit tricky here is that third-party libraries may need changes to accept an in-memory file object. One thing I know is pretty slow right now is file hashing, which reads the same file multiple times (md5 + sha1 + sha256). Reworking https://github.com/LLNL/Surfactant/blob/main/surfactant/cmd/cli.py#L171-L189 to read the bytes only once could yield some easy performance improvements.
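As a rough illustration of the single-read hashing idea (not the code at the linked lines, and `hash_file_once` is a hypothetical name), all three digests can be updated from the same chunks in one pass using only the standard `hashlib` module:

```python
import hashlib


def hash_file_once(path, chunk_size=1024 * 1024):
    """Return md5, sha1 and sha256 hex digests from a single pass over the file."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are never held fully in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}
```

This reads each byte from disk once instead of three times, and it works whether or not the rest of the code base has moved to an in-memory file object. The 1 MiB chunk size is an arbitrary choice and may be worth tuning.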