
Read in files once. #154

Open
mcutshaw opened this issue Mar 6, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@mcutshaw
Collaborator

mcutshaw commented Mar 6, 2024

Is your feature request related to a problem? Please describe.
Currently we end up reading the same file many times for different tasks. This slows things down considerably, and the overhead grows with file size, since each extra pass rereads the whole file.

Describe the solution you'd like
Rework the codebase to read each file once and pass around a single in-memory file object (e.g. io.BytesIO).
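
A minimal sketch of what that could look like, assuming the file fits comfortably in memory. The helper names (`load_once`, `read_magic`, `total_size`) are hypothetical and not part of Surfactant; the point is only that every consumer rewinds and reuses the same buffer instead of reopening the path.

```python
import io


def load_once(path):
    """Read the file from disk a single time and wrap the bytes in a BytesIO."""
    with open(path, "rb") as f:
        return io.BytesIO(f.read())


def read_magic(buf, length=4):
    """Hypothetical consumer: return the first few bytes of the buffer."""
    buf.seek(0)  # rewind, since an earlier consumer may have moved the cursor
    return buf.read(length)


def total_size(buf):
    """Hypothetical consumer: return the buffer size without touching the disk."""
    return buf.seek(0, io.SEEK_END)  # seek() returns the new absolute position
```

Each consumer calls `seek` itself, so the order in which tasks run does not matter. Very large files might call for `mmap` or chunked reads instead; that trade-off is not covered here.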

Describe alternatives you've considered
There are other potential ways to improve performance (threading, multiprocessing, async, etc.).

@mcutshaw mcutshaw added the enhancement New feature or request label Mar 6, 2024
@nightlark
Collaborator

One tricky part is that third-party libraries may need changes before they can accept an in-memory file object. A known slow spot right now is file hashing, which reads the same file multiple times (md5 + sha1 + sha256). Reworking https://github.com/LLNL/Surfactant/blob/main/surfactant/cmd/cli.py#L171-L189 to read the bytes only once could yield some easy performance improvements.
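
As a rough illustration of the single-pass idea (not the code at the link above), the sketch below feeds each chunk to all three hashers as it is read. The function name `hash_stream_once` and the 1 MiB chunk size are assumptions made for this example.

```python
import hashlib
import io


def hash_stream_once(stream, chunk_size=1024 * 1024):
    """Compute md5, sha1 and sha256 over a binary stream in one pass."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of stream
            break
        for hasher in hashers.values():
            hasher.update(chunk)  # every digest sees the same chunk
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


# Works on any binary file-like object: an open file or an in-memory buffer.
digests = hash_stream_once(io.BytesIO(b"example bytes"))
```

Because it accepts any object with a `read` method, the same function would work with `open(path, "rb")` today and with a shared `io.BytesIO` later.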
