Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Updater: target hash calculation should not read the whole file in memory #1215

Closed
jku opened this issue Nov 17, 2020 · 2 comments · Fixed by #1219
Closed

Updater: target hash calculation should not read the whole file in memory #1215

jku opened this issue Nov 17, 2020 · 2 comments · Fixed by #1219

Comments

@jku
Copy link
Member

jku commented Nov 17, 2020

Description of issue or feature request:

This is not a bug (although it could be on memory limited client device), but a performance improvement:

After PR #1202 there is only one place where the updater loads the whole target file in memory. We should avoid doing that as targets could be very large and memory could be limited.

Current behavior:

_check_hashes() does this:

digest_object = securesystemslib.hash.digest(algorithm)
digest_object.update(file_object.read())
computed_hash = digest_object.hexdigest()

Expected behavior:
something like handwaves

digest_object = securesystemslib.hash.digest(algorithm)
while True:
    chunk = file_object.read(CHUNK_SIZE)
    if not chunk:
        break
    digest_object.update(chunk)
computed_hash = digest_object.hexdigest()

or even more simply just let SSLib handle this with its default chunk size:

digest_object = securesystemslib.hash.digest_fileobject(file_object, algorithm)
computed_hash = digest_object.hexdigest()
@trishankatdatadog
Copy link
Member

Good find, thanks! On top of that, shouldn't securesystemslib already offer a function for checking hashes?

@jku
Copy link
Member Author

jku commented Nov 17, 2020

shouldn't securesystemslib already offer a function for checking hashes?

Yes my last example uses a sslib function for calculating a fileobject hash, I just forgot to include 'securesystemslib.hash.' prefix there (now fixed)

Or do you mean actually comparing the hashes?

jku pushed a commit to jku/python-tuf that referenced this issue Nov 22, 2020
We don't want to read the whole file in memory as it can be huge. Use
digest_fileobject() instead: This way Securesystemslib will read the
file in chunks.

Securesystemslib already takes care of seeking to beginning of file.

Fixes theupdateframework#1215

Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
jku pushed a commit to jku/python-tuf that referenced this issue Nov 23, 2020
We don't want to read the whole file in memory as it can be huge. Use
digest_fileobject() instead: This way Securesystemslib will read the
file in chunks.

Securesystemslib already takes care of seeking to beginning of file.

Fixes theupdateframework#1215

Signed-off-by: Jussi Kukkonen <jkukkonen@vmware.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants