chunked: do not use a temporary file #2041

Open
giuseppe wants to merge 1 commit into main from chunked-do-not-use-temporary-file
Conversation

giuseppe (Member)

and use the stream directly to create the temporary zstd:chunked file.
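
For illustration only, a minimal sketch of the streaming approach described above, with hypothetical helper names (convertFromStream, writeZstdChunked) rather than the real c/storage API: the payload is digested while the converter consumes it, so no extra temporary copy of the input has to be written first.

```go
package chunked

import (
	"io"
	"os"

	digest "github.com/opencontainers/go-digest"
)

// convertFromStream is a hypothetical stand-in for the real conversion entry
// point: it digests the tar payload while the converter reads it, instead of
// first buffering the whole payload in a temporary file.
func convertFromStream(dest *os.File, payload io.Reader) (digest.Digest, error) {
	digester := digest.Canonical.Digester()
	// Every byte the converter reads also goes through the digester.
	tee := io.TeeReader(payload, digester.Hash())
	if err := writeZstdChunked(dest, tee); err != nil {
		return "", err
	}
	return digester.Digest(), nil
}

// writeZstdChunked is a placeholder for the actual tar-to-zstd:chunked
// conversion; here it only copies the bytes so the sketch is self-contained.
func writeZstdChunked(dest *os.File, r io.Reader) error {
	_, err := io.Copy(dest, r)
	return err
}
```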

openshift-ci bot (Contributor) commented Jul 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

and use the stream directly to create the temporary zstd:chunked file.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe force-pushed the chunked-do-not-use-temporary-file branch from ef8928f to b6ba804 on July 24, 2024 09:34
@TomSweeneyRedHat (Member)

LGTM, and I'd like to get this in before starting the vendor dance. @nalind @mtrmac PTAL.

@cgwalters (Contributor) left a comment

The goal is improved performance, right? Worth stating in the commit message if so.

blobFile.Close()
blobFile = nil
// Make sure the entire payload is consumed.
_, _ = io.Copy(io.Discard, payload)
Contributor

Shouldn't we still check for errors here? Maybe it doesn't matter because if e.g. we get a short read or something we'll presumably fail checksum validation anyways.

If that's the idea then I think it at least deserves a comment.

giuseppe (Member, Author)

I've not measured performance; this is just preparatory work for extending ApplyDiff() to use this code path when we convert images, so we can support pulling from other sources too, not only registries.

I've not yet found a nice way to extend the API though.

Collaborator

Yes (if this goes in) I think it would be better to report a read error (the cause) instead of a digest mismatch that is hard to diagnose.
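
A minimal sketch of what is being suggested here (not the PR's actual code; drainPayload is a hypothetical helper): check and wrap the error from draining the payload so a read failure is reported as the cause, rather than surfacing later as a hard-to-diagnose digest mismatch.

```go
package chunked

import (
	"fmt"
	"io"
)

// drainPayload consumes the rest of the stream but, unlike a bare
// `_, _ = io.Copy(io.Discard, payload)`, reports a read error to the caller.
func drainPayload(payload io.Reader) error {
	if _, err := io.Copy(io.Discard, payload); err != nil {
		return fmt.Errorf("consuming payload: %w", err)
	}
	return nil
}
```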

}
}()

// calculate the checksum before accessing the file.
Contributor

Presumably we were doing this for a reason (security?) before...is it just that we think convertTarToZstdChunked can now safely operate on untrusted input?

giuseppe (Member, Author)

Hmm, I don't remember the security implications. @mtrmac, do you think this approach is fine, or should I just close this PR?

@mtrmac (Collaborator) commented Jul 24, 2024

On balance, I’d prefer this not to be merged as is.


This exposes the conversion code and decompression to malicious input in more situations.

In principle, the conversion code and decompression must be able to handle malicious input anyway, because an attacker could, in many cases, refer to the malicious input in a valid manifest, without triggering this check.

But some users might have a policy which only accepts signed images, i.e. the malicious input would only be digested, not processed otherwise.

I’m really more worried about the decompression than the chunked conversion. That’s a lot of bit manipulation, potentially in an unsafe language “for performance”, with a fair history of vulnerabilities.

… and for ordinary image pulls, we currently also stream the input through decompression, only validating the digest concurrently; we don’t validate the digest before starting to decompress. So, in that sense, we are already accepting a larger part of the risk.

So I don’t feel that strongly about it here.


More importantly, if this is aiming at the c/storage ApplyDiff = c/image PutBlob (not PutBlobPartial) path, c/image is currently creating a temporary file, and verifying the digest, on that path, without c/storage having any way to prevent it; and c/storage wouldn’t need to do this. (Also, c/image streams the data to the file, and digests it, concurrently for several layers, without holding any storage locks, which seems valuable.)

So, if the goal is code structure, not performance or PutBlobPartial, I don’t see that this makes any difference for the ApplyDiff path, and it is just a performance/risk trade-off for the PutBlobPartial path.

If you don’t actually care about the performance at this point, we get the increased risk but not any benefit we value — so I’d prefer to close the PR and not merge this; we can always resurrect it later.

Except for the one line that allows convertTarToZstdChunked to accept an arbitrary io.Reader, and the missing payload.Close.
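
For context, a rough sketch of the concurrent-validation pattern described above for ordinary pulls, assuming only the standard library and github.com/opencontainers/go-digest (processWhileDigesting and its callback are hypothetical names): the blob is parsed as it streams in, and the digest is only checked once the stream has been fully read.

```go
package chunked

import (
	"fmt"
	"io"

	digest "github.com/opencontainers/go-digest"
)

// processWhileDigesting hands the blob to process() as the bytes arrive and
// only checks the digest after the stream has been fully consumed, i.e. the
// (possibly untrusted) bytes are already being parsed before validation
// completes.
func processWhileDigesting(blob io.Reader, expected digest.Digest, process func(io.Reader) error) error {
	verifier := expected.Verifier()
	if err := process(io.TeeReader(blob, verifier)); err != nil {
		return err
	}
	if !verifier.Verified() {
		return fmt.Errorf("digest mismatch: expected %s", expected)
	}
	return nil
}
```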

@TomSweeneyRedHat added the 5.2 Wanted for Podman v5.2 label on Jul 24, 2024
// copy the entire tarball and compute its digest
_, err = io.CopyBuffer(destination, r, c.copyBuffer)
rc := ioutils.NewReadCloserWrapper(r, func() error {
return payload.Close()
Collaborator

We were not closing payload before? That fix should not be forgotten.
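
Roughly, the pattern in the hunk above, sketched as a standalone helper (wrapWithPayloadClose is a hypothetical name; the wrapper call matches the pkg/ioutils helper used in the hunk): closing the returned reader now also closes the original payload, so it is no longer leaked.

```go
package chunked

import (
	"io"

	"github.com/containers/storage/pkg/ioutils"
)

// wrapWithPayloadClose returns a ReadCloser for the converted stream whose
// Close also closes the original payload.
func wrapWithPayloadClose(r io.Reader, payload io.ReadCloser) io.ReadCloser {
	return ioutils.NewReadCloserWrapper(r, func() error {
		return payload.Close()
	})
}
```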


@mtrmac (Collaborator) left a comment

See elsewhere for the higher-level evaluation.

@cgwalters (Contributor)

> I’m really more worried about the decompression than the chunked conversion. That’s a lot of bit manipulation, potentially in an unsafe language “for performance”, with a fair history of vulnerabilities.

Though now that we're forking a separate process we could isolate those quite aggressively.

@mtrmac (Collaborator) commented Jul 24, 2024

Which forking does that refer to?

@cgwalters (Contributor)

> Which forking does that refer to?

#1964

@giuseppe closed this on Jul 24, 2024
@cgwalters (Contributor)

> … and for ordinary image pulls, we currently also stream the input through decompression, only validating the digest concurrently; we don’t validate the digest before starting to decompress. So, in that sense, we are already accepting a larger part of the risk.

Right, there's that, and also, ultimately, we still need to be safe against untrusted/malicious zstd:chunked inputs even if the checksum matches.

So I'm going to proactively reopen this PR since I think it makes sense.

@cgwalters reopened this on Jul 25, 2024
@TomSweeneyRedHat (Member)

If this is merged on or before August 12, 2024, please cherry-pick it to the release-1.55 branch.

@cgwalters removed the 5.2 Wanted for Podman v5.2 label on Jul 30, 2024