chunked: do not use a temporary file #2041

Open
giuseppe wants to merge 1 commit into main from chunked-do-not-use-temporary-file
Conversation

giuseppe (Member)

and use the stream directly to create the temporary zstd:chunked file.
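
For illustration only, a minimal sketch of the streaming approach described above, with hypothetical helper names (convertFromStream, writeZstdChunked) rather than the real c/storage API: the payload is digested while the converter consumes it, so no extra temporary copy of the input has to be written first.

```go
package chunked

import (
	"io"
	"os"

	digest "github.com/opencontainers/go-digest"
)

// convertFromStream is a hypothetical stand-in for the real conversion entry
// point: it digests the tar payload while the converter reads it, instead of
// first buffering the whole payload in a temporary file.
func convertFromStream(dest *os.File, payload io.Reader) (digest.Digest, error) {
	digester := digest.Canonical.Digester()
	// Every byte the converter reads also goes through the digester.
	tee := io.TeeReader(payload, digester.Hash())
	if err := writeZstdChunked(dest, tee); err != nil {
		return "", err
	}
	return digester.Digest(), nil
}

// writeZstdChunked is a placeholder for the actual tar-to-zstd:chunked
// conversion; here it only copies the bytes so the sketch is self-contained.
func writeZstdChunked(dest *os.File, r io.Reader) error {
	_, err := io.Copy(dest, r)
	return err
}
```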

openshift-ci bot (Contributor) commented Jul 24, 2024

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: giuseppe

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

and use the stream directly to create the temporary zstd:chunked file.

Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com>
@giuseppe force-pushed the chunked-do-not-use-temporary-file branch from ef8928f to b6ba804 on July 24, 2024 09:34
@TomSweeneyRedHat (Member)

LGTM, and I'd like to get this in before starting the vendor dance. @nalind @mtrmac PTAL.

@cgwalters (Contributor) left a comment

The goal is improved performance, right? Worth stating in the commit message if so.

blobFile.Close()
blobFile = nil
// Make sure the entire payload is consumed.
_, _ = io.Copy(io.Discard, payload)
Contributor

Shouldn't we still check for errors here? Maybe it doesn't matter because if e.g. we get a short read or something we'll presumably fail checksum validation anyways.

If that's the idea then I think it at least deserves a comment.

giuseppe (Member, Author)

I've not measured performance; this is just preparatory work for extending ApplyDiff() to use this code path when we convert images, so we can support pulling from other sources too, not only registries.

I've not yet found a nice way to extend the API though.

Collaborator

Yes (if this goes in) I think it would be better to report a read error (the cause) instead of a digest mismatch that is hard to diagnose.
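
A minimal sketch of what is being suggested here (not the PR's actual code; drainPayload is a hypothetical helper): check and wrap the error from draining the payload so a read failure is reported as the cause, rather than surfacing later as a hard-to-diagnose digest mismatch.

```go
package chunked

import (
	"fmt"
	"io"
)

// drainPayload consumes the rest of the stream but, unlike a bare
// `_, _ = io.Copy(io.Discard, payload)`, reports a read error to the caller.
func drainPayload(payload io.Reader) error {
	if _, err := io.Copy(io.Discard, payload); err != nil {
		return fmt.Errorf("consuming payload: %w", err)
	}
	return nil
}
```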

}
}()

// calculate the checksum before accessing the file.
Contributor

Presumably we were doing this for a reason (security?) before...is it just that we think convertTarToZstdChunked can now safely operate on untrusted input?

giuseppe (Member, Author)

Hmm, I don't remember the security implications. @mtrmac, do you think this approach is fine, or should I just close this PR?

@mtrmac (Collaborator) commented Jul 24, 2024

On balance, I’d prefer this not to be merged as is.


This exposes the conversion code and decompression to malicious input in more situations.

In principle, the conversion code and decompression must be able to handle malicious input anyway, because an attacker could, in many cases, refer to the malicious input in a valid manifest, without triggering this check.

But some users might have a policy which only accepts signed images, i.e. the malicious input would only be digested, not processed otherwise.

I’m really more worried about the decompression than the chunked conversion. That’s a lot of bit manipulation, potentially in an unsafe language “for performance”, with a fair history of vulnerabilities.

… and for ordinary image pulls, we currently also stream the input through decompression, only validating the digest concurrently; we don’t validate the digest before starting to decompress. So, in that sense, we are already accepting a larger part of the risk.

So I don’t feel that strongly about it here.


More importantly, if this is aiming at the c/storage ApplyDiff = c/image PutBlob (not PutBlobPartial) path, c/image is currently creating a temporary file, and verifying the digest, on that path, without c/storage having any way to prevent it; and c/storage wouldn’t need to do this. (Also, c/image streams the data to the file, and digests it, concurrently for several layers, without holding any storage locks, which seems valuable.)

So, if the goal is code structure, not performance or PutBlobPartial, I don’t see that this makes any difference for the ApplyDiff path, and it is just a performance/risk trade-off for the PutBlobPartial path.

If you don’t actually care about the performance at this point, we get the increased risk but not any benefit we value — so I’d prefer to close the PR and not merge this; we can always resurrect it later.

Except for the one line that allows convertTarToZstdChunked to accept an arbitrary io.Reader, and the missing payload.Close.
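
For context, a rough sketch of the concurrent-validation pattern described above for ordinary pulls, assuming only the standard library and github.com/opencontainers/go-digest (processWhileDigesting and its callback are hypothetical names): the blob is parsed as it streams in, and the digest is only checked once the stream has been fully read.

```go
package chunked

import (
	"fmt"
	"io"

	digest "github.com/opencontainers/go-digest"
)

// processWhileDigesting hands the blob to process() as the bytes arrive and
// only checks the digest after the stream has been fully consumed, i.e. the
// (possibly untrusted) bytes are already being parsed before validation
// completes.
func processWhileDigesting(blob io.Reader, expected digest.Digest, process func(io.Reader) error) error {
	verifier := expected.Verifier()
	if err := process(io.TeeReader(blob, verifier)); err != nil {
		return err
	}
	if !verifier.Verified() {
		return fmt.Errorf("digest mismatch: expected %s", expected)
	}
	return nil
}
```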

@TomSweeneyRedHat added the 5.2 Wanted for Podman v5.2 label on Jul 24, 2024
// copy the entire tarball and compute its digest
_, err = io.CopyBuffer(destination, r, c.copyBuffer)
rc := ioutils.NewReadCloserWrapper(r, func() error {
return payload.Close()
Collaborator

We were not closing payload before? That fix should not be forgotten.
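
Roughly, the pattern in the hunk above, sketched as a standalone helper (wrapWithPayloadClose is a hypothetical name; the wrapper call matches the pkg/ioutils helper used in the hunk): closing the returned reader now also closes the original payload, so it is no longer leaked.

```go
package chunked

import (
	"io"

	"github.com/containers/storage/pkg/ioutils"
)

// wrapWithPayloadClose returns a ReadCloser for the converted stream whose
// Close also closes the original payload.
func wrapWithPayloadClose(r io.Reader, payload io.ReadCloser) io.ReadCloser {
	return ioutils.NewReadCloserWrapper(r, func() error {
		return payload.Close()
	})
}
```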


@mtrmac (Collaborator) left a comment

See elsewhere for the higher-level evaluation.

@cgwalters (Contributor)

> I’m really more worried about the decompression than the chunked conversion. That’s a lot of bit manipulation, potentially in an unsafe language “for performance”, with a fair history of vulnerabilities.

Though now that we're forking a separate process we could isolate those quite aggressively.

@mtrmac (Collaborator) commented Jul 24, 2024

Which forking does that refer to?

@cgwalters (Contributor)

> Which forking does that refer to?

#1964

@giuseppe closed this on Jul 24, 2024
@cgwalters (Contributor)

> … and for ordinary image pulls, we currently also stream the input through decompression, only validating the digest concurrently; we don’t validate the digest before starting to decompress. So, in that sense, we are already accepting a larger part of the risk.

Right, there's that, and also, ultimately, we still need to be safe against untrusted/malicious zstd:chunked inputs even if the checksum matches.

So I'm going to proactively reopen this PR since I think it makes sense.

@cgwalters reopened this on Jul 25, 2024
@TomSweeneyRedHat (Member)

If this is merged on or before August 12, 2024, please cherry-pick it to the release-1.55 branch.

@cgwalters removed the 5.2 Wanted for Podman v5.2 label on Jul 30, 2024