Initial sif transport implementation #1438

mtrmac · 2022-01-10T16:56:12Z

This is a follow-up to #1402 and #1436 (many thanks to both @yhcote and @tri-adam), self-addressing my review from #1402.

Major changes:

- don't create a compressed image
- clean up various temporary files/directories ASAP to decrease disk space usage
- eliminate a single-use object created by newImageSource, instead just use a function. I think that’s conceptually cleaner but it might well not be worth it in practice (accepting and returning several unnamed values), I’d be happy to change this back again.

Marking as draft because this is a 109-commit version, primarily to allow easy fixes/updates if I broke anything during those changes (which did happen during development, several times), but we probably don’t want that in the history. In the end, because this is entirely new code to the project, this can end up as the contributed commits, minimal updates to make the code build, and maybe a single large commit with the rest of the updates.

Only minimally tested, per #1402 (comment) .

vrothberg · 2022-01-11T12:34:34Z

sif/load.go

+// createTarFromSIFInputs creates a tar file at tarPath, using a squashfs image at squashFSPath.
+// It can also use extractedRootPath and scriptPath, which are allocated for its exclusive use,
+// if necessary.
+func createTarFromSIFInputs(ctx context.Context, tarPath, squashFSPath string, injectedScript []byte, extractedRootPath, scriptPath string) error {


This function can take a lot of time. What I would love to have here is a progress-bar spinner to indicate that we're doing some very heavy lifting.

Unfinished thought: I wonder if we could add an io.Writer to types.SystemContext, similar to the one in copy.Options.

mtrmac · 2022-01-11

Yeah, a progress indication would definitely be useful. It’s not reasonably possible in the current API (and several other transports that e.g. make temporary files on disk would benefit). We now have internal/types, so it’s something that can be built (at least for c/image/copy, where we don’t have to worry whether to expose MBP as a public API commitment).

Just an io.Writer would not be great: this needs to account for interactive use (frequent progress updates), completely non-interactive use (no progress updates, just a log of successes), and (per an RFE) a middle ground of plain-text updates a few times a minute. We might need to build an internal progress abstraction, and I don’t think we can commit to a public io.Writer API in types.SystemContext, just like that, at this point.

Given the Podman 4.0 timing, would it be sufficient to just wrap that operation in a pair of debug logs?

vrothberg

Really great work! In addition to the progress-bar comment, I would appreciate a commit adding debug messages.

mtrmac · 2022-01-11T17:55:40Z

I have added debug messages around the long-running operations.

We could also log size+offset of the found descriptors to get a bit more tracing, OTOH that can be read using siftool at any time.

vrothberg

LGTM

vrothberg · 2022-01-12T09:38:40Z

@mtrmac feel free to merge at will.

- Rename sif/sif_* to remove the sif_ prefix
  The directory name is sufficient disambiguation, so decrease the visual noise and repetition.
- Remove the sif/internal subpackage
  We don't need an extra subpackage, just move the code to the only user.

This makes the internal code publicly-visible, we'll change that immediately.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>

... after moving it from a sif/internal subpackage, make it private again. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

It probably doesn't _work_ right now, at least macOS is missing a working fakeroot. We do intend to avoid the use of fakeroot eventually. (Adventurous experimenting developers might provide a no-op "fakeroot" script on most platforms.) Also, having the code compile on macOS significantly helps development. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Fix policy configuration identities in sif
- Actually allow something in ValidatePolicyConfigurationScope; SIF is one of the cases where it's actually a bit plausible that a policy rejecting some filesystem sources might be desirable.
- Fix PolicyConfigurationNamespaces not to include the file name itself, and "/"

Add tests for sifTransport and sifReference
The NewImage and NewImageSource tests are rather pointless, but we don't want to require and invoke fakeroot etc. on every unit test run, at least for now.

Use ref.file instead of ref.resolvedFile in newImageSource
Consistently with the dir: design, if the user specifies a relative path, use it directly so that we don't introduce races against changes to the directory structure.

Beautify sif_transport.go
- Use the usual order (transport implementation, followed by reference implementation)
- Copy&paste more of the comments, to reinforce the contract requirements.

Fix the package name directive

Reorganize imports
... to follow the usual convention.

Don't use pkg/errors in sif.
Mostly replace its uses with fmt.Errorf(...%w...).

Fix uses of fmt.Errorf
- Use %w instead of %v for error wrapping
- Use errors.New when the string is constant

Don't prefix wrapped error context with "error "
... to match most of c/image code, where we have previously removed such prefixes.

Return true from HasThreadSafeGetBlob
... because that's the case for the current implementation, although it makes no difference for the current c/image/copy caller, when this source only provides one layer.

Rename sifImageSource.blobID to blobDigest

Rename sifImageSource.configID to configDigest

Remove workdir if newImageSource fails

Rename sifImageSource.blob* to layer*
... to differentiate the layer data from the config, which is also a "blob" in the ImageSource naming.

Remove sifImageSource.diffID
The value only needs to be known inside newImageSource, so pass it around in a variable/return value.

Remove unused sifImageSource.diffSize
... which allows us to make getLayerInfo a function without a (partially-created) sifImageSource parent object.

Move layerTime computation from createBlob to getBlobInfo
If anything, the latter is a bit more accurate (capturing the time of the last update of the file we are creating, vs. the time of the initial creation), but we want to eventually replace that with a value from the SIF header anyway.

Remove sifImageSource.layerTime
It's only necessary in newImageSource, so use a return value and a local variable for that.

Remove unused sifImageSource.layerType

Remove sifImageSource.configSize
This value is already stored in the sifImageSource.config slice, so don't store a redundant copy.

Rename workdir to workDir following usual Go patterns.

Rename tarpath to tarPath everywhere

Return layerDigest and layerSize from getBlobInfo
... instead of writing it to partially-initialized sifImageSource.

Provide a path to getBlobInfo instead of reading it from sifImageSource
This makes getBlobInfo independent of the partially-created sifImageSource.

Don't create a compressed layer from the SIF file
Compression is very costly, principally in CPU time. Many use cases (notably import to c/storage) would only end up decompressing the data again. Those that do need the data compressed, like push to a registry, can use the copy pipeline's streaming compression implementation, often without needing to store the compressed version in a temporary file. So this is likely to improve both CPU time usage and (maximum) disk space usage - at the very least against the current implementation which doesn't even remove the uncompressed version after creating the compressed one :) This is a minimal version of the change, we are now computing the layer's digest twice. We'll fix that soon.

Rename tarPath to layerPath
... to be consistent with the other variables.

Don't compute the DiffID separately
It's the same value as the layer digest, now that the layer is just the uncompressed tarball.

Rename fgz to f
The file is now expected to not be compressed.

Rename blobDigester to digester
... just to be a bit shorter.

Inline some single-use variables when building the manifest

Use a struct initializer instead of a set of assignments for config

Inline single-use variables when building a config
Also remove some fairly redundant comments.

Use a switch in sifImageSource.GetBlob
... to make the structure a tiny bit less repetitive.

Close the SIF image object in newImageSource
Nothing actually needs it afterwards.

Rename UnloadSIFImage() to Close()
to indirectly silence a linter about handling the error; it can't fail in practice, and isn't quite worth handling.

Explicitly specify a MediaType field in the generated OCI manifest
... to follow best practices vs. schema confusion attacks (although this generated manifest is clearly not a schema confusion attack).

Beautify loadedSifImage.Close

Turn loadedSifImage.GetConfig into CommandLine
- Make loadedSifImage independent of the OCI format details.
- Make it clear at the call site that only the command is actually provided.
- Don't return an error value which is always nil, which makes the caller simpler.

Remove lookup of the sif.DataEnvVar descriptor
It is unused in this codebase, and it's unclear what, if anything, it is used for anywhere else.

Make SifImage.parseEnvironment and SifImage.parseRunscript stand-alone
... i.e. independent from SifImage, so that we can more easily unit-test it. The res *[]string parameter is rather ugly, but we'll refactor it away soon enough. Should not change behavior.

Split parseDefFile from SifImage.generateConfig
... to have an easily unit-testable bit of code. Should not change behavior. This destructively assigns to image.envlist and image.cmdlist instead of appending, but it should be the only writer at that point.

Add a smoke test for parseDefFile

Use a state machine for parseDefFile
... instead of nested scanner.Scan() loops and a goto. Should not change behavior.

Remove a misleading comment
Now, with GetDescriptor, more than one matching descriptor results in an error, so there isn't anything to assume about a single value.

Move DataDeffile descriptor lookup into generateConfig
It's the only user of that data

Remove deffile and defReader from loadedSifImage
Turn them into trivial local variables in generateConfig(). The code remains a bit convoluted, we'll clean that up soon.

Simplify generateConfig
Eliminate both deffile and defReader.

Pass %environment and %runscript to generateRunscript explicitly
That will eventually make it easier to unit-test

Return the generated script from generateRunscript instead of updating image
This makes generateRunscript stand-alone and easy to unit-test.

Add a smoke test for generateRunscript
It's not much, but better than nothing.

Use InjectedScript instead of Runscript for the script we generate
... everywhere, to differentiate that script from the %runscript section contents.

Store injectedScript as a []byte instead of bytes.Buffer
No need to keep around the intermediate form, and this allows us to change the implementation.

Use strings.Join and Sprintf instead of bytes.Buffer in generateInjectedScript
Assuming this is not performance-critical, the code is much shorter, and clearly cannot fail (just like the previous version, which is documented to panic rather than return the errors that version unnecessarily handled). Note that this might change behavior for empty %environment or %runscript sections: we now add extra empty lines. That shouldn't make a difference.

Remove the unnecessary error return value from generateInjectedScript

Remove loadedSifImage.envlist
All users are local to generateConfig.

Don't use loadedSifImage.cmdlist for storing %runscript
It's just a local value to generateConfig, and we no longer use cmdlist for both %runscript and the final command line.

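
The strings.Join/Sprintf approach mentioned for generateInjectedScript can be sketched as follows. The script layout, including the "#!/bin/bash" line, is an assumption made for illustration; only the general technique (join the section lines, format once, return a []byte, no error value) follows the commit messages.

```go
package main

import (
	"fmt"
	"strings"
)

// generateInjectedScript builds a shell script from the %environment and
// %runscript section lines. The exact layout is an illustrative assumption.
// Nothing here can fail, so there is no error return value.
func generateInjectedScript(environment, runscript []string) []byte {
	script := fmt.Sprintf("#!/bin/bash\n%s\n%s\n",
		strings.Join(environment, "\n"),
		strings.Join(runscript, "\n"))
	return []byte(script)
}

func main() {
	fmt.Print(string(generateInjectedScript(
		[]string{"export A=1"},
		[]string{"echo hello"},
	)))
}
```

With empty sections this version still emits the separating newlines, which matches the note above about extra empty lines.
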
Replace loadedSifImage.cmdlist with a single command string
The array only ever has one element, so get rid of the array.

Beautify generateConfig
Always refer to environment and runscript in the same order.

Move generateInjectedScript after parseDefFile
First create the data, then consume it.

Simplify generateConfig
Only have the fallback to "bash" if no script is available in a single place.

Rename resultDesc to desc
For such a short-lived variable we can have a shorter name.

Rename tempdir to tempDir throughout
... to follow Go conventions.

Remove a Sync call on the squashfs copy
We'd actually prefer that data not to hit the disk; we want to remove it as soon as possible. Instead, scope the deferred Close() so that it happens before we consume the file.

Pass around squashFSPath in a variable.
... instead of providing an ambient constant for the relative path. That will make it clearer which code uses that file.

Pass around tarPath in a variable.
... instead of providing an ambient constant for the relative path. That will make it clearer which code uses that file.

Remove the generated tar file on failure
e.g. if creating it runs out of space.

Beautify SquashFSToTarLayer
Explicitly return values instead of relying on named return values to make the data (in this case, error) flow more explicit.

Use Sprintf instead of string concatenation for the generated script
It is a bit more manageable that way. Also actually start the script with a recognized shebang instead of a newline.

Don't hard-code "squashfs-root" all over the place.
Pass extractedRootPath around instead, and use (unsquashfs -d) to override the built-in default.

Inline a single-use cmd variable

Rename xcmd to cmd
... now that the cmd name is available.

Use explicit return statements instead of named return values

Make loadedImage always passed by reference
The struct contains a stateful *sif.FileImage, which makes no sense to copy; so don't get into that habit even in cases where it might be safe.

Use a constant for the /podman/runscript path
... instead of hard-coding it all over the place, and even assuming a specific directory structure.

Add more context to write failures

Don't write to stderr; return error output to the caller

Remove the generated script immediately after using it

Make the tar file creation cancellable using the provided context
Also add TODO notes in other places where we would prefer the copy to be cancellable.

Rename runUnSquashFSTar to createTarFromSIFInputs
We are going to have it handle the injectedScript as well.

Pass scriptPath to exec.Command instead of hard-coding a constant
This allows us not to care about the working directory of the script, as well.

Move the scriptPath decision to SquashFSToTarLayer
That's the only place that is aware of tempDir now.

Pass injectedScript to writeInjectedScript
... to make it independent of loadedSifImage; it will go away entirely soon.

Call writeInjectedScript from createTarFromSIFInputs
... so that createTarFromSIFInputs is responsible for both creating and consuming extractedRootPath, without any external interference.

Move the cleanup of extractedRootPath to createTarFromSIFInputs
... to make it a tiny bit more self-contained, now that it handles the injectedScript as well.

Add a comment to more clearly document the allocation of paths in SquashFSToTarLayer

Remove loadedSifImage.rootfs
Instead, determine the value in SquashFSToTarLayer. This means that we now try to interpret the deffile before checking for rootfs presence, changing the possible order of errors. That shouldn't be much of a difference for valid images.

Rename generateConfig to processDefFile

Have processDefFile return the values instead of writing to loadedSifImage
We will eliminate the loadedSifImage members entirely soon.

Pass sif.FileImage to processDefFile explicitly
... instead of using the image.fimg member.

Rename SquashFSToTarLayer to convertSIFToElements
We are going to have it return other values as well.

Return also the command line from convertSIFToElements
This turns it into the central point of the conversion process, instead of the fairly ambient loadedSifImage object.

Call processDefFile only in convertSIFToElements
This allows us to remove loadedSifImage.injectedScript and loadedSifImage.command, making loadedSifImage finally a trivial wrapper around sif.FileImage - and we'll eliminate that wrapper next.

Inline loadedSifImage.GetSIFID
We don't really need that abstraction.

Inline loadedSifImage.GetSIFArch
We don't really need that abstraction.

Make convertSIFToElements a stand-alone function
Eliminating the last non-trivial user of loadedSifImage.

Eliminate loadedSifImage
Finally, eliminate the loadedSifImage type entirely. It doesn't really make sense to inject a layer of abstraction between sifImageSource and sif.FileImage, purely for the abstraction. loadedSifImage was only ever used in one way, as an essentially procedural step; that is now served by the convertSIFToElements function, rather than being split between the loadedSifImage constructor and the original tarball creation method. (convertSIFToElements might eventually return a struct with named fields if there were many, but it doesn't make sense for newImageSource to hold an object and fill it up one step at a time.)

Use the last modification time from the SIF header for OCI creation time

Add a TODO note

Add a note about (unsquashfs -o)

Add debug logs around long-running operations
... and make sure to include paths of the relevant files.

Signed-off-by: Miloslav Trmač <mitr@redhat.com>
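
The trade-off behind eliminating loadedSifImage (a single-use object versus a function returning several values) can be seen in a toy comparison. All names and values below are hypothetical stand-ins and do not reproduce the PR's types or signatures.

```go
package main

import "fmt"

// Before: a single-use object, created by a constructor and read exactly once.
// (Hypothetical stand-in for the eliminated wrapper type.)
type loadedImage struct {
	layerPath string
	command   string
}

func loadImage(sifPath string) (*loadedImage, error) {
	return &loadedImage{layerPath: sifPath + ".tar", command: "bash"}, nil
}

// After: one procedural step that returns the values directly. The cost is a
// list of unnamed return values; a struct with named fields would be an option
// if that list grew.
func convertToElements(sifPath string) (string, string, error) {
	return sifPath + ".tar", "bash", nil
}

func main() {
	img, err := loadImage("image.sif")
	if err != nil {
		panic(err)
	}
	layerPath, command, err := convertToElements("image.sif")
	if err != nil {
		panic(err)
	}
	fmt.Println(img.layerPath == layerPath, img.command == command)
}
```
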

mtrmac · 2022-01-12T23:02:23Z

I have left the original commits, some clearly stand-alone things (build fixes, allowing non-Linux builds), file moves, and squashed the rest into one commit.

I’ll keep the equivalent https://github.com/mtrmac/image/tree/sif-many-commits around for a bit (a few weeks?), in case we need to revisit any part.

vrothberg

LGTM

This was referenced Jan 10, 2022

sif: initial sif transport implementation #1436

Closed

sif: initial sif transport implementation #1402

Closed

mtrmac force-pushed the sif branch from 4f38711 to d06b693 on January 10, 2022 17:02

vrothberg reviewed Jan 11, 2022

mtrmac mentioned this pull request Jan 11, 2022

WIP RFC: internal/external interface separation/adapters #1439

Closed

vrothberg approved these changes Jan 12, 2022

yhcote and others added 7 commits January 12, 2022 23:29

sif: initial sif transport implementation

13f7888

Signed-off-by: Yannick Cote <ycote@redhat.com>

sif: bring code in

9b33dd1

Bring sif code in the repo instead of pulling it in at build time. Resolves PR code review discussion. Signed-off-by: Yannick Cote <ycote@redhat.com>

sif: limit platform to linux

1757663

Signed-off-by: Yannick Cote <ycote@redhat.com>

sif: satisfy linter

a7517b4

Signed-off-by: Yannick Cote <ycote@redhat.com>

sif: use upstream sif module

8f3f546

Signed-off-by: Adam Hughes <stickmanica@gmail.com>

Re-update golang.org/x/crypto

44ef87d

Undo the version downgrades in commit "sif: use upstream sif module". Signed-off-by: Miloslav Trmač <mitr@redhat.com>

Update build directives

67c18a6

> gofmt -s -w . fixes build. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac force-pushed the sif branch from 4920a80 to 5f563b4 on January 12, 2022 22:29

mtrmac added 4 commits January 12, 2022 23:50

Make the sif/load.go code package-private

eb8254b

... after moving it from a sif/internal subpackage, make it private again. Signed-off-by: Miloslav Trmač <mitr@redhat.com>

mtrmac force-pushed the sif branch from 5f563b4 to a281e36 on January 12, 2022 23:00

mtrmac marked this pull request as ready for review January 12, 2022 23:02

vrothberg reviewed Jan 13, 2022

vrothberg merged commit 23bee35 into containers:main Jan 13, 2022

mtrmac deleted the sif branch January 13, 2022 20:58
