Separate out normalization into configurable passes
msuozzo committed Oct 25, 2024
1 parent 5ac8249 commit e1e2227
Showing 8 changed files with 276 additions and 90 deletions.
24 changes: 12 additions & 12 deletions docs/builds/ArtifactEquivalence@v0.1.md
@@ -1,7 +1,8 @@
# `ArtifactEquivalence` Build Type

The artifact equivalence attestation is a claim that two artifacts are equal
after some non-security relevant details have been normalized.
after certain non-security relevant aspects have been stabilized (see
[section below](#artifact-stabilization-details)).

Rebuilding exact bit-for-bit identical copies of upstream artifacts is not
always possible. However, in many cases, the only reason a bit-for-bit match
@@ -85,7 +86,7 @@ Example:

### Byproducts

The `byproducts` include a hash digest of the normalized version.
The `byproducts` include a hash digest of the stabilized version.

| field | details |
| -------- | ----------------------------------------------------------- |
@@ -105,21 +106,21 @@ Example:
]
```

## Normalization Details
## Artifact Stabilization Details

To compare the rebuilt artifact and the upstream artifact, OSS Rebuild puts both
artifacts through a normalization process and compares the results. If the
rebuild was successful, then the outcome of this process for both upstream and
rebuild should result in an identical "normalized" artifact.
artifacts through a stabilization process and compares the results. If the
rebuild was successful, then the result of this process for both upstream and
rebuild should be identical artifacts.

### Zip

[Zip](<https://en.wikipedia.org/wiki/ZIP_(file_format)>) is an archive file
format that supports lossless data compression. Zip archives contain
modification times, zip version metadata and other filesystem specific data
frequently differ from system to system. We believe this data does not have a
modification times, zip version metadata, and other filesystem specific data
that frequently differ from system to system. We believe this data does not have a
meaningful security impact for the source-based distribution systems like those
supported by OSS Rebuild. For zip based archives, this is the normalization
supported by OSS Rebuild. For zip based archives, this is the stabilization
process:

1. Read all the existing zip entries
@@ -138,15 +139,14 @@ that is done using another compression scheme in combination with tar. Tarballs
contain the file mode, owner and group IDs, and a modification time. These
frequently differ between build environments and we do not believe they have a
meaningful security impact for the source-based distribution systems like those
supported by OSS Rebuild. For tar based archives, this is the normalization
supported by OSS Rebuild. For tar based archives, this is the stabilization
process:

1. Read all the existing tar entries
1. Create new tar entries with:
- The same entry name
- The same file contents
- ModTime and AccessTime as 1985 Oct 26 8:15am UTC (an arbitrary date
time)
- ModTime and AccessTime set to the UNIX epoch
- Uid and Gid of 0
- Empty Uname and Gname
- Mode 0777
11 changes: 6 additions & 5 deletions docs/builds/Rebuild@v0.1.md
@@ -108,12 +108,13 @@ Example:

### Byproducts

The `byproducts` include a hash digest of the normalized version.
The `byproducts` include the full file constructs used to produce the artifact
such as the high-level definition, the Cloud Build definition, and the specific Dockerfile.

| field | details |
| --------- | --------------------------------------------------------------------------------------------------------- |
| `name` | The high-level build definition, Dockerfile, and Google Cloud Build process that implemented the rebuild. |
| `content` | The base64-encoded content of the artifact. |
| field | details |
| --------- | -------------------------------------------------------- |
| `name` | The resource identifier for the build process byproduct. |
| `content` | The base64-encoded content of the artifact. |

Example:

5 changes: 3 additions & 2 deletions pkg/archive/archive.go
@@ -25,6 +25,7 @@ import (

// Stabilize selects and applies the stabilization routine for the given archive format.
func Stabilize(dst io.Writer, src io.Reader, f Format) error {
opts := StabilizeOpts{Stabilizers: append(AllZipStabilizers, AllTarStabilizers...)}
switch f {
case ZipFormat:
srcReader, size, err := toZipCompatibleReader(src)
@@ -37,7 +38,7 @@ func Stabilize(dst io.Writer, src io.Reader, f Format) error {
}
zw := zip.NewWriter(dst)
defer zw.Close()
err = StabilizeZip(zr, zw)
err = StabilizeZip(zr, zw, opts)
if err != nil {
return errors.Wrap(err, "stabilizing zip")
}
@@ -49,7 +50,7 @@ func Stabilize(dst io.Writer, src io.Reader, f Format) error {
defer gzr.Close()
gzw := gzip.NewWriter(dst)
defer gzw.Close()
err = StabilizeTar(tar.NewReader(gzr), tar.NewWriter(gzw))
err = StabilizeTar(tar.NewReader(gzr), tar.NewWriter(gzw), opts)
if err != nil {
return errors.Wrap(err, "stabilizing tar")
}
5 changes: 5 additions & 0 deletions pkg/archive/common.go
@@ -27,6 +27,11 @@ const (
RawFormat
)

// StabilizeOpts aggregates stabilizers to be used in stabilization.
type StabilizeOpts struct {
Stabilizers []any
}

// ContentSummary is a summary of rebuild-relevant features of an archive.
type ContentSummary struct {
Files []string
141 changes: 100 additions & 41 deletions pkg/archive/tar.go
@@ -20,45 +20,17 @@ import (
"crypto/sha256"
"encoding/hex"
"io"
"io/fs"
"os"
"path/filepath"
"slices"
"sort"
"strings"
"time"

billy "github.com/go-git/go-billy/v5"
"github.com/pkg/errors"
)

// Pick some arbitrary time to set all the time fields.
// Source: https://github.com/npm/pacote/blob/main/lib/util/tar-create-options.js#L28
var arbitraryTime = time.Date(1985, time.October, 26, 8, 15, 0, 0, time.UTC)

func stabilizeTarHeader(h *tar.Header) (*tar.Header, error) {
switch h.Typeflag {
case tar.TypeGNUSparse, tar.TypeGNULongName, tar.TypeGNULongLink:
// NOTE: Non-PAX header type support can be added, if necessary.
return nil, errors.Errorf("Unsupported header type: %v", h.Typeflag)
default:
return &tar.Header{
Typeflag: h.Typeflag,
Name: h.Name,
ModTime: arbitraryTime,
AccessTime: arbitraryTime,
// TODO: Surface presence/absence of execute bit as a comparison config.
Mode: 0777,
Uid: 0,
Gid: 0,
Uname: "",
Gname: "",
Size: h.Size,
// TODO: Surface comparison config for TAR metadata (PAXRecords, Xattrs).
Format: tar.FormatPAX,
}, nil
}
}

// TarEntry represents an entry in a tar archive.
type TarEntry struct {
*tar.Header
@@ -48,10 +48,88 @@ func (e TarEntry) WriteTo(tw *tar.Writer) error {
return nil
}

type TarFile struct{ Files []*TarEntry }

type TarArchiveStabilizer struct {
Name string
Func func(*TarFile)
}

type TarEntryStabilizer struct {
Name string
Func func(*TarEntry)
}

var AllTarStabilizers []any = []any{
StableTarFileOrder,
StableTarTime,
StableTarFileMode,
StableTarOwners,
StableTarXattrs,
StableTarDeviceNumber,
}

var StableTarFileOrder = TarArchiveStabilizer{
Name: "tar-file-order",
Func: func(f *TarFile) {
slices.SortFunc(f.Files, func(a, b *TarEntry) int {
return strings.Compare(a.Name, b.Name)
})
},
}

var StableTarTime = TarEntryStabilizer{
Name: "tar-time",
Func: func(e *TarEntry) {
e.ModTime = time.UnixMilli(0)
e.AccessTime = time.UnixMilli(0)
e.ChangeTime = time.Time{}
// NOTE: Without a PAX record, the tar library will disregard this value
// and write a USTAR-formatted file. Setting 'atime' ensures at least one
// record exists which will cause tar to serialize and re-parse it as PAX.
e.Format = tar.FormatPAX
},
}

var StableTarFileMode = TarEntryStabilizer{
Name: "tar-file-mode",
Func: func(e *TarEntry) {
e.Mode = int64(fs.ModePerm)
},
}

var StableTarOwners = TarEntryStabilizer{
Name: "tar-owners",
Func: func(e *TarEntry) {
e.Uid = 0
e.Gid = 0
e.Uname = ""
e.Gname = ""
},
}

var StableTarXattrs = TarEntryStabilizer{
Name: "tar-xattrs",
Func: func(e *TarEntry) {
clear(e.Xattrs)
clear(e.PAXRecords)
},
}

var StableTarDeviceNumber = TarEntryStabilizer{
Name: "tar-device-number",
Func: func(e *TarEntry) {
// NOTE: 0 is currently reserved on Linux and will cause a device number to
// be dynamically allocated when passed to the kernel.
e.Devmajor = 0
e.Devminor = 0
},
}

// StabilizeTar strips volatile metadata and re-writes the provided archive in a standard form.
func StabilizeTar(tr *tar.Reader, tw *tar.Writer) error {
func StabilizeTar(tr *tar.Reader, tw *tar.Writer, opts StabilizeOpts) error {
defer tw.Close()
var ents []TarEntry
var ents []*TarEntry
for {
header, err := tr.Next()
if err != nil {
@@ -88,22 +138,31 @@ func StabilizeTar(tr *tar.Reader, tw *tar.Writer) error {
}
return err
}
stabilized, err := stabilizeTarHeader(header)
if err != nil {
return err
// NOTE: Non-PAX header type support can be added, if necessary.
switch header.Typeflag {
case tar.TypeGNUSparse, tar.TypeGNULongName, tar.TypeGNULongLink:
return errors.New("Unsupported file type")
}
buf, err := io.ReadAll(tr)
if err != nil {
return err
}
// TODO: Memory-intensive. We're buffering the full file in memory (again).
// One option would be to do two passes and only buffer what's necessary.
ents = append(ents, TarEntry{stabilized, buf[:]})
// NOTE: Memory-intensive. We're buffering the full file in memory as
// tar.Reader is single-pass and we need to support sorting entries.
ents = append(ents, &TarEntry{header, buf[:]})
}
f := TarFile{Files: ents}
for _, s := range opts.Stabilizers {
// Bind the asserted value once instead of re-asserting in each case.
switch s := s.(type) {
case TarArchiveStabilizer:
s.Func(&f)
case TarEntryStabilizer:
for _, ent := range f.Files {
s.Func(ent)
}
}
}
sort.Slice(ents, func(i, j int) bool {
return ents[i].Header.Name < ents[j].Header.Name
})
for _, ent := range ents {
for _, ent := range f.Files {
if err := ent.WriteTo(tw); err != nil {
return err
}
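
The pass dispatch in `StabilizeTar` can be illustrated in isolation. The following is a self-contained sketch with simplified stand-in types: `entry`, `archive`, `archiveStabilizer`, `entryStabilizer`, and `apply` are hypothetical names, not part of this commit. It shows how a caller might run only a subset of passes, and how values of an unrecognized type in the `[]any` list are skipped (which is also how zip stabilizers would pass through a tar run).

```go
package main

import (
	"fmt"
	"slices"
	"strings"
)

// entry and archive are simplified stand-ins for TarEntry and TarFile.
type entry struct {
	Name string
	Mode int64
}

type archive struct{ Files []*entry }

// archiveStabilizer operates on the whole archive, e.g. to reorder entries.
type archiveStabilizer struct {
	Name string
	Func func(*archive)
}

// entryStabilizer operates on one entry at a time.
type entryStabilizer struct {
	Name string
	Func func(*entry)
}

var stableOrder = archiveStabilizer{
	Name: "file-order",
	Func: func(a *archive) {
		slices.SortFunc(a.Files, func(x, y *entry) int {
			return strings.Compare(x.Name, y.Name)
		})
	},
}

var stableMode = entryStabilizer{
	Name: "file-mode",
	Func: func(e *entry) { e.Mode = 0777 },
}

// apply runs each stabilizer in list order. Values of any other type are
// silently ignored.
func apply(a *archive, stabilizers []any) {
	for _, s := range stabilizers {
		switch s := s.(type) {
		case archiveStabilizer:
			s.Func(a)
		case entryStabilizer:
			for _, e := range a.Files {
				s.Func(e)
			}
		}
	}
}

func main() {
	a := &archive{Files: []*entry{{Name: "foo", Mode: 0644}, {Name: "bar", Mode: 0600}}}
	// Run only the ordering pass; file modes are left untouched.
	apply(a, []any{stableOrder})
	for _, e := range a.Files {
		fmt.Printf("%s %o\n", e.Name, e.Mode)
	}
}
```

Because unknown stabilizer types are ignored rather than rejected, a misspelled or mismatched entry in the list would fail silently; a stricter variant could return an error from a `default` case.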
12 changes: 7 additions & 5 deletions pkg/archive/tar_test.go
@@ -24,6 +24,8 @@ import (
"github.com/google/go-cmp/cmp"
)

var epoch = time.UnixMilli(0)

func TestStabilizeTar(t *testing.T) {
testCases := []struct {
test string
@@ -39,7 +41,7 @@ func TestStabilizeTar(t *testing.T) {
{&tar.Header{Name: "foo", Typeflag: tar.TypeReg, Size: 3, Mode: 0644, ModTime: time.Now(), AccessTime: time.Now()}, []byte("foo")},
},
expected: []*TarEntry{
{&tar.Header{Name: "foo", Typeflag: tar.TypeReg, Size: 3, Mode: 0777, ModTime: arbitraryTime, AccessTime: arbitraryTime, PAXRecords: map[string]string{"atime": "499162500"}, Format: tar.FormatPAX}, []byte("foo")},
{&tar.Header{Name: "foo", Typeflag: tar.TypeReg, Size: 3, Mode: 0777, ModTime: epoch, AccessTime: epoch, PAXRecords: map[string]string{"atime": "0"}, Format: tar.FormatPAX}, []byte("foo")},
},
},
{
@@ -49,8 +51,8 @@ func TestStabilizeTar(t *testing.T) {
{&tar.Header{Name: "bar", Typeflag: tar.TypeReg, Size: 3, Mode: 0644}, []byte("bar")},
},
expected: []*TarEntry{
{&tar.Header{Name: "bar", Typeflag: tar.TypeReg, Size: 3, Mode: 0777, ModTime: arbitraryTime, AccessTime: arbitraryTime, PAXRecords: map[string]string{"atime": "499162500"}, Format: tar.FormatPAX}, []byte("bar")},
{&tar.Header{Name: "foo", Typeflag: tar.TypeReg, Size: 3, Mode: 0777, ModTime: arbitraryTime, AccessTime: arbitraryTime, PAXRecords: map[string]string{"atime": "499162500"}, Format: tar.FormatPAX}, []byte("foo")},
{&tar.Header{Name: "bar", Typeflag: tar.TypeReg, Size: 3, Mode: 0777, ModTime: epoch, AccessTime: epoch, PAXRecords: map[string]string{"atime": "0"}, Format: tar.FormatPAX}, []byte("bar")},
{&tar.Header{Name: "foo", Typeflag: tar.TypeReg, Size: 3, Mode: 0777, ModTime: epoch, AccessTime: epoch, PAXRecords: map[string]string{"atime": "0"}, Format: tar.FormatPAX}, []byte("foo")},
},
},
{
@@ -59,7 +61,7 @@ func TestStabilizeTar(t *testing.T) {
{&tar.Header{Name: "foo", Typeflag: tar.TypeReg, Size: 3, Uid: 10, Uname: "user", Gid: 30, Gname: "group"}, []byte("foo")},
},
expected: []*TarEntry{
{&tar.Header{Name: "foo", Typeflag: tar.TypeReg, Size: 3, Mode: 0777, ModTime: arbitraryTime, AccessTime: arbitraryTime, PAXRecords: map[string]string{"atime": "499162500"}, Format: tar.FormatPAX}, []byte("foo")},
{&tar.Header{Name: "foo", Typeflag: tar.TypeReg, Size: 3, Mode: 0777, ModTime: epoch, AccessTime: epoch, PAXRecords: map[string]string{"atime": "0"}, Format: tar.FormatPAX}, []byte("foo")},
},
},
}
@@ -77,7 +79,7 @@ func TestStabilizeTar(t *testing.T) {
}
var output bytes.Buffer
zr := tar.NewReader(bytes.NewReader(input.Bytes()))
err := StabilizeTar(zr, tar.NewWriter(&output))
err := StabilizeTar(zr, tar.NewWriter(&output), StabilizeOpts{Stabilizers: AllTarStabilizers})
if err != nil {
t.Fatalf("StabilizeTar(%v) = %v, want nil", tc.test, err)
}