Skip to content

Commit

Permalink
Add file-format.md
Browse files Browse the repository at this point in the history
Signed-off-by: Alexander Larsson <alexl@redhat.com>
  • Loading branch information
alexlarsson committed May 11, 2020
1 parent 4c91212 commit 744d14f
Show file tree
Hide file tree
Showing 2 changed files with 72 additions and 0 deletions.
2 changes: 2 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@ to reconstruct and validate the current version of the image.

Delta compression is based on [bsdiff](http://www.daemonology.net/bsdiff/) and [zstd compression](https://facebook.github.io/zstd/).

The tar-diff file-format is described in [file-format.md](file-format.md)

License
-
tar-diff is licensed under the Apache License, Version 2.0. See
Expand Down
70 changes: 70 additions & 0 deletions file-format.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,70 @@
File Format
-----------

A tar-diff file (media type `application/vnd.tar-diff`) consists of a
header, with the fixed bytes:

```
{ 't', 'a', 'r', 'd', 'f', '1', '\n', 0}
```

Followed by a [zstd](https://facebook.github.io/zstd/) compressed
stream, with a sequence of operations, each operation is encoded as
follows:

```
op: 1 byte
size: uint64 encoded as a varint
data: <size> bytes. data is absent for DeltaOpCopy and DeltaOpSeek
```

For varint encoding, see:
https://developers.google.com/protocol-buffers/docs/encoding#varints

Algorithm
---------
- unpack the first tar file to create a directory tree, which will be
referenced by the tar-diff.
- apply the sequence of operations in the tar-diff file in order,
producing a stream of bytes identical to the second tar file.

Only the content of the reference files is used, all tar metadata is
available in the tardiff. Similarly, only regular files are referenced
by the tardiff, not e.g. symlinks.


Operations
----------

```
DeltaOpData = 0
DeltaOpOpen = 1
DeltaOpCopy = 2
DeltaOpAddData = 3
DeltaOpSeek = 4
```

***DeltaOpData***
Emit the bytes from `<data>` into the output stream.

***DeltaOpOpen***
`<data>` is a the (relative) path to a file within the original
tarball. Set the source for subsequent `DeltaOpCopy` and `DeltaAddData`
operations to this file, and reset the source position to 0.

Tar-diff generates normalized paths with no `.` or `..` elements,
which this will never point outside the target directory. However, for
security reasons implementations should not rely on this.

***DeltaOpCopy***
Emit `<size>` bytes from the source file to the
output stream, and advance the source position by `<size>`

***DeltaOpAddData***
Read `<size>` bytes from the source file, and advance the source
position by `<size>`. Add these bytes bytewise to the bytes in
`<data>` with unsigned wrapping, and emit the result to the output
stream.

***DeltaOpSeek***
Set the source position to `<size>`

0 comments on commit 744d14f

Please sign in to comment.