Commit
Detect UStar tar archives
UStar tar archives have a `magic` header field at byte offset 257 in
each entry whose value begins with the string `ustar`. Identify them
with the MIME type `application/x-tar`.
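As a minimal sketch of the check described above: the helper name `hasUStarMagic` and the constant `magicOffset` are illustrative and do not appear in the commit. It shows a length guard followed by a prefix comparison at offset 257.

```go
package main

import (
	"bytes"
	"fmt"
)

// magicOffset is the byte offset of the "magic" field within a tar header block.
const magicOffset = 257

// hasUStarMagic is a hypothetical helper (not part of the commit). It reports
// whether raw is long enough to contain the "magic" field and whether that
// field begins with "ustar".
func hasUStarMagic(raw []byte) bool {
	if len(raw) < magicOffset+len("ustar") {
		return false
	}
	return bytes.HasPrefix(raw[magicOffset:], []byte("ustar"))
}

func main() {
	// A hand-built 512-byte header block carrying the POSIX magic and version.
	header := make([]byte, 512)
	copy(header[magicOffset:], "ustar\x0000")

	fmt.Println(hasUStarMagic(header))            // header with the magic field set
	fmt.Println(hasUStarMagic(make([]byte, 512))) // zeroed block, no magic
	fmt.Println(hasUStarMagic(make([]byte, 256))) // too short to hold the field
}
```

Checking the length before slicing matters here, because slicing a buffer shorter than 257 bytes at `raw[257:]` panics in Go.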

Also add test cases for a number of UStar-compatible formats, created by
GNU tar 1.29 (with `--format=<format-name>`):

* `tar.gnu.tar`
* `tar.oldgnu.tar`
* `tar.posix.tar`
* `tar.ustar.tar`

as well as `tar.star.tar` (created by star 1.6) and, for completeness,
`tar.v7-gnu.tar` (a v7 tar archive created by GNU tar 1.29).

Fixes #307.
chrisnovakovic committed Jul 13, 2022
1 parent e59e9d7 commit bc2b8c3
Showing 8 changed files with 24 additions and 8 deletions.
12 changes: 10 additions & 2 deletions internal/magic/archive.go
@@ -74,13 +74,21 @@ func CRX(raw []byte, limit uint32) bool {
 }

 // Tar matches a (t)ape (ar)chive file.
-//
-// Signature source: https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=385&strPageToDisplay=signatures
 func Tar(raw []byte, _ uint32) bool {
 	if len(raw) < 256 {
 		return false
 	}

+	// The "magic" header field for files in UStar (POSIX IEEE P1003.1) archives
+	// has the prefix "ustar". The values of the remaining bytes in this field vary
+	// by archiver implementation.
+	if len(raw) > 257 && bytes.HasPrefix(raw[257:], []byte{0x75, 0x73, 0x74, 0x61, 0x72}) {
+		return true
+	}
+
+	// The older v7 format has no "magic" field, and therefore must be identified
+	// with heuristics based on legal ranges of values for other header fields:
+	// https://www.nationalarchives.gov.uk/PRONOM/Format/proFormatSearch.aspx?status=detailReport&id=385&strPageToDisplay=signatures
 	rules := []struct {
 		min, max uint8
 		i        int
20 changes: 14 additions & 6 deletions mimetype_test.go
@@ -180,12 +180,20 @@ var files = map[string]string{
 	// the timestamps.
 	"not.srt.txt": "text/plain; charset=utf-8",
 	// not.srt.2.txt does not specify milliseconds.
-	"not.srt.2.txt": "text/plain; charset=utf-8",
-	"svg.1.svg": "image/svg+xml",
-	"svg.svg": "image/svg+xml",
-	"swf.swf": "application/x-shockwave-flash",
-	"tar.tar": "application/x-tar",
-	"tar.v7.tar": "application/x-tar",
+	"not.srt.2.txt": "text/plain; charset=utf-8",
+	"svg.1.svg": "image/svg+xml",
+	"svg.svg": "image/svg+xml",
+	"swf.swf": "application/x-shockwave-flash",
+	"tar.tar": "application/x-tar",
+	"tar.gnu.tar": "application/x-tar",
+	"tar.oldgnu.tar": "application/x-tar",
+	"tar.posix.tar": "application/x-tar",
+	// tar.star.tar was generated with star 1.6.
+	"tar.star.tar": "application/x-tar",
+	"tar.ustar.tar": "application/x-tar",
+	"tar.v7.tar": "application/x-tar",
+	// tar.v7-gnu.tar is a v7 tar archive generated with GNU tar 1.29.
+	"tar.v7-gnu.tar": "application/x-tar",
 	"tcl.tcl": "text/x-tcl",
 	"tcx.tcx": "application/vnd.garmin.tcx+xml",
 	"tiff.tiff": "image/tiff",
Binary file added testdata/tar.gnu.tar
Binary file added testdata/tar.oldgnu.tar
Binary file added testdata/tar.posix.tar
Binary file added testdata/tar.star.tar
Binary file added testdata/tar.ustar.tar
Binary file added testdata/tar.v7-gnu.tar