tar archives with large UIDs/GIDs not detected #307
Comments
Having just noticed
Yes, I think so too. One side question: if you create a v7 tar on your machine, is it successfully detected with mimetype v1.4.1?
I was just wondering the same thing 🙂 I'm currently creating test archives for all the formats generated by GNU tar to make sure mimetype can detect them.
It is. I've included it as a new test case in #308.
Detect UStar tar archives

UStar tar archives have a `magic` header field at byte offset 257 in each entry whose value begins with the string `ustar`. Identify them with the MIME type `application/x-tar`.

Also add test cases for a number of UStar-compatible formats, created by GNU tar 1.29 (with `--format=<format-name>`):
* `tar.gnu.tar`
* `tar.oldgnu.tar`
* `tar.posix.tar`
* `tar.ustar.tar`

as well as `tar.star.tar` (created by star 1.6) and, for completeness, `tar.v7-gnu.tar` (a v7 tar archive created by GNU tar 1.29).

Fixes #307.
Attach the file for which the detection is inaccurate
Run this through `xxd -r`. (I'll also include it as a new test case in the PR I'm about to open.)
Expected MIME type
`application/x-tar`
Returned MIME type
`application/octet-stream` (no match)
Version of the library you are using
v1.4.1 (although the bug also exists on master)
Output of `go version`
Additional context
I work on a Linux system with large UIDs and GIDs - both of mine are 895295090. Creating a tar archive from files owned by this user with GNU tar 1.32 results in the following tar fields for those files:
* `uid` (byte offset 0x6c): `80 00 00 00 35 5d 1e 72`
* `gid` (byte offset 0x74): `80 00 00 00 35 5d 1e 72`
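To make those bytes concrete, here is a minimal sketch of how such a numeric field can be decoded, assuming the usual GNU convention: a leading `0x80` byte marks a positive big-endian base-256 value, and anything else is NUL/space-padded octal ASCII. The helper name `parseNumeric` is hypothetical and is not part of mimetype or of Go's `archive/tar`; negative values (leading `0xff`) are deliberately not handled.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseNumeric decodes a tar numeric header field (hypothetical helper).
// Assumption: a set high bit in the first byte selects GNU base-256
// encoding; otherwise the field is octal ASCII padded with NULs or spaces.
func parseNumeric(field []byte) (int64, error) {
	if len(field) == 0 {
		return 0, fmt.Errorf("empty field")
	}
	if field[0]&0x80 != 0 {
		if field[0] == 0xff {
			return 0, fmt.Errorf("negative base-256 values not handled in this sketch")
		}
		var v int64
		for i, b := range field {
			if i == 0 {
				b &= 0x7f // drop the marker bit
			}
			if v > (1<<55)-1 {
				return 0, fmt.Errorf("value does not fit in int64")
			}
			v = v<<8 | int64(b)
		}
		return v, nil
	}
	s := strings.Trim(string(field), " \x00")
	if s == "" {
		return 0, nil
	}
	return strconv.ParseInt(s, 8, 64)
}

func main() {
	// The uid field from the archive described above.
	uid := []byte{0x80, 0x00, 0x00, 0x00, 0x35, 0x5d, 0x1e, 0x72}
	v, err := parseNumeric(uid)
	fmt.Println(v, err) // 895295090 <nil>
}
```

The first byte of this field is `0x80`, which is not an ASCII octal digit, so any check that expects the field to start with one will reject the header.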
GNU tar's internals documentation explains the format of these fields (emphasis mine):
mimetype's tar detection logic inspects bytes at a number of offsets within specific fields and checks whether they fall within given ranges. These ranges come from work by the National Archives, who derived them empirically from a corpus of tar archives they'd collected. This isn't very reliable because, as demonstrated here, different archivers store values in certain fields in different formats. Presumably their corpus simply didn't contain any archives with UIDs or GIDs large enough to overflow tar's basic format and have to be stored in the GNU format instead.
IMO, it would be far more reliable to detect tar files based on the value of the `magic` field at byte offset 0x101 - I've yet to come across an archiver that doesn't set the first five bytes of this field to `75 73 74 61 72` (`ustar`). GNU tar fills the rest of the field with spaces (i.e. `75 73 74 61 72 20 20`).
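A minimal sketch of the magic-based check proposed here, assuming the detector is handed at least the first header block of the file. The function name `looksLikeUStar` and the constants are hypothetical; this is not mimetype's actual API or implementation.

```go
package main

import (
	"bytes"
	"fmt"
)

// magicOffset is the byte offset of the magic field in a tar header (0x101).
const magicOffset = 257

var ustarMagic = []byte("ustar")

// looksLikeUStar reports whether raw carries the "ustar" magic at offset 257.
// Hypothetical helper for illustration only.
func looksLikeUStar(raw []byte) bool {
	if len(raw) < magicOffset+len(ustarMagic) {
		return false
	}
	return bytes.HasPrefix(raw[magicOffset:], ustarMagic)
}

func main() {
	// Build a fake 512-byte header with base-256 uid/gid fields and the
	// GNU-style magic ("ustar" followed by two spaces and a NUL).
	header := make([]byte, 512)
	big := []byte{0x80, 0x00, 0x00, 0x00, 0x35, 0x5d, 0x1e, 0x72}
	copy(header[0x6c:], big) // uid
	copy(header[0x74:], big) // gid
	copy(header[magicOffset:], "ustar  \x00")

	fmt.Println(looksLikeUStar(header))            // true
	fmt.Println(looksLikeUStar(make([]byte, 512))) // false
}
```

Because the check ignores the `uid` and `gid` fields entirely, it does not depend on how a particular archiver encodes large values.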