Different extension compared to the file command #52

Open
alireza-rezaee opened this issue Feb 8, 2024 · 2 comments

@alireza-rezaee

It seems that something goes wrong here: the library recognizes the .gz file as .bin, while the file command recognizes it correctly. Doesn't file (libmagic) have an API to get the extension directly? If I understand correctly, we are currently using a MIME type to extension mapping as an alternative.

[Fact]
public void Guess_Gzip_ReturnsSameAsNative()
{
    // small gzip file: https://github.com/mathiasbynens/small
    byte[] gzipBytes =
    [
        0x1f, 0x8b, 0x08, 0x00, 0xae, 0x86, 0xe1, 0x5b, 0x02, 0x03, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
        0x00, 0x00, 0x00, 0x00
    ];

    var actualMimeType = GuessMimeType(gzipBytes);
    var actualExtension = GuessExtension(gzipBytes);

    // $ file gzip.gz --mime
    // → gzip.gz: application/gzip; charset=binary
    string expectedMimeType = "application/gzip";

    // $ file gzip.gz --extension
    // → gzip.gz: gz/tgz/tpz/ipk/vbox-extpack/svgz
    string[] expectedExtensions = [ "gz", "tgz", "tpz", "ipk", "vbox-extpack", "svgz"];

    Assert.Equal(expectedMimeType, actualMimeType);
    Assert.Contains(expectedExtensions, e => e == actualExtension); // ← Exception raised here
}

Assert.Contains() Failure
Not found: (filter expression)
In value:  String[] ["gz", "tgz", "tpz", "ipk", "vbox-extpack", ...]
   at Test.UnitTest.Guess_Gzip_ReturnsSameAsNative() in .../UnitTest.cs:line 28
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

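This is probably because GuessExtension does not ask libmagic for an extension at all. It maps the guessed MIME type to an extension through MimeTypesMap, which depends on the MIME types known by Apache: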

public static string GuessExtension(byte[] buffer) => MimeTypesMap.GetExtension(GuessMimeType(buffer));

@hey-red
Owner

hey-red commented Feb 9, 2024

If you want to get the extension directly from libmagic:

using var magic = new Magic(MagicOpenFlags.MAGIC_EXTENSION);
var result = magic.Read(@"/path/to/gzip.gz"); // from file
Console.WriteLine(result); // -> gz/tgz/tpz/ipk/vbox-extpack/svgz/blend/dia/gnucash/rdata/xoj
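With MAGIC_EXTENSION, libmagic reports every candidate extension in one slash-separated string, and "???" when it knows none (as the buffer example in this thread shows). A minimal sketch of turning that string into a list a test can check against; SplitExtensions is a hypothetical helper, not part of the library, and the input strings are the results quoted in this thread:

```csharp
using System;

// Hypothetical helper (not part of the library): splits a MAGIC_EXTENSION
// result such as "gz/tgz/svgz" into separate extensions. An empty result or
// "???" (no known extension) yields an empty array.
static string[] SplitExtensions(string magicResult)
{
    string trimmed = (magicResult ?? string.Empty).Trim();
    if (trimmed.Length == 0 || trimmed == "???")
        return Array.Empty<string>();

    return trimmed.Split('/', StringSplitOptions.RemoveEmptyEntries);
}

string[] fromFile = SplitExtensions("gz/tgz/tpz/ipk/vbox-extpack/svgz/blend/dia/gnucash/rdata/xoj");
string[] fromBuffer = SplitExtensions("???");

Console.WriteLine(fromFile[0]);       // gz
Console.WriteLine(fromFile.Length);   // 11
Console.WriteLine(fromBuffer.Length); // 0
```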

However, for this file it doesn't work when the buffer-based method (libmagic's magic_buffer) is used:

byte[] buf = File.ReadAllBytes(@"/path/to/gzip.gz");
using var magic = new Magic(MagicOpenFlags.MAGIC_EXTENSION);
var result = magic.Read(buf, buf.Length);
Console.WriteLine(result); // -> "???"

I have no idea why the behaviour differs.
But you can update the MimeTypesMap dictionary before you get results from MimeGuesser:
MimeTypesMap.AddOrUpdate("application/gzip", "gz");
Or create your own dictionary with a MIME type <-> extensions mapping.
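A minimal sketch of the "own dictionary" approach, assuming you only need a handful of types. The gzip entry copies the file --extension output from the test at the top of this issue; the names extensionsByMime and GuessExtensionFromMime and the "bin" fallback are hypothetical choices, not library behaviour:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical application-level mapping: MIME type -> candidate extensions,
// most preferred first. The gzip entry mirrors `file --extension` output.
var extensionsByMime = new Dictionary<string, string[]>(StringComparer.OrdinalIgnoreCase)
{
    ["application/gzip"] = new[] { "gz", "tgz", "tpz", "ipk", "vbox-extpack", "svgz" },
};

// Returns the preferred extension for a MIME type, or the fallback
// ("bin" here, an assumption) when the type is not in the map.
string GuessExtensionFromMime(string mimeType, string fallback = "bin") =>
    extensionsByMime.TryGetValue(mimeType, out var extensions) && extensions.Length > 0
        ? extensions[0]
        : fallback;

Console.WriteLine(GuessExtensionFromMime("application/gzip"));     // gz
Console.WriteLine(GuessExtensionFromMime("application/x-unknown")); // bin
```

With the library, the input would presumably be the MIME type returned by GuessMimeType(bytes), so only the MIME to extension step changes; consulting your own map first and falling back to MimeTypesMap.GetExtension is one possible design.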

@alireza-rezaee
Author

alireza-rezaee commented Feb 9, 2024

byte[] buf = File.ReadAllBytes(@"/path/to/gzip.gz");
using var magic = new Magic(MagicOpenFlags.MAGIC_EXTENSION);
var result = magic.Read(buf, buf.Length);
Console.WriteLine(result); // -> "???"

This difference in behavior is strange, but the buffer method works fine with this very similar file, gzip-name.gz:

$ xxd gzip.gz
00000000: 1f8b 0800 ae86 e15b 0203 0300 0000 0000  .......[........
00000010: 0000 0000                                ....

$ xxd gzip-name.gz
00000000: 1f8b 0808 ae86 e15b 0203 6e00 0300 0000  .......[..n.....
00000010: 0000 0000 0000

// gzip-name.gz
byte[] fileBytes =
[
    0x1f, 0x8b, 0x08, 0x08, 0xae, 0x86, 0xe1, 0x5b, 0x02, 0x03, 0x6e, 0x00, 0x03, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00
];
using var magic = new Magic(MagicOpenFlags.MAGIC_EXTENSION);
var result = magic.Read(fileBytes, fileBytes.Length);
Console.WriteLine(result); // -> "gz/tgz/tpz/zabw/svgz/adz/kmy/xcfgz"
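The two dumps above differ only in the gzip header. Per RFC 1952, byte 3 is the FLG field, and gzip-name.gz has bit 3 (FNAME, 0x08) set, so a zero-terminated original file name ("n") follows the fixed 10-byte header; everything else (MTIME, XFL, OS, the empty deflate block, CRC32, ISIZE) is identical. Whether that field is what makes libmagic's buffer path succeed is not established in this thread. The sketch below does not call libmagic; it only decodes the header so the difference is visible. DescribeGzipHeader is a hypothetical helper, and Encoding.Latin1 assumes .NET 5 or later:

```csharp
using System;
using System.Text;

// Hypothetical helper (not part of any library): decodes the fixed 10-byte
// gzip member header defined in RFC 1952 and, when the FNAME flag is set,
// the zero-terminated original file name that follows it.
static string DescribeGzipHeader(byte[] data)
{
    if (data.Length < 10 || data[0] != 0x1f || data[1] != 0x8b)
        return "not a gzip stream";

    byte flags = data[3]; // FLG field
    int offset = 10;      // first byte after the fixed header

    // FEXTRA (0x04): 2-byte little-endian length followed by that many bytes.
    if ((flags & 0x04) != 0)
    {
        if (data.Length < offset + 2) return "truncated FEXTRA field";
        offset += 2 + (data[offset] | (data[offset + 1] << 8));
        if (offset > data.Length) return "truncated FEXTRA field";
    }

    // FNAME (0x08): zero-terminated ISO-8859-1 file name.
    string name = "(none)";
    if ((flags & 0x08) != 0)
    {
        int end = Array.IndexOf(data, (byte)0, offset);
        if (end < 0) return "truncated FNAME field";
        name = Encoding.Latin1.GetString(data, offset, end - offset);
    }

    return $"CM={data[2]}, FLG=0x{flags:x2}, FNAME={name}, OS={data[9]}";
}

// Same 20 and 22 bytes as the two xxd dumps above.
byte[] gzip = new byte[]
{
    0x1f, 0x8b, 0x08, 0x00, 0xae, 0x86, 0xe1, 0x5b, 0x02, 0x03, 0x03, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00
};
byte[] gzipName = new byte[]
{
    0x1f, 0x8b, 0x08, 0x08, 0xae, 0x86, 0xe1, 0x5b, 0x02, 0x03, 0x6e, 0x00, 0x03, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

Console.WriteLine(DescribeGzipHeader(gzip));     // CM=8, FLG=0x00, FNAME=(none), OS=3
Console.WriteLine(DescribeGzipHeader(gzipName)); // CM=8, FLG=0x08, FNAME=n, OS=3
```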
