Add support for CBOR sequence file format #194

Closed
Changes from 7 commits
128 changes: 128 additions & 0 deletions internal/magic/binary.go
@@ -142,3 +142,131 @@ func Marc(raw []byte, limit uint32) bool {
	// Field terminator is present.
	return bytes.Contains(raw, []byte{0x1E})
}

// CborSeq matches CBOR sequences
func CborSeq(raw []byte, limit uint32) bool {
	if len(raw) == 0 {
		return false
	}
	offset, i := 0, 0
	ok, oldok := true, true
	for ; ok && offset != len(raw); i++ {
		oldok = ok
		offset, ok = cborHelper(raw, offset)
	}
	if limit == uint32(len(raw)) {
		ok = oldok
	}
	return ok && i > 1
}

func cborHelper(raw []byte, offset int) (int, bool) {
	raw_len := len(raw) - offset
	if raw_len == 0 {
		return 0, false
	}

	mt := uint8(raw[offset] & 0xe0)

Owner

In the specification mt is:

mt = ib >> 5;

Author

I used 0x40, 0x60, 0x80... because those are the raw byte values that CBOR uses. A bitwise AND with 0xe0 (11100000) has the same effect as shifting right by 5 (>> 5): it discards the last 5 bits of the byte. Testing was easier this way, but I will change it to 2, 3, 4... to avoid confusion.

Owner

No, >> 5 is not the same as & 0xe0. https://play.golang.org/p/NFgj0z0Mt9Q
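
To make the difference concrete, here is a minimal sketch (the function names are hypothetical and not part of the PR). Both expressions discard the low five bits, but the shift moves the remaining three bits down to give 0..7, while the mask leaves them in place and gives 0x00, 0x20, ..., 0xe0.

```go
package main

import "fmt"

// majorTypeShift returns the CBOR major type (0..7), as in the
// specification's pseudocode: mt = ib >> 5.
func majorTypeShift(ib byte) byte { return ib >> 5 }

// majorTypeMask keeps the top three bits where they are, so the result
// is one of 0x00, 0x20, 0x40, ..., 0xe0 rather than 0..7.
func majorTypeMask(ib byte) byte { return ib & 0xe0 }

func main() {
	ib := byte(0x9f) // initial byte of an indefinite-length array
	fmt.Printf("ib>>5 = %d, ib&0xe0 = %#x\n", majorTypeShift(ib), majorTypeMask(ib))
	// prints "ib>>5 = 4, ib&0xe0 = 0x80"
}
```

Either representation can drive a switch, as long as the case labels use the matching set of constants.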

	ai := raw[offset] & 0x1f
	val := int(ai)
	offset++

	BgEn := binary.BigEndian
	switch ai {
	case 24:
		if raw_len < 2 {
			return 0, false
		}
		val = int(raw[offset])
		offset++
		if mt == 0xe0 && uint64(raw[offset]) < 32 {
			return 0, false
		}
	case 25:
		if raw_len < 3 {
			return 0, false
		}
		val = int(BgEn.Uint16(raw[offset : offset+2]))
		offset += 2
	case 26:
		if raw_len < 5 {
			return 0, false
		}
		val = int(BgEn.Uint32(raw[offset : offset+4]))
		offset += 4
	case 27:
		if raw_len < 9 {
			return 0, false
		}
		val = int(BgEn.Uint64(raw[offset : offset+8]))
		offset += 8
	case 31:

Owner

In the specification:

    case 31:
      return well_formed_indefinite(mt, breakable);

Author

@qiu-x, Nov 12, 2021

I return false here because well_formed_indefinite would call fail() for those values. In my code, cborIndefinite calls cborHelper directly to avoid duplication; this works because the mt switches in well_formed_indefinite and well_formed are almost the same. That is why I exclude here the cases that cborIndefinite should not handle.

		switch mt {
		case 0x00, 0x20, 0xc0:
			return 0, false
		case 0xe0:
			return 0, false
		}
	default:
		if ai > 24 { // ie. case 28: case 29: case 30
			return 0, false
		}
	}

	switch mt {
	case 0x40, 0x60:

Owner

In the specification:

  switch (mt) {
    // case 0, 1, 7 do not have content; just use val
    case 2: case 3: take(val); break; // bytes/UTF-8
    case 4: for (i = 0; i < val; i++) well_formed(); break;
    case 5: for (i = 0; i < val*2; i++) well_formed(); break;
    case 6: well_formed(); break;     // 1 embedded data item
    case 7: if (ai == 24 && val < 32) fail(); // bad simple
  }

		if ai == 31 {
			return cborIndefinite(raw, mt, offset)
		}
		if val < 0 || len(raw)-offset < val {
			return 0, false
		}
		offset += val
	case 0x80, 0xa0:
		if ai == 31 {
			return cborIndefinite(raw, mt, offset)
		}
		if val < 0 {
			return 0, false
		}
		count := 1
		if mt == 0xa0 {
			count = 2
		}
		for i := 0; i < val*count; i++ {
			var ok bool
			offset, ok = cborHelper(raw, offset)
			if !ok {
				return 0, false
			}
		}
	case 0xc0:
		return cborHelper(raw, offset)
	default:
		return 0, false
	}
	return offset, true
}

func cborIndefinite(raw []byte, mt uint8, offset int) (int, bool) {

Owner

I guess this function is the equivalent of well_formed_indefinite from specification, but it does not look like it does the same thing.

Author

@qiu-x, Nov 12, 2021

This should be equivalent to the while loops in well_formed_indefinite, but a single loop handles all the cases, since I already excluded the bad ones earlier in the code.
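
For comparison, the following is a rough Go transliteration of the well_formed / well_formed_indefinite pseudocode discussed in this thread, as I read it in RFC 8949 Appendix C. It is a sketch and not the PR's code: wellFormed, wellFormedIndefinite, isSequence and the result constants are hypothetical names, and the bounds checks are my own additions. It uses mt = ib >> 5, requires string chunks to be definite-length items of the same major type, and rejects a break that arrives where a map value is expected.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Results mirroring the pseudocode's return values (names are hypothetical).
const (
	failed     = -2 // malformed or truncated input
	breakCode  = -1 // the 0xff "break" stop code was read
	indefinite = 99 // an indefinite-length item was consumed
)

// wellFormed checks one data item starting at off. It returns the offset
// after the item plus the item's major type (0..7) or one of the constants.
func wellFormed(raw []byte, off int, breakable bool) (int, int) {
	if off >= len(raw) {
		return off, failed
	}
	ib := raw[off]
	off++
	mt, ai := int(ib>>5), ib&0x1f
	val := uint64(ai)
	switch {
	case ai == 31:
		return wellFormedIndefinite(raw, off, mt, breakable)
	case ai >= 28:
		return off, failed // 28, 29 and 30 are reserved
	case ai >= 24:
		n := 1 << (ai - 24) // 1, 2, 4 or 8 argument bytes follow
		if len(raw)-off < n {
			return off, failed
		}
		var buf [8]byte
		copy(buf[8-n:], raw[off:off+n])
		val = binary.BigEndian.Uint64(buf[:])
		off += n
	}
	var count uint64 // nested items still to check
	switch mt {
	case 2, 3: // byte or text string: skip val content bytes
		if val > uint64(len(raw)-off) {
			return off, failed
		}
		off += int(val)
	case 4, 5: // array of val items, or map of val key/value pairs
		if val > uint64(len(raw)-off) {
			return off, failed // every item needs at least one byte
		}
		count = val * uint64(mt-3)
	case 6:
		count = 1 // a tag wraps exactly one item
	case 7:
		if ai == 24 && val < 32 {
			return off, failed // two-byte encoding of a simple value below 32
		}
	}
	for ; count > 0; count-- {
		var r int
		if off, r = wellFormed(raw, off, false); r == failed {
			return off, failed
		}
	}
	return off, mt
}

// wellFormedIndefinite handles additional information 31 for major type mt.
func wellFormedIndefinite(raw []byte, off, mt int, breakable bool) (int, int) {
	switch mt {
	case 2, 3, 4, 5:
		for n := 0; ; n++ {
			var r int
			off, r = wellFormed(raw, off, true)
			switch {
			case r == failed:
				return off, failed
			case r == breakCode:
				if mt == 5 && n%2 == 1 {
					return off, failed // map key without a value
				}
				return off, indefinite
			case mt <= 3 && r != mt:
				return off, failed // chunk must be definite-length and of the same type
			}
		}
	case 7:
		if breakable {
			return off, breakCode
		}
	}
	return off, failed // wrong major type, or break with no enclosing item
}

// isSequence reports whether raw is one or more well-formed items back to back.
// Requiring at least one item is a choice made for detection purposes.
func isSequence(raw []byte) bool {
	for off := 0; off < len(raw); {
		var r int
		if off, r = wellFormed(raw, off, false); r == failed {
			return false
		}
	}
	return len(raw) > 0
}

func main() {
	fmt.Println(isSequence([]byte{0x9f, 0x01, 0x02, 0xff, 0x18, 0x2a})) // true: [_ 1, 2] then 42
	fmt.Println(isSequence([]byte{0xbf, 0x01, 0xff}))                   // false: key without value
}
```

Unlike a single loop that accepts any item until 0xff, this version tracks which major type opened the indefinite-length item, which is what lets it reject mixed or nested indefinite chunks inside a string.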

	var ok bool
	i := 0
	for {
		if len(raw) == offset {
			return 0, false
		}
		if raw[offset] == 0xff {
			offset++
			break
		}
		offset, ok = cborHelper(raw, offset)
		if !ok {
			return 0, false
		}
		i++
	}
	if mt == 0xa0 && i%2 == 1 {
		return 0, false
	}
	return offset, true
}
1 change: 1 addition & 0 deletions mimetype_test.go
@@ -38,6 +38,7 @@ var files = map[string]string{
	"bpg.bpg": "image/bpg",
	"bz2.bz2": "application/x-bzip2",
	"cab.cab": "application/vnd.ms-cab-compressed",
	"cborseq": "application/cbor-seq",
	"class.class": "application/x-java-applet",
	"crx.crx": "application/x-chrome-extension",
	"csv.csv": "text/csv",
3 changes: 2 additions & 1 deletion supported_mimes.md
@@ -1,4 +1,4 @@
-## 165 Supported MIME types
+## 166 Supported MIME types
This file is automatically generated when running tests. Do not edit manually.

Extension | MIME type | Aliases
@@ -135,6 +135,7 @@ Extension | MIME type | Aliases
**.pat** | image/x-gimp-pat | -
**.gbr** | image/x-gimp-gbr | -
**.glb** | model/gltf-binary | -
**n/a** | application/cbor-seq | -
**.txt** | text/plain | -
**.html** | text/html | -
**.svg** | image/svg+xml | -
1 change: 1 addition & 0 deletions testdata/cborseq
@@ -0,0 +1 @@
�t2013-03-21T20:04:00Z�t2013-03-21T20:04:00Z�t2013-03-21T20:04:00Z (binary file; the same tagged date/time item repeats to the end of the file)
3 changes: 2 additions & 1 deletion tree.go
@@ -24,7 +24,7 @@ var root = newMIME("application/octet-stream", "",
	gzip, class, swf, crx, ttf, woff, woff2, otf, eot, wasm, shx, dbf, dcm, rar,
	djvu, mobi, lit, bpg, sqlite3, dwg, nes, lnk, macho, qcp, icns, heic,
	heicSeq, heif, heifSeq, hdr, mrc, mdb, accdb, zstd, cab, rpm, xz, lzip,
-	torrent, cpio, tzif, xcf, pat, gbr, glb,
+	torrent, cpio, tzif, xcf, pat, gbr, glb, cborseq,
	// Keep text last because it is the slowest check
	text,
)
@@ -247,4 +247,5 @@ var (
	gbr = newMIME("image/x-gimp-gbr", ".gbr", magic.Gbr)
	xfdf = newMIME("application/vnd.adobe.xfdf", ".xfdf", magic.Xfdf)
	glb = newMIME("model/gltf-binary", ".glb", magic.Glb)
	cborseq = newMIME("application/cbor-seq", "", magic.CborSeq)
)