Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for ref specs, fixed issue with assoc spec, added hex-dump, set version to 1.0.2 #11

Merged
merged 2 commits into from
Oct 6, 2017

Conversation

mbjarland
Copy link
Contributor

@mbjarland mbjarland commented Sep 14, 2017

Adding support for dynamic specs where the length is stored in another spec, fixing a size issue with assoc spec, adding hex-dump function, set version to 1.0.2.

Input welcomed as I am somewhat new to clojure and some of the additions touch the core design of the library.

The reason for adding this is that a lot of binary formats out there do not store the string/byte-array/x length immediately preceding the data. Also the size of the length field (i.e. inte16 vs int32 etc) varies from binary format to binary format and can not be known beforehand.

Real world example from the zip file specification which I'm currently parsing with octet:

 central file header signature   4 bytes  (0x02014b50)
        version made by                 2 bytes
        version needed to extract       2 bytes
        general purpose bit flag        2 bytes
        compression method              2 bytes
        last mod file time              2 bytes
        last mod file date              2 bytes
        crc-32                          4 bytes
        compressed size                 4 bytes
        uncompressed size               4 bytes
        file name length                2 bytes
        extra field length              2 bytes
        file comment length             2 bytes
        disk number start               2 bytes
        internal file attributes        2 bytes
        external file attributes        4 bytes
        relative offset of local header 4 bytes

        file name (variable size)
        extra field (variable size)
        file comment (variable size)

note thte file name (ref-string*), extra-field (ref-bytes*) and file comment (ref-string*) fields and their corresponding length fields further up in the specification. Using the new spec types in this pull request, the above could be represented as:

  (spec                                                     ; size 46 + string lengths
    :cdr-header-signature         int32
    :version-made-by              int16
    :version-needed-to-extract    int16
    :general-purpose              int16
    :compression-method           int16
    :last-mod-file-time           int16
    :last-mod-file-date           int16
    :crc-32                       int32
    :compressed-size              int32
    :uncompressed-size            int32
    :file-name-length             int16
    :extra-field-length           int16
    :file-comment-length          int16
    :disk-number-start            int16
    :internal-file-attributes     int16
    :external-file-attributes     int32
    :relative-offset-local-header int32
    
    :file-name                    (ref-string* :file-name-length)
    :extra-field                  (ref-bytes* :extra-field-length)
    :file-comment                 (ref-string* :file-comment-length)))

This pr also supports the equivalent syntax for indexed specs, replacing the keyword references with long indexes.

Also adding a hex-dump function, example usage:

(hex-dump (byte-array (range 250))
          :offset 10
          :size 240
          :frame true
          :print true)
 --------------------------------------------------------------------
|0000000a: 0a0b 0c0d 0e0f 1011  1213 1415 1617 1819  ................|
|0000001a: 1a1b 1c1d 1e1f 2021  2223 2425 2627 2829  ...... !"#$%&'()|
|0000002a: 2a2b 2c2d 2e2f 3031  3233 3435 3637 3839  *+,-./0123456789|
|0000003a: 3a3b 3c3d 3e3f 4041  4243 4445 4647 4849  :;<=>?@ABCDEFGHI|
|0000004a: 4a4b 4c4d 4e4f 5051  5253 5455 5657 5859  JKLMNOPQRSTUVWXY|
|0000005a: 5a5b 5c5d 5e5f 6061  6263 6465 6667 6869  Z[\]^_`abcdefghi|
|0000006a: 6a6b 6c6d 6e6f 7071  7273 7475 7677 7879  jklmnopqrstuvwxy|
|0000007a: 7a7b 7c7d 7e7f 8081  8283 8485 8687 8889  z{|}~�..........|
|0000008a: 8a8b 8c8d 8e8f 9091  9293 9495 9697 9899  ................|
|0000009a: 9a9b 9c9d 9e9f a0a1  a2a3 a4a5 a6a7 a8a9  ................|
|000000aa: aaab acad aeaf b0b1  b2b3 b4b5 b6b7 b8b9  ................|
|000000ba: babb bcbd bebf c0c1  c2c3 c4c5 c6c7 c8c9  ................|
|000000ca: cacb cccd cecf d0d1  d2d3 d4d5 d6d7 d8d9  ................|
|000000da: dadb dcdd dedf e0e1  e2e3 e4e5 e6e7 e8e9  ................|
|000000ea: eaeb eced eeef f0f1  f2f3 f4f5 f6f7 f8f9  ................|
 --------------------------------------------------------------------

details:

  • changed the map containing the assoc spec from hash map
    to array map so that it keeps insertion order even for assoc
    specs larger than 10. Implemented assoc-ordered in
    octet.util to support modifications
  • added a new protocol ISpecWithRef which is an either-or
    with ISpec. Not sure about the best way to model this,
    input welcomed. ISpecWithRef indicates that the spec
    needs to know about its surrounding AssociativeSpec or
    IndexedSpec as the size of the spec is stored in another
    spec in the surrounding spec.
  • added new code to AssociativeSpec and IndexedSpec to
    support the new ISpecWithRef type of spec where the
    contained spec needs to be aware of the containers
    data / structure.
  • added new spec types ref-string* and ref-bytes* for
    dynamic types where the length is stored in another
    spec.
  • added a hex-dump function capable of printing a human
    readable dump of nio ByteBuffers, netty ByteBufs, byte
    arrays, and Strings. Buffy has this and as I like octet
    better and find it extremely useful I figured it would be
    a good addition. Compared to the buffy one, this
    implementation is faster, terser, and more
    configurable.
  • added tests for the new ref spec types.

Not done: documentation. Will add documentation if this pull request is deemed sane.

Adding support for dynamic specs where the length is stored in
another spec, fixing a size issue with assoc spec, adding hex-dump
function, set version to 1.0.2.

Changes:
o changed the map containing the assoc spec so that it keeps
  insertion order even for assoc specs larger than 10
o added a new protocol ISpecWithRef which is an either-or
  with ISpec. Not sure about the best way to model this,
  input welcomed.
o added new code to AssociativeSpec and IndexedSpec to
  support the new ISpecWithRef type of spec where the
  contained spec needs to be aware of the containers
  data / structure.
o added new spec types ref-string* and ref-bytes* for
  dynamic types where the length is stored in another
  spec.
o added a hex-dump function capable of printing a human
  readable dump of nio ByteBuffers, netty ByteBufs, byte
  arrays, and Strings. Buffy has this. If accepted, octet
  will have one too (though faster, terser, and more
  configurable :). I find this very useful when trying
  to understand a binary format.
o added tests for the new ref spec types.

Not done: documentation
@mbjarland
Copy link
Contributor Author

oh, and the version increment was just for my local testing. I can naturally pull that out to support whatever versioning cadence fits the project.

@niwinz
Copy link
Member

niwinz commented Sep 17, 2017 via email

@mbjarland
Copy link
Contributor Author

Any ballpark on timeframe here? Only asking as I have another library which I would like to push to clojars some time soonish which depends on the features in this PR.

@niwinz
Copy link
Member

niwinz commented Sep 23, 2017 via email

@mbjarland
Copy link
Contributor Author

as a side note the hex-dump function currently supports byte buffers, byte arrays, and strings. Its trivial to extend this to say files.

@niwinz
Copy link
Member

niwinz commented Oct 6, 2017

@mbjarland very nice work! I have reviewed it and it is excelent.

I have one question, the vector-ref* has a TODO, is this something missing. I expect you go to implement it?

@niwinz niwinz merged commit 3a3776f into funcool:master Oct 6, 2017
@niwinz
Copy link
Member

niwinz commented Oct 6, 2017

1.1.0-SNAPSHOT is published in clojars

@mbjarland
Copy link
Contributor Author

Yes, the todo was more a note to self that I should probably implement a similar pattern for vectors. Will have to think that one through a bit.

@mbjarland
Copy link
Contributor Author

Also I have not considered the following two things:

  • clojurescript support. I copied some of the patterns from existing code but have not tested any of it in clojurescript. Not even sure I would know how to
  • the composability of things. Can we have associative specs containing associative specs etc recursively? It would probably be nice if we could. If we can't maybe we should consider it. This would essentially fill a similar gap as "frames" in other frameworks such as buffy etc where you can define reusable sub-specs and then use them to build larger ones. Except that I dislike the concept of a spec being something different than a frame as is the case in buffy. Nicer if you just have one concept which composes freely.

Anyway, hopefully I did not break any composability aspects with this change.

@mbjarland
Copy link
Contributor Author

The next week or two will be tight for me with my dayjob, but if I get time I will try to fill in the documentation pieces for this change and perhaps consider the above composability aspect.

The clojurescript stuff I would probably not be qualified to consider so if you have the chops that would be much appreciated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants