Feature request: binary output #40

Closed
trombonehero opened this issue May 14, 2015 · 4 comments

Comments

@trombonehero
Contributor

We're using libxo as part of the SOAAP project (https://github.com/CTSRD-SOAAP/soaap) to output the results of our security-oriented static analysis. It's working quite well, but when we apply the tool to applications like Chromium we get JSON files of over 1GB. These are hard to work with, not just because they're large, but because we can't seek in the file without parsing.

It would be helpful if libxo also supported a binary output format with some kind of length encoding so that we could skip large objects and arrays that we're not currently interested in. Perhaps an existing encoding could be employed... for instance, there are C bindings for Cap'n Proto (MIT license): https://capnproto.org/otherlang.html.
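
To make the request concrete, here is a minimal sketch of length-prefixed records in C. The record layout (a 1-byte tag followed by a 4-byte big-endian payload length) and the names `rec_write` and `rec_find` are assumptions made up for illustration; they are not part of libxo or of any existing encoding. The point is only that a reader can hop from header to header without parsing the payloads in between.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: 1-byte tag, 4-byte big-endian length, then payload. */
enum { HDR_SIZE = 5 };

/* Append one record at offset `off`. Returns the new offset, or 0 if the
 * record does not fit (a successful write always returns at least 5). */
size_t rec_write(uint8_t *buf, size_t cap, size_t off,
                 uint8_t tag, const void *payload, uint32_t len)
{
    if (off > cap || cap - off < HDR_SIZE || cap - off - HDR_SIZE < len)
        return 0;
    buf[off] = tag;
    buf[off + 1] = (uint8_t)(len >> 24);
    buf[off + 2] = (uint8_t)(len >> 16);
    buf[off + 3] = (uint8_t)(len >> 8);
    buf[off + 4] = (uint8_t)len;
    if (len > 0)
        memcpy(buf + off + HDR_SIZE, payload, len);
    return off + HDR_SIZE + len;
}

/* Return a pointer to the payload of the first record tagged `want`,
 * storing its length in *len_out. Other records are skipped using only
 * their headers. Returns NULL if no such record exists or the data is
 * truncated. */
const uint8_t *rec_find(const uint8_t *buf, size_t size,
                        uint8_t want, uint32_t *len_out)
{
    size_t off = 0;

    while (size - off >= HDR_SIZE) {
        uint32_t len = ((uint32_t)buf[off + 1] << 24)
                     | ((uint32_t)buf[off + 2] << 16)
                     | ((uint32_t)buf[off + 3] << 8)
                     |  (uint32_t)buf[off + 4];
        if (size - off - HDR_SIZE < len)
            return NULL;                /* truncated record */
        if (buf[off] == want) {
            *len_out = len;
            return buf + off + HDR_SIZE;
        }
        off += HDR_SIZE + len;          /* skip payload without reading it */
    }
    return NULL;
}
```

With a file on disk, the same loop would use a seek in place of the pointer arithmetic, so a 1GB object that is not of interest costs one header read and one seek.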

@philshafer

I’m not about to wade into picking a binary protocol: capnproto vs. protobufs vs. thrift vs. all the rest of the weeds popping up.

But I can definitely see the need. There are two issues. First, libxo is essentially schema-less. The terms and encodings follow those of YANG (and NETCONF), since those are reasonably well defined and portable over various protocols. Note that YANG gives generic rules for encoding, not schema-specific ones. So 'xo_emit("{:year/%d}", 2015)' can be encoded as '"year": 2015' and/or '2015' without needing the "1:", "= 1", or "@1" needed by thrift, protobufs, or capnproto (respectively). This schema-less-ness is imho a Good Thing.
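
As a rough illustration of "generic rules, no schema", the sketch below renders one name/value pair in a JSON style and an XML style using nothing but the field name itself. The helpers `emit_json` and `emit_xml` are hypothetical stand-ins written for this example; they are not libxo functions and do not reflect how libxo is implemented.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helpers: each applies one generic rule to any field,
 * so no per-field number or precompiled schema is required. */
int emit_json(char *out, size_t cap, const char *name, int value)
{
    return snprintf(out, cap, "\"%s\": %d", name, value);
}

int emit_xml(char *out, size_t cap, const char *name, int value)
{
    return snprintf(out, cap, "<%s>%d</%s>", name, value, name);
}
```

A schema-based format would instead need to know ahead of time which field number "year" maps to before it could write anything.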

The second issue is the need to avoid adding numerous encodings to libxo directly. I need to make a pluggable architecture so a library can get a set of callbacks and do the encoding with dynamically loaded shared libraries.

The latter makes the former less of an issue, since a thrift encoder can require that a pre-compiled thrift schema exist before it can encode.

In either case, the feature libxo needs is this pluggability that allows additional encodings to be supplied.
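
One possible shape for such pluggability is a small registry of named callbacks. Everything in this sketch (`enc_op_t`, `enc_handler_t`, `enc_register`, `enc_lookup`, `count_handler`) is a hypothetical name chosen for illustration and should not be read as the interface libxo ended up with. Dynamic loading of shared libraries is left out; only the register-and-dispatch part is shown.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operations the core library would hand to an encoder. */
typedef enum { OP_OPEN, OP_FIELD, OP_CLOSE } enc_op_t;

/* An encoder is one callback receiving its private data, the operation,
 * and an optional name/value pair. */
typedef int (*enc_handler_t)(void *priv, enc_op_t op,
                             const char *name, const char *value);

typedef struct {
    const char   *name;
    enc_handler_t handler;
} enc_entry_t;

#define MAX_ENCODERS 8
static enc_entry_t registry[MAX_ENCODERS];
static size_t registry_len;

/* Register an encoder under a name. Returns 0, or -1 if the table is full. */
int enc_register(const char *name, enc_handler_t handler)
{
    if (registry_len == MAX_ENCODERS)
        return -1;
    registry[registry_len].name = name;
    registry[registry_len].handler = handler;
    registry_len++;
    return 0;
}

/* Look an encoder up by name. Returns NULL if none is registered. */
enc_handler_t enc_lookup(const char *name)
{
    for (size_t i = 0; i < registry_len; i++)
        if (strcmp(registry[i].name, name) == 0)
            return registry[i].handler;
    return NULL;
}

/* Toy encoder: counts the fields it is shown. `priv` points to an int. */
int count_handler(void *priv, enc_op_t op,
                  const char *name, const char *value)
{
    (void)name;
    (void)value;
    if (op == OP_FIELD)
        ++*(int *)priv;
    return 0;
}
```

The core never needs to know what a given encoder does with the operations, which is what keeps thrift, CBOR, or anything else out of the library proper.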

Then a third performance-related issue will arise, which is the need to precompile format strings to allow encodings to perform name->number lookups once and retain the mappings in the formatting information.
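
The caching idea can be sketched as follows: resolve the field name to its number the first time a format string is used, then reuse the stored result. The schema table, the `compiled_fmt_t` type, and the parsing of "{:name/%d}" are all simplifying assumptions for this example (real libxo field descriptions carry more than a name and a format), and the `lookups` counter exists only to show that the slow path runs once.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical schema: field names mapped to field numbers. */
typedef struct { const char *name; int number; } schema_entry_t;
static const schema_entry_t schema[] = {
    { "year", 1 }, { "month", 2 }, { "day", 3 },
};

static int lookups;   /* counts slow-path lookups, for illustration only */

/* Slow path: linear search by name. Returns -1 if the name is unknown. */
int schema_lookup(const char *name, size_t len)
{
    lookups++;
    for (size_t i = 0; i < sizeof schema / sizeof schema[0]; i++)
        if (strlen(schema[i].name) == len &&
            memcmp(schema[i].name, name, len) == 0)
            return schema[i].number;
    return -1;
}

/* A format string plus the mapping retained after first use. */
typedef struct {
    const char *fmt;       /* assumed form: "{:name/%d}" */
    int         number;    /* cached field number, valid once compiled */
    int         compiled;
} compiled_fmt_t;

/* Return the field number, resolving the name only on the first call. */
int fmt_number(compiled_fmt_t *cf)
{
    if (!cf->compiled) {
        const char *start = strchr(cf->fmt, ':');
        const char *end = start ? strchr(start, '/') : NULL;
        cf->number = (start && end)
            ? schema_lookup(start + 1, (size_t)(end - start - 1))
            : -1;
        cf->compiled = 1;
    }
    return cf->number;
}
```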

Thanks,
Phil


@philshafer

I've coded a mechanism for external encoders and have written both a test jig (for "make test") and a CBOR (RFC 7049) encoder. I still need to write the docs for it, but the example code should give you a preview of what's coming. Please let me know what you think. I'll doc it shortly.

Essentially, libxo does all the formatting and presents a callback function with an operation and an optional name/value pair. The callback module can hang private data off the xo_handle_t, and it will be passed back to the callback function. Modules can be explicitly registered (via xo_encoder_register()), or a dynamic library can be placed in ${prefix}/lib/libxo/extensions/${name}.enc and be dynamically loaded when the caller requests it (via xo_encoder_init() or "--libxo encoder=cbor"). The cbor encoder uses the "pretty" flag for diagnostics, which isn't really kosher, but it's example code.
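
For readers unfamiliar with CBOR, the sketch below shows how a single name/value pair such as year = 2015 could be laid out under RFC 7049. It is an independent illustration and not the code in enc_cbor.c; the helper names `cbor_head`, `cbor_text`, and `cbor_pair` are invented here, and the caller is assumed to supply a large enough buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write a CBOR head: a major type (0-7) and its unsigned argument,
 * using the shortest form. Returns the number of bytes written. */
size_t cbor_head(uint8_t *out, unsigned major, uint64_t val)
{
    uint8_t mt = (uint8_t)(major << 5);
    int nbytes;

    if (val < 24) {                       /* argument fits in the head byte */
        out[0] = (uint8_t)(mt | val);
        return 1;
    }
    if (val <= 0xff)            { out[0] = (uint8_t)(mt | 24); nbytes = 1; }
    else if (val <= 0xffff)     { out[0] = (uint8_t)(mt | 25); nbytes = 2; }
    else if (val <= 0xffffffff) { out[0] = (uint8_t)(mt | 26); nbytes = 4; }
    else                        { out[0] = (uint8_t)(mt | 27); nbytes = 8; }

    for (int i = 0; i < nbytes; i++)      /* big-endian argument bytes */
        out[1 + i] = (uint8_t)(val >> (8 * (nbytes - 1 - i)));
    return 1 + (size_t)nbytes;
}

/* Text string: major type 3, byte length, then the bytes. */
size_t cbor_text(uint8_t *out, const char *s)
{
    size_t len = strlen(s);
    size_t n = cbor_head(out, 3, len);
    memcpy(out + n, s, len);
    return n + len;
}

/* One-entry map { name: value }: major type 5 with a count of 1,
 * then the key as text and the value as an unsigned integer. */
size_t cbor_pair(uint8_t *out, const char *name, uint64_t value)
{
    size_t n = cbor_head(out, 5, 1);
    n += cbor_text(out + n, name);
    n += cbor_head(out + n, 0, value);
    return n;
}
```

Because every string carries its byte length up front, a reader can step over a value it does not care about without scanning for a closing quote, which is the property the original request was after.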

Thanks,
Phil

https://raw.githubusercontent.com/Juniper/libxo/develop/encoder/cbor/enc_cbor.c

@philshafer

Docs are in develop now. Have a look:

https://github.com/Juniper/libxo/blob/develop/doc/libxo.txt#L2161

I'll likely pull a new release in a day or two, so please let me know.

Thanks,
Phil

@philshafer

Incorporated into libxo-0.4.1
