Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eschewing boost::ptree and moving to RapidJSON #3164

Closed
obelisk opened this issue Apr 13, 2017 · 1 comment
Closed

Eschewing boost::ptree and moving to RapidJSON #3164

obelisk opened this issue Apr 13, 2017 · 1 comment
Assignees
Labels
API change performance RFC Request for comments: ideas or concepts to discuss. Not to merge it as is.

Comments

@obelisk
Copy link
Contributor

obelisk commented Apr 13, 2017

boost::ptree is slow, clunky and doesn't suit our use case very well anymore. ptree will happily generate the invalid JSON seen below given the right input to the property tree. This is not the fault of property trees really, but in the fact that they are meant to be a very generic container, more generic than JSON would normally allow.

{
    "" : {},
    "" : {}
}

The solution proposed is to move to a native JSON library, of which there are many, but this is a prototype for RapidJSON. The referenced branch in my own fork of osquery ports acceptWork in a transparent method (the entire method has its code altered and the normal code path uses this updated function) and the serializeQueryResults call chain in a non transparent manner (osquery is unaffected, it is only utilized by the benchmarks).

If we decide to go ahead with this it will be a major API change, with many code mods across the code base. Tests will need to be rewritten, functions will have their prototypes changed, and future pull requests that use ptrees will be looked at with suspicion.

To ease this transition, it has been suggested to create osquery::JSONObject as a wrapper around raw RapidJSON documents. This hasn't been done yet but you can see how it would work, just replace every instance of ptree or rapidjson::Document with osquery::JSONObject. This will also help people navigate using RapidJSON's default move constructors (as opposed to copy constructors) without leaving dangling pointers or having things erroneously freed.

Finally let's talk about speed. RapidJSON is much faster than boost::ptree. This is partly because ptree was never designed for JSON parsing and output meaning it has weaker guarantees to use but also that RapidJSON was designed to be fast. Below is a table comparing the existing serializeQueryResults code path and the prospective RapidJSON one.

Test Case CPU(ns) boost::ptree CPU(ns) rapidjson::Document Improvement
DB_serialize_column_order/1/1 3493 871 4.01
DB_serialize_column_order/10/10 224234 14049 15.96
DB_serialize_column_order/10/100 2250971 128016 17.58
DB_serialize_column_order/100/100 23099589 1388649 16.63

To close, I'm putting this up as an RFC to see what people think and gather input about the coming transition. Here is a link to my branch comparing it to master.

@obelisk obelisk added API change performance RFC Request for comments: ideas or concepts to discuss. Not to merge it as is. labels Apr 13, 2017
@obelisk obelisk self-assigned this Apr 13, 2017
@theopolis
Copy link
Member

Implemented with #3265 but there are follow up tasks for porting the remaining PTree uses to RapidJSON.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API change performance RFC Request for comments: ideas or concepts to discuss. Not to merge it as is.
Projects
None yet
Development

No branches or pull requests

2 participants