Eschewing boost::ptree and moving to RapidJSON #3164

obelisk · 2017-04-13T18:12:28Z

boost::ptree is slow, clunky and doesn't suit our use case very well anymore. ptree will happily generate the invalid JSON seen below given the right input to the property tree. This is not the fault of property trees really, but in the fact that they are meant to be a very generic container, more generic than JSON would normally allow.

{
    "" : {},
    "" : {}
}

The solution proposed is to move to a native JSON library, of which there are many, but this is a prototype for RapidJSON. The referenced branch in my own fork of osquery ports acceptWork in a transparent method (the entire method has its code altered and the normal code path uses this updated function) and the serializeQueryResults call chain in a non transparent manner (osquery is unaffected, it is only utilized by the benchmarks).

If we decide to go ahead with this it will be a major API change, with many code mods across the code base. Tests will need to be rewritten, functions will have their prototypes changed, and future pull requests that use ptrees will be looked at with suspicion.

To ease this transition, it has been suggested to create osquery::JSONObject as a wrapper around raw RapidJSON documents. This hasn't been done yet but you can see how it would work, just replace every instance of ptree or rapidjson::Document with osquery::JSONObject. This will also help people navigate using RapidJSON's default move constructors (as opposed to copy constructors) without leaving dangling pointers or having things erroneously freed.

Finally let's talk about speed. RapidJSON is much faster than boost::ptree. This is partly because ptree was never designed for JSON parsing and output meaning it has weaker guarantees to use but also that RapidJSON was designed to be fast. Below is a table comparing the existing serializeQueryResults code path and the prospective RapidJSON one.

Test Case	CPU(ns) boost::ptree	CPU(ns) rapidjson::Document	Improvement
DB_serialize_column_order/1/1	3493	871	4.01
DB_serialize_column_order/10/10	224234	14049	15.96
DB_serialize_column_order/10/100	2250971	128016	17.58
DB_serialize_column_order/100/100	23099589	1388649	16.63

To close, I'm putting this up as an RFC to see what people think and gather input about the coming transition. Here is a link to my branch comparing it to master.

The text was updated successfully, but these errors were encountered:

theopolis · 2017-11-16T02:58:45Z

Implemented with #3265 but there are follow up tasks for porting the remaining PTree uses to RapidJSON.

obelisk added API change performance RFC Request for comments: ideas or concepts to discuss. Not to merge it as is. labels Apr 13, 2017

obelisk self-assigned this Apr 13, 2017

tsg mentioned this issue Jan 2, 2018

Filebeat: osquery module elastic/beats#5971

Merged

theopolis closed this as completed Jul 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Eschewing boost::ptree and moving to RapidJSON #3164

Eschewing boost::ptree and moving to RapidJSON #3164

obelisk commented Apr 13, 2017 •

edited

Loading

theopolis commented Nov 16, 2017

Eschewing boost::ptree and moving to RapidJSON #3164

Eschewing boost::ptree and moving to RapidJSON #3164

Comments

obelisk commented Apr 13, 2017 • edited Loading

theopolis commented Nov 16, 2017

obelisk commented Apr 13, 2017 •

edited

Loading