Apply recent C optimizations to Java encoder (#725)
* Make benchmark runnable without oj available
* Port convert_UTF8_to_ASCII_only_JSON to Java

  This is new specialized logic to reduce overhead when appending
  ASCII-only strings to the generated JSON.

  Original code by @byroot
  See #620

* Align string generate method with generate_json_string
* Port convert_UTF8_to_JSON from C

  Also includes updated logic for generate (generate_json_string) based
  on the current C code.

  Original code by @byroot
  See #620

* Use external iteration to reduce alloc

  There is a lot of surrounding state, so just take the hit of a Set and
  an Iterator rather than a big visitor object.

* Remove unused imports
* Inline ConvertBytes logic for long to byte[]

  This change duplicates some code from JRuby to allow rendering the
  fixnum value to a shared byte array rather than allocating a new one
  for each value. Since fixnum dumping is a leaf operation, only one
  array is needed per session.

* Eliminate * import
* Restructure handlers for easier profiling

  Anonymous classes show up as unnamed, numbered classes in profiles,
  which makes them difficult to read.

* Avoid allocation when writing Array delimiters

  Rather than allocating a buffer to hold N copies of arrayNL, just
  write it N times. We're buffering into a stream anyway. This makes
  array dumping zero-alloc other than buffer growth.

* Move away from Handler abstraction

  Since there's a fixed number of types we have special dumping logic
  for, this abstraction just introduces overhead we don't need. This
  patch starts moving away from indirecting all dumps through the
  Handler abstraction and generates directly from the type switch.
  This also aligns better with the main loop of the C code and should
  inline and optimize better.

* Match C version of fbuffer_append_long
* Minor tweaks to reduce complexity
* Reimpl byte[] stream without synchronization

  The byte[] output stream used here extended ByteArrayOutputStream
  from the JDK, which synchronizes all mutation operations (like
  writes). Since this stream is only used once within a given call
  stack, it needs no synchronization. This change more than triples the
  performance of a benchmark that dumps an array of empty arrays and
  should increase performance of all dump forms.

* Reduce overhead in repeats

  * Return incoming array if only one repeat is needed and array is
    exact size.
  * Only retrieve ByteList fields once for repeat writes.

* Use equivalent of rb_sym2str
* Microoptimizations for ByteList stream
* Cast to byte not necessary
* Refactor this for better inlining
* More tiny tweaks to reduce overhead of generateString
* Refactor to avoid repeated boolean checks
* Eliminate memory accesses for digits

  The math is much faster here than array access, due to bounds
  checking and pointer dereferencing.

* Loosen visibility to avoid accessor methods

  Java will generate synthetic accessor methods for private fields
  accessed from nested classes, burning some inlining budget.

* Modify parser bench to work without oj or rapidjson
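The ASCII-only string path can be illustrated with a small sketch. It assumes the input bytes are already known to be 7-bit ASCII, and it uses a 128-entry escape table plus bulk copies of runs that need no escaping. The class name `AsciiJsonEscaper` is hypothetical, this is not a port of the C `convert_UTF8_to_ASCII_only_JSON`, and `ByteArrayOutputStream` is used only to keep the example self-contained.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: JSON-quote a byte[] that is assumed to be 7-bit ASCII.
// A lookup table replaces per-byte branching, and runs of bytes that need no
// escaping are copied with a single write call.
final class AsciiJsonEscaper {
    private static final byte[] HEX = "0123456789abcdef".getBytes(StandardCharsets.US_ASCII);

    // 0 means "copy as-is"; otherwise the byte written after the backslash.
    // The value 'u' means a four-hex-digit escape (00XX) follows.
    private static final byte[] ESCAPE = new byte[128];
    static {
        for (int c = 0; c < 0x20; c++) ESCAPE[c] = 'u';
        ESCAPE['"'] = '"';
        ESCAPE['\\'] = '\\';
        ESCAPE['\n'] = 'n';
        ESCAPE['\r'] = 'r';
        ESCAPE['\t'] = 't';
        ESCAPE['\b'] = 'b';
        ESCAPE['\f'] = 'f';
    }

    static void appendQuoted(byte[] src, ByteArrayOutputStream out) {
        out.write('"');
        int runStart = 0;
        for (int i = 0; i < src.length; i++) {
            // A non-ASCII (negative) byte would throw here: the input is
            // assumed to have been checked as ASCII-only beforehand.
            byte esc = ESCAPE[src[i]];
            if (esc == 0) continue;
            out.write(src, runStart, i - runStart); // flush the safe run
            out.write('\\');
            out.write(esc);
            if (esc == 'u') {
                out.write('0');
                out.write('0');
                out.write(HEX[src[i] >> 4]);
                out.write(HEX[src[i] & 0xF]);
            }
            runStart = i + 1;
        }
        out.write(src, runStart, src.length - runStart); // trailing run
        out.write('"');
    }
}
```

For a typical string with few or no special characters, the loop reduces to table lookups followed by one bulk copy.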
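Rendering a long into a reused byte array, with digits computed arithmetically rather than read from a table, can be sketched as follows. `LongWriter` is a hypothetical name; this is neither JRuby's `ConvertBytes` nor a transcription of the C `fbuffer_append_long`. One scratch array is reused per instance, so an instance must not be shared across threads.

```java
// Hypothetical sketch: decimal rendering of a long into a reusable scratch
// buffer, deriving each digit with % and / instead of a digit lookup table.
final class LongWriter {
    // Long.MIN_VALUE has 19 digits plus a sign, so 20 bytes always suffice.
    private final byte[] scratch = new byte[20];

    // Writes the decimal form of value into out at offset; returns the length.
    int write(long value, byte[] out, int offset) {
        int pos = scratch.length;
        // Work on the non-positive value so Long.MIN_VALUE cannot overflow.
        long n = value < 0 ? value : -value;
        do {
            // For n <= 0, n % 10 is in -9..0, so '0' - (n % 10) is a digit.
            scratch[--pos] = (byte) ('0' - (n % 10));
            n /= 10;
        } while (n != 0);
        if (value < 0) scratch[--pos] = '-';
        int len = scratch.length - pos;
        System.arraycopy(scratch, pos, out, offset, len);
        return len;
    }
}
```

Negating toward zero rather than away from it is the usual way to handle `Long.MIN_VALUE`, whose magnitude has no positive `long` representation.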
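Generating directly from a type switch, rather than routing every value through a per-type Handler object, might look roughly like the sketch below. Plain Java types (`String`, `Long`, `List`, `Map`, ...) stand in for the JRuby classes the encoder actually handles, and `TypeSwitchGenerator` is a hypothetical name.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch: one generate method with an instanceof chain, so all
// cases live in a single frame with no per-value Handler lookup and no
// virtual call through a Handler interface.
final class TypeSwitchGenerator {
    static void generate(Object value, StringBuilder out) {
        if (value == null) {
            out.append("null");
        } else if (value instanceof String) {
            appendString((String) value, out);
        } else if (value instanceof Long || value instanceof Integer) {
            out.append(((Number) value).longValue());
        } else if (value instanceof Boolean) {
            out.append(((Boolean) value).booleanValue());
        } else if (value instanceof List) {
            out.append('[');
            boolean first = true;
            for (Object element : (List<?>) value) {
                if (!first) out.append(',');
                first = false;
                generate(element, out);
            }
            out.append(']');
        } else if (value instanceof Map) {
            out.append('{');
            boolean first = true;
            for (Map.Entry<?, ?> e : ((Map<?, ?>) value).entrySet()) {
                if (!first) out.append(',');
                first = false;
                appendString(String.valueOf(e.getKey()), out);
                out.append(':');
                generate(e.getValue(), out);
            }
            out.append('}');
        } else {
            throw new IllegalArgumentException("unsupported type: " + value.getClass().getName());
        }
    }

    // Escapes quotes, backslashes and control characters; all other
    // characters are appended unchanged.
    private static void appendString(String s, StringBuilder out) {
        out.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"' || c == '\\') {
                out.append('\\').append(c);
            } else if (c < 0x20) {
                out.append(String.format("\\u%04x", (int) c));
            } else {
                out.append(c);
            }
        }
        out.append('"');
    }
}
```

Because the set of specially handled types is fixed, a closed chain like this gives the JIT a single method to inline and profile instead of an open-ended set of Handler implementations.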
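An unsynchronized growable byte buffer, together with the "write arrayNL N times" approach for delimiters, can be sketched as below. This assumes a single thread owns the buffer for the duration of one generate call. `UnsyncByteStream` and `writeRepeated` are hypothetical names, not the classes in the actual patch.

```java
import java.util.Arrays;

// Hypothetical sketch: a growable byte[] buffer with no synchronization.
// Only safe when a single thread owns it for one generate call.
final class UnsyncByteStream {
    private byte[] buf;
    private int count;

    UnsyncByteStream(int initialCapacity) {
        buf = new byte[Math.max(16, initialCapacity)];
    }

    // Grow geometrically (at least doubling) so appends are amortized O(1).
    private void ensureCapacity(int extra) {
        int needed = count + extra;
        if (needed > buf.length) {
            buf = Arrays.copyOf(buf, Math.max(buf.length * 2, needed));
        }
    }

    void write(int b) {
        ensureCapacity(1);
        buf[count++] = (byte) b;
    }

    void write(byte[] src, int off, int len) {
        ensureCapacity(len);
        System.arraycopy(src, off, buf, count, len);
        count += len;
    }

    // Writes src `times` times in a row without first building a temporary
    // array that holds all the copies.
    void writeRepeated(byte[] src, int times) {
        if (times <= 0 || src.length == 0) return;
        ensureCapacity(Math.multiplyExact(src.length, times));
        for (int i = 0; i < times; i++) {
            System.arraycopy(src, 0, buf, count, src.length);
            count += src.length;
        }
    }

    int size() {
        return count;
    }

    byte[] toByteArray() {
        return Arrays.copyOf(buf, count);
    }
}
```

By contrast, `java.io.ByteArrayOutputStream` declares its `write` methods `synchronized`, so every append must acquire the stream's monitor even though no other thread ever touches it here.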