Improve performance of org.glassfish.json.JsonParserImpl #15
Comments
@jjspiegel Thanks for raising those issues. Unlike the JCP model, which had a dedicated Reference Implementation, Jakarta EE does not mandate one. Rather, it can and should have more than one implementation (in theory also at Eclipse or in other communities such as Apache, JBoss, etc.). If the GlassFish "Spec Implementation" under the Jakarta EE umbrella is widely used, then I am pretty sure the team and community will try to address many of those issues in upcoming releases, but that does not prevent others from creating and maintaining their own independent implementations that may have advantages over the SI.
As more and more products use the JSON-P reference implementation in production, it is critical that parsing performance is good.
I think there is an opportunity to improve the performance of JsonParserImpl. Right now, the underlying tokenizer operates on a Java character string and is completely unaware of the underlying byte representation. In many cases, JSON is persisted as UTF-8 (see RFC 8259).
Java characters are represented in UTF-16 and conversion from UTF-8 to UTF-16 is often expensive.
I suggest making a special-purpose tokenizer that operates directly on the UTF-8 byte stream. Other encodings can continue to use the current code path, as they will be less common. A special-case UTF-8 tokenizer would provide the following benefits:
(1) Markup characters in the ASCII range (curly braces, brackets, string delimiters, whitespace, etc.) can be scanned with byte comparisons and never converted to UTF-16.
(2) JSON numbers, true, false, and null don't need to be converted to UTF-16.
(3) Strings (keys and values) can be converted to UTF-16 lazily so that if they are never consumed by an application, they need not be converted.
(4) Skip methods (like skipArray() and skipObject()) could avoid any character set conversion of the skipped item.
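To make points (1), (3), and (4) concrete, here is a rough sketch of what byte-level scanning could look like. It is not taken from org.glassfish.json: the class `Utf8ScanSketch`, its methods, and `LazyString` are hypothetical names used only for illustration. It relies on the fact that every byte of a multi-byte UTF-8 sequence has its high bit set, so ASCII markup bytes can be matched without decoding. The sketch assumes the whole document is already in a byte array, does no validation, and returns string contents without processing escape sequences.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of a UTF-8 byte scanner; not part of org.glassfish.json. */
final class Utf8ScanSketch {
    private final byte[] buf;
    private int pos;

    Utf8ScanSketch(byte[] utf8) { this.buf = utf8; }

    /** Skips JSON whitespace using byte comparisons only. */
    void skipWhitespace() {
        while (pos < buf.length) {
            byte b = buf[pos];
            if (b == ' ' || b == '\t' || b == '\n' || b == '\r') pos++;
            else break;
        }
    }

    /** Returns the index of the closing quote for the string opening at start. */
    private int endOfString(int start) {
        int i = start + 1;
        while (i < buf.length) {
            byte b = buf[i];
            if (b == '\\') i += 2;          // skip the backslash and the escaped byte
            else if (b == '"') return i;    // bytes >= 0x80 are negative and never match
            else i++;
        }
        throw new IllegalStateException("unterminated string");
    }

    /** Records the extent of a string token; nothing is decoded here. */
    LazyString readString() {
        if (pos >= buf.length || buf[pos] != '"') {
            throw new IllegalStateException("expected a string at offset " + pos);
        }
        int end = endOfString(pos);
        LazyString s = new LazyString(buf, pos + 1, end - pos - 1);
        pos = end + 1;
        return s;
    }

    /** Skips a whole array, including nested values, without any charset conversion. */
    void skipArray() {
        if (pos >= buf.length || buf[pos] != '[') {
            throw new IllegalStateException("expected '[' at offset " + pos);
        }
        int depth = 0;
        while (pos < buf.length) {
            byte b = buf[pos];
            if (b == '"') { pos = endOfString(pos) + 1; continue; } // brackets inside strings do not count
            if (b == '[') depth++;
            else if (b == ']' && --depth == 0) { pos++; return; }
            pos++;
        }
        throw new IllegalStateException("unterminated array");
    }

    /** A slice of the input that is converted to UTF-16 only on first use. */
    static final class LazyString {
        private final byte[] buf;
        private final int off;
        private final int len;
        private String decoded;

        private LazyString(byte[] buf, int off, int len) {
            this.buf = buf; this.off = off; this.len = len;
        }

        boolean isDecoded() { return decoded != null; }

        /** Raw contents between the quotes; escape sequences are NOT processed in this sketch. */
        String get() {
            if (decoded == null) decoded = new String(buf, off, len, StandardCharsets.UTF_8);
            return decoded;
        }
    }
}
```

An application that calls `skipArray()` and never calls `get()` on a `LazyString` would not trigger any UTF-8 to UTF-16 conversion. A real tokenizer would also need streaming input, escape handling, and number parsing on bytes, none of which is shown here.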