-
-
Notifications
You must be signed in to change notification settings - Fork 134
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Denial of service when parsing JSON object with keys that have the same hash code #186
Comments
Here's the same issue on Jackson: According to that, it was fixed by something that JDK7/JDK8 did to its HashMap implementation, but it doesn't say what. We're not using a JDK HashMap, we're using Scala's immutable HashMap which I think is a trie map that uses the hash code. So potentially we could switch to a JDK HashMap and this would fix it, but I suspect that will adversely impact performance when generating JSON AST, since the JDK HashMap doesn't support efficient persistent add operations, for every field added we would have to copy the map. Other than that we'll have to research to find out what the JDK did to fix this, and either port that to Scala's map, or implement our own trie map that provides the fix. If it can be done simply by providing a custom hashcode function, then we might also be able to achieve it by wrapping the keys, though this would generate more garbage. |
Ok, here's the supposed JDK fix: http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010238.html The thing is, I had a look at the fixes proposed, and I can't find any evidence that those fixes exist in the JDK today. |
Ah, found it, Doug Lea proposed an alternative and much more elegant solution: http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-June/010425.html This is in the JDK today and documented in the
Also from the /*
* Implementation notes.
*
* This map usually acts as a binned (bucketed) hash table, but
* when bins get too large, they are transformed into bins of
* TreeNodes, each structured similarly to those in
* java.util.TreeMap. Most methods try to use normal bins, but
* relay to TreeNode methods when applicable (simply by checking
* instanceof a node). Bins of TreeNodes may be traversed and
* used like any others, but additionally support faster lookup
* when overpopulated. However, since the vast majority of bins in
* normal use are not overpopulated, checking for existence of
* tree bins may be delayed in the course of table methods.
*
* Tree bins (i.e., bins whose elements are all TreeNodes) are
* ordered primarily by hashCode, but in the case of ties, if two
* elements are of the same "class C implements Comparable<C>",
* type then their compareTo method is used for ordering. (We
* conservatively check generic types via reflection to validate
* this -- see method comparableClassFor). The added complexity
* of tree bins is worthwhile in providing worst-case O(log n)
* operations when keys either have distinct hashes or are
* orderable, Thus, performance degrades gracefully under
* accidental or malicious usages in which hashCode() methods
* return values that are poorly distributed, as well as those in
* which many keys share a hashCode, so long as they are also
* Comparable. (If neither of these apply, we may waste about a
* factor of two in time and space compared to taking no
* precautions. But the only known cases stem from poor user
* programming practices that are already so slow that this makes
* little difference.)
*
* Because TreeNodes are about twice the size of regular nodes, we
* use them only when bins contain enough nodes to warrant use
* (see TREEIFY_THRESHOLD). And when they become too small (due to
* removal or resizing) they are converted back to plain bins. I think we can solve this in a somewhat similar manner in play-json's |
@plokhotnyuk sorry I was in the middle of researching and writing my comment when you posted yours :) |
Issue raised to get this fixed in Scala: |
Play JSON Version (2.5.x / etc)
2.7.0-M1
API (Scala / Java / Neither / Both)
Scala 2.12.7
Operating System (Ubuntu 15.10 / MacOS 10.10 / Windows 10)
Ubuntu 16.04
JDK (Oracle 1.8.0_72, OpenJDK 1.8.x, Azul Zing)
Oracle JDK 11
Library Dependencies
none
Expected Behavior
Sub-linear decreasing of throughput when number of JSON object fields (with keys that have the same hash code) is increasing
Actual Behavior
Sub-quadratic decreasing of throughput when number of JSON object fields (with keys that have the same hash code) is increasing
On contemporary CPUs parsing of such JSON object (with a sequence of 100000 fields like below that is ~1.4Mb) can took more than 100 seconds:
Below are results of benchmark (where
size
is a number of such fields) for different JSON parser including Play-JSON:Reproducible Test Case
To run that benchmarks on your JDK:
sbt
and/or ensure that it already installed properly:jsoniter-scala
repo:The text was updated successfully, but these errors were encountered: