Denial of service when parsing JSON object with keys that have the same hash code #277

plokhotnyuk · 2018-10-15T15:46:03Z

Sub-quadratic decreasing of throughput when number of JSON object fields (with keys that have the same hash code) is increasing

On contemporary CPUs parsing of such JSON object (with a sequence of 100000 fields like below that is ~1.6Mb) can took more than 160 seconds:

{
"!!sjyehe":null,
"!!sjyeiF":null,
"!!sjyej'":null,
"!!sjyfIe":null,
"!!sjyfJF":null,
...
}

Below are results of the benchmark where size is a number of such fields:

[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                         (size)  (value)   Mode  Cnt        Score       Error  Units
[info] ExtractFieldsBenchmark.readSpray       1     null  thrpt    5  2349094.580 ± 95798.868  ops/s
[info] ExtractFieldsBenchmark.readSpray      10     null  thrpt    5   332512.199 ± 21344.082  ops/s
[info] ExtractFieldsBenchmark.readSpray     100     null  thrpt    5     7714.406 ±   709.439  ops/s
[info] ExtractFieldsBenchmark.readSpray    1000     null  thrpt    5       52.775 ±     3.951  ops/s
[info] ExtractFieldsBenchmark.readSpray   10000     null  thrpt    5        0.369 ±     0.076  ops/s
[info] ExtractFieldsBenchmark.readSpray  100000     null  thrpt    5        0.006 ±     0.002  ops/s

Reproducible Test Case

To run that benchmarks on your JDK:

Install latest version of sbt and/or ensure that it already installed properly:

sbt about

Clone jsoniter-scala repo:

git clone https://github.com/plokhotnyuk/jsoniter-scala.git

Enter to the cloned directory and checkout for the specific branch:

cd jsoniter-scala
git checkout spray-json-DoS-by-hashmap-collisions

Run benchmarks using a path parameter to your JDK:

sbt -no-colors 'jsoniter-scala-benchmark/jmh:run -jvm /usr/lib/jvm/jdk-11/bin/java -wi 2 -i 5 -p value=null .*ExtractFieldsBench.*Spray.*'

The text was updated successfully, but these errors were encountered:

jrudolph · 2018-10-23T12:09:44Z

See scala/bug#11203 for the underlying issue in Scala HashMaps.

One simple remedy could be to limit the number of fields per object for cases where the parser is run with untrusted data. One hurdle is that we currently don't support any configuration and we need to see how to introduce configuration for the parser.

jrudolph · 2018-10-23T13:24:25Z

Btw. thanks a lot, @plokhotnyuk for the report.

For me its with collisions:

[info] Benchmark                         (size)   (value)   Mode  Cnt        Score        Error  Units
[info] ExtractFieldsBenchmark.readSpray       1  [2.1,""]  thrpt    5  1409411.385 ± 223571.130  ops/s
[info] ExtractFieldsBenchmark.readSpray      10  [2.1,""]  thrpt    5   182598.090 ±  23199.897  ops/s
[info] ExtractFieldsBenchmark.readSpray     100  [2.1,""]  thrpt    5     6920.474 ±   1024.811  ops/s
[info] ExtractFieldsBenchmark.readSpray    1000  [2.1,""]  thrpt    5       37.499 ±      6.795  ops/s
[info] ExtractFieldsBenchmark.readSpray   10000  [2.1,""]  thrpt    5        0.262 ±      0.059  ops/s

(didn't run with 100000)

vs without collisions:

[info] Benchmark                         (size)   (value)   Mode  Cnt        Score        Error  Units
[info] ExtractFieldsBenchmark.readSpray       1  [2.1,""]  thrpt    5  1441253.958 ± 370171.747  ops/s
[info] ExtractFieldsBenchmark.readSpray      10  [2.1,""]  thrpt    5   234755.592 ±  29935.717  ops/s
[info] ExtractFieldsBenchmark.readSpray     100  [2.1,""]  thrpt    5    27142.045 ±   6664.102  ops/s
[info] ExtractFieldsBenchmark.readSpray    1000  [2.1,""]  thrpt    5     2336.876 ±    674.203  ops/s
[info] ExtractFieldsBenchmark.readSpray   10000  [2.1,""]  thrpt    5      206.748 ±     55.765  ops/s

i.e. the slow downs are

    1 1409411.385 1441253.958 x  1.02
   10  182598.090  234755.592 x  1.29
  100    6920.474   27142.045 x  3.92
 1000      37.499    2336.876 x 63
10000       0.262     206.748 x789

jrudolph · 2018-10-23T13:29:11Z

That is a reasonable limit couldn't be much higher than 100 fields because after that the slowdowns become overwhelming (even the ~4x at 100 fields could be seen as too bad already).

plokhotnyuk · 2018-10-23T14:45:13Z

@jrudolph Have you considered other options like usage of java.util.LinkedHashMap with Scala wrapper for Java maps?

jrudolph · 2018-10-23T15:02:22Z

Have you considered other options like usage of java.util.LinkedHashMap with Scala wrapper for Java maps?

What are the options for an immutable map with wrapping? Would that mean that if a user updates the map, the complete wrapped mutable Java map has to be copied? That would be one possibility but it would introduce another performance surprise (though, updating JSON objects might not be a primary use case).

jrudolph · 2018-10-23T15:43:13Z

We could probably also use a TreeMap.

jrudolph · 2018-10-24T10:29:41Z

I adapted the benchmark to compare different map implementations under the colliding and a "simple" scenario:

[info] Benchmark                         (parser)  (size)  (stringMode)   (value)   Mode  Cnt        Score        Error  Units
[info] ExtractFieldsBenchmark.readSpray   HashMap       1        simple  [2.1,""]  thrpt    5  1516495.776 ±  86184.006  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap       1     colliding  [2.1,""]  thrpt    5  1386118.057 ± 140544.005  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap      10        simple  [2.1,""]  thrpt    5   303817.721 ±   8955.963  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap      10     colliding  [2.1,""]  thrpt    5   110322.839 ±  71261.896  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap     100        simple  [2.1,""]  thrpt    5    28101.099 ±   3214.325  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap     100     colliding  [2.1,""]  thrpt    5     5995.114 ±    773.248  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap    1000        simple  [2.1,""]  thrpt    5     2563.734 ±    325.752  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap    1000     colliding  [2.1,""]  thrpt    5       47.047 ±      4.658  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap   10000        simple  [2.1,""]  thrpt    5      247.852 ±     14.291  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap   10000     colliding  [2.1,""]  thrpt    5        0.277 ±      0.065  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap       1        simple  [2.1,""]  thrpt    5  1670919.875 ± 214542.536  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap       1     colliding  [2.1,""]  thrpt    5  1526712.165 ± 286542.590  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap      10        simple  [2.1,""]  thrpt    5   304363.396 ±  64361.929  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap      10     colliding  [2.1,""]  thrpt    5   252418.584 ±  30345.330  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap     100        simple  [2.1,""]  thrpt    5    26810.959 ±   4233.477  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap     100     colliding  [2.1,""]  thrpt    5    21984.936 ±   4488.541  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap    1000        simple  [2.1,""]  thrpt    5     1831.009 ±    723.440  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap    1000     colliding  [2.1,""]  thrpt    5     1597.121 ±    792.578  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap   10000        simple  [2.1,""]  thrpt    5      181.681 ±     51.451  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap   10000     colliding  [2.1,""]  thrpt    5      152.067 ±     40.148  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap       1        simple  [2.1,""]  thrpt    5  1707331.361 ± 343750.235  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap       1     colliding  [2.1,""]  thrpt    5  1595036.410 ± 177580.428  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap      10        simple  [2.1,""]  thrpt    5   265289.014 ±  51378.543  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap      10     colliding  [2.1,""]  thrpt    5   231874.324 ±  72944.082  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap     100        simple  [2.1,""]  thrpt    5     5100.454 ±   4046.161  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap     100     colliding  [2.1,""]  thrpt    5     6434.174 ±    817.272  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap    1000        simple  [2.1,""]  thrpt    5       69.499 ±      4.403  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap    1000     colliding  [2.1,""]  thrpt    5       73.571 ±      2.256  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap   10000        simple  [2.1,""]  thrpt    5        0.952 ±      0.608  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap   10000     colliding  [2.1,""]  thrpt    5        0.972 ±      0.293  ops/s

Here's another run just comparing HashMap and TreeMap on non-colliding input:

[info] Benchmark                         (_size)  (parser)   Mode  Cnt        Score       Error  Units
[info] ExtractFieldsBenchmark.readSpray        1   HashMap  thrpt    5  1195832.262 ± 64366.605  ops/s
[info] ExtractFieldsBenchmark.readSpray        1   TreeMap  thrpt    5  1342009.641 ± 17307.555  ops/s
[info] ExtractFieldsBenchmark.readSpray       10   HashMap  thrpt    5   237173.327 ± 70341.742  ops/s
[info] ExtractFieldsBenchmark.readSpray       10   TreeMap  thrpt    5   233510.618 ± 69638.750  ops/s
[info] ExtractFieldsBenchmark.readSpray      100   HashMap  thrpt    5    23202.016 ±  1514.763  ops/s
[info] ExtractFieldsBenchmark.readSpray      100   TreeMap  thrpt    5    21899.072 ±   823.225  ops/s
[info] ExtractFieldsBenchmark.readSpray     1000   HashMap  thrpt    5     2073.754 ±    66.093  ops/s
[info] ExtractFieldsBenchmark.readSpray     1000   TreeMap  thrpt    5     1793.329 ±    43.603  ops/s
[info] ExtractFieldsBenchmark.readSpray    10000   HashMap  thrpt    5      208.160 ±     7.466  ops/s
[info] ExtractFieldsBenchmark.readSpray    10000   TreeMap  thrpt    5      160.349 ±     5.809  ops/s

That would mean that for reasonably sized objects with a size < 100, a TreeMap is only ~6% slower than a HashMap. So changing to a TreeMap for now seems to be a good and simple solution for this particular case.

…ray#277 The problem is that with String's hashCode implementation it is too simple to create synthetic collisions. This allows an attacker to create an object with keys that all collide which leads to a performance drop for the HashMap just for creating the map in the first place. See scala/bug#11203 for more information about the underlying HashMap issue. For the time being, it seems safer to use a TreeMap which uses String ordering. Benchmarks suggest that using a TreeMap is only ~6% slower for reasonably sized JSON objects up to 100 keys. Benchmark (_size) (parser) Mode Cnt Score Error Units ExtractFieldsBenchmark.readSpray 1 HashMap thrpt 5 1195832.262 ± 64366.605 ops/s ExtractFieldsBenchmark.readSpray 1 TreeMap thrpt 5 1342009.641 ± 17307.555 ops/s ExtractFieldsBenchmark.readSpray 10 HashMap thrpt 5 237173.327 ± 70341.742 ops/s ExtractFieldsBenchmark.readSpray 10 TreeMap thrpt 5 233510.618 ± 69638.750 ops/s ExtractFieldsBenchmark.readSpray 100 HashMap thrpt 5 23202.016 ± 1514.763 ops/s ExtractFieldsBenchmark.readSpray 100 TreeMap thrpt 5 21899.072 ± 823.225 ops/s ExtractFieldsBenchmark.readSpray 1000 HashMap thrpt 5 2073.754 ± 66.093 ops/s ExtractFieldsBenchmark.readSpray 1000 TreeMap thrpt 5 1793.329 ± 43.603 ops/s ExtractFieldsBenchmark.readSpray 10000 HashMap thrpt 5 208.160 ± 7.466 ops/s ExtractFieldsBenchmark.readSpray 10000 TreeMap thrpt 5 160.349 ± 5.809 ops/s

…ray#277 The problem is that with String's hashCode implementation it is too simple to create synthetic collisions. This allows an attacker to create an object with keys that all collide which leads to a performance drop for the HashMap just for creating the map in the first place. See scala/bug#11203 for more information about the underlying HashMap issue. For the time being, it seems safer to use a TreeMap which uses String ordering. Benchmarks suggest that using a TreeMap is only ~6% slower for reasonably sized JSON objects up to 100 keys. Benchmark for non-colliding keys: Benchmark (_size) (parser) Mode Cnt Score Error Units ExtractFieldsBenchmark.readSpray 1 HashMap thrpt 5 1195832.262 ± 64366.605 ops/s ExtractFieldsBenchmark.readSpray 1 TreeMap thrpt 5 1342009.641 ± 17307.555 ops/s ExtractFieldsBenchmark.readSpray 10 HashMap thrpt 5 237173.327 ± 70341.742 ops/s ExtractFieldsBenchmark.readSpray 10 TreeMap thrpt 5 233510.618 ± 69638.750 ops/s ExtractFieldsBenchmark.readSpray 100 HashMap thrpt 5 23202.016 ± 1514.763 ops/s ExtractFieldsBenchmark.readSpray 100 TreeMap thrpt 5 21899.072 ± 823.225 ops/s ExtractFieldsBenchmark.readSpray 1000 HashMap thrpt 5 2073.754 ± 66.093 ops/s ExtractFieldsBenchmark.readSpray 1000 TreeMap thrpt 5 1793.329 ± 43.603 ops/s ExtractFieldsBenchmark.readSpray 10000 HashMap thrpt 5 208.160 ± 7.466 ops/s ExtractFieldsBenchmark.readSpray 10000 TreeMap thrpt 5 160.349 ± 5.809 ops/s

CVE-2018-18854 Use TreeMap instead of HashMap for JsObject key/value pairs, fixes #277

plokhotnyuk mentioned this issue Oct 24, 2018

Denial of service when parsing JSON object with keys that have the same hash code argonaut-io/argonaut#314

Open

jrudolph added the Bug label Oct 30, 2018

jrudolph added a commit to jrudolph/spray-json that referenced this issue Oct 30, 2018

Test for spray#277 for checking on travis

2d458a1

jrudolph added the Security label Nov 7, 2018

jrudolph closed this as completed in 855b35e Nov 7, 2018

jrudolph added a commit that referenced this issue Nov 7, 2018

Merge pull request #280 from jrudolph/use-TreeMap-fixes-277

c8e106f

CVE-2018-18854 Use TreeMap instead of HashMap for JsObject key/value pairs, fixes #277

SethTisue mentioned this issue Nov 7, 2018

DoS vulnerability in Scala 2.12 HashMap scala/bug#11203

Closed

jrudolph added this to the 1.3.5 milestone Nov 8, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Denial of service when parsing JSON object with keys that have the same hash code #277

Denial of service when parsing JSON object with keys that have the same hash code #277

plokhotnyuk commented Oct 15, 2018 •

edited

Loading

jrudolph commented Oct 23, 2018

jrudolph commented Oct 23, 2018

jrudolph commented Oct 23, 2018

plokhotnyuk commented Oct 23, 2018 •

edited

Loading

jrudolph commented Oct 23, 2018

jrudolph commented Oct 23, 2018

jrudolph commented Oct 24, 2018

Denial of service when parsing JSON object with keys that have the same hash code #277

Denial of service when parsing JSON object with keys that have the same hash code #277

Comments

plokhotnyuk commented Oct 15, 2018 • edited Loading

Reproducible Test Case

jrudolph commented Oct 23, 2018

jrudolph commented Oct 23, 2018

jrudolph commented Oct 23, 2018

plokhotnyuk commented Oct 23, 2018 • edited Loading

jrudolph commented Oct 23, 2018

jrudolph commented Oct 23, 2018

jrudolph commented Oct 24, 2018

plokhotnyuk commented Oct 15, 2018 •

edited

Loading

plokhotnyuk commented Oct 23, 2018 •

edited

Loading