Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Denial of service when parsing JSON object with keys that have the same hash code #277

Closed
plokhotnyuk opened this issue Oct 15, 2018 · 7 comments
Milestone

Comments

@plokhotnyuk
Copy link

plokhotnyuk commented Oct 15, 2018

Sub-quadratic decreasing of throughput when number of JSON object fields (with keys that have the same hash code) is increasing

On contemporary CPUs parsing of such JSON object (with a sequence of 100000 fields like below that is ~1.6Mb) can took more than 160 seconds:

{
"!!sjyehe":null,
"!!sjyeiF":null,
"!!sjyej'":null,
"!!sjyfIe":null,
"!!sjyfJF":null,
...
}

Below are results of the benchmark where size is a number of such fields:

[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                         (size)  (value)   Mode  Cnt        Score       Error  Units
[info] ExtractFieldsBenchmark.readSpray       1     null  thrpt    5  2349094.580 ± 95798.868  ops/s
[info] ExtractFieldsBenchmark.readSpray      10     null  thrpt    5   332512.199 ± 21344.082  ops/s
[info] ExtractFieldsBenchmark.readSpray     100     null  thrpt    5     7714.406 ±   709.439  ops/s
[info] ExtractFieldsBenchmark.readSpray    1000     null  thrpt    5       52.775 ±     3.951  ops/s
[info] ExtractFieldsBenchmark.readSpray   10000     null  thrpt    5        0.369 ±     0.076  ops/s
[info] ExtractFieldsBenchmark.readSpray  100000     null  thrpt    5        0.006 ±     0.002  ops/s

Reproducible Test Case

To run that benchmarks on your JDK:

  1. Install latest version of sbt and/or ensure that it already installed properly:
sbt about
  1. Clone jsoniter-scala repo:
git clone https://github.com/plokhotnyuk/jsoniter-scala.git
  1. Enter to the cloned directory and checkout for the specific branch:
cd jsoniter-scala
git checkout spray-json-DoS-by-hashmap-collisions
  1. Run benchmarks using a path parameter to your JDK:
sbt -no-colors 'jsoniter-scala-benchmark/jmh:run -jvm /usr/lib/jvm/jdk-11/bin/java -wi 2 -i 5 -p value=null .*ExtractFieldsBench.*Spray.*'
@jrudolph
Copy link
Member

See scala/bug#11203 for the underlying issue in Scala HashMaps.

One simple remedy could be to limit the number of fields per object for cases where the parser is run with untrusted data. One hurdle is that we currently don't support any configuration and we need to see how to introduce configuration for the parser.

@jrudolph
Copy link
Member

Btw. thanks a lot, @plokhotnyuk for the report.

For me its with collisions:

[info] Benchmark                         (size)   (value)   Mode  Cnt        Score        Error  Units
[info] ExtractFieldsBenchmark.readSpray       1  [2.1,""]  thrpt    5  1409411.385 ± 223571.130  ops/s
[info] ExtractFieldsBenchmark.readSpray      10  [2.1,""]  thrpt    5   182598.090 ±  23199.897  ops/s
[info] ExtractFieldsBenchmark.readSpray     100  [2.1,""]  thrpt    5     6920.474 ±   1024.811  ops/s
[info] ExtractFieldsBenchmark.readSpray    1000  [2.1,""]  thrpt    5       37.499 ±      6.795  ops/s
[info] ExtractFieldsBenchmark.readSpray   10000  [2.1,""]  thrpt    5        0.262 ±      0.059  ops/s

(didn't run with 100000)

vs without collisions:

[info] Benchmark                         (size)   (value)   Mode  Cnt        Score        Error  Units
[info] ExtractFieldsBenchmark.readSpray       1  [2.1,""]  thrpt    5  1441253.958 ± 370171.747  ops/s
[info] ExtractFieldsBenchmark.readSpray      10  [2.1,""]  thrpt    5   234755.592 ±  29935.717  ops/s
[info] ExtractFieldsBenchmark.readSpray     100  [2.1,""]  thrpt    5    27142.045 ±   6664.102  ops/s
[info] ExtractFieldsBenchmark.readSpray    1000  [2.1,""]  thrpt    5     2336.876 ±    674.203  ops/s
[info] ExtractFieldsBenchmark.readSpray   10000  [2.1,""]  thrpt    5      206.748 ±     55.765  ops/s

i.e. the slow downs are

    1 1409411.385 1441253.958 x  1.02
   10  182598.090  234755.592 x  1.29
  100    6920.474   27142.045 x  3.92
 1000      37.499    2336.876 x 63
10000       0.262     206.748 x789

@jrudolph
Copy link
Member

That is a reasonable limit couldn't be much higher than 100 fields because after that the slowdowns become overwhelming (even the ~4x at 100 fields could be seen as too bad already).

@plokhotnyuk
Copy link
Author

plokhotnyuk commented Oct 23, 2018

@jrudolph Have you considered other options like usage of java.util.LinkedHashMap with Scala wrapper for Java maps?

@jrudolph
Copy link
Member

Have you considered other options like usage of java.util.LinkedHashMap with Scala wrapper for Java maps?

What are the options for an immutable map with wrapping? Would that mean that if a user updates the map, the complete wrapped mutable Java map has to be copied? That would be one possibility but it would introduce another performance surprise (though, updating JSON objects might not be a primary use case).

@jrudolph
Copy link
Member

We could probably also use a TreeMap.

@jrudolph
Copy link
Member

I adapted the benchmark to compare different map implementations under the colliding and a "simple" scenario:

[info] Benchmark                         (parser)  (size)  (stringMode)   (value)   Mode  Cnt        Score        Error  Units
[info] ExtractFieldsBenchmark.readSpray   HashMap       1        simple  [2.1,""]  thrpt    5  1516495.776 ±  86184.006  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap       1     colliding  [2.1,""]  thrpt    5  1386118.057 ± 140544.005  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap      10        simple  [2.1,""]  thrpt    5   303817.721 ±   8955.963  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap      10     colliding  [2.1,""]  thrpt    5   110322.839 ±  71261.896  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap     100        simple  [2.1,""]  thrpt    5    28101.099 ±   3214.325  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap     100     colliding  [2.1,""]  thrpt    5     5995.114 ±    773.248  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap    1000        simple  [2.1,""]  thrpt    5     2563.734 ±    325.752  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap    1000     colliding  [2.1,""]  thrpt    5       47.047 ±      4.658  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap   10000        simple  [2.1,""]  thrpt    5      247.852 ±     14.291  ops/s
[info] ExtractFieldsBenchmark.readSpray   HashMap   10000     colliding  [2.1,""]  thrpt    5        0.277 ±      0.065  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap       1        simple  [2.1,""]  thrpt    5  1670919.875 ± 214542.536  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap       1     colliding  [2.1,""]  thrpt    5  1526712.165 ± 286542.590  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap      10        simple  [2.1,""]  thrpt    5   304363.396 ±  64361.929  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap      10     colliding  [2.1,""]  thrpt    5   252418.584 ±  30345.330  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap     100        simple  [2.1,""]  thrpt    5    26810.959 ±   4233.477  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap     100     colliding  [2.1,""]  thrpt    5    21984.936 ±   4488.541  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap    1000        simple  [2.1,""]  thrpt    5     1831.009 ±    723.440  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap    1000     colliding  [2.1,""]  thrpt    5     1597.121 ±    792.578  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap   10000        simple  [2.1,""]  thrpt    5      181.681 ±     51.451  ops/s
[info] ExtractFieldsBenchmark.readSpray   TreeMap   10000     colliding  [2.1,""]  thrpt    5      152.067 ±     40.148  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap       1        simple  [2.1,""]  thrpt    5  1707331.361 ± 343750.235  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap       1     colliding  [2.1,""]  thrpt    5  1595036.410 ± 177580.428  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap      10        simple  [2.1,""]  thrpt    5   265289.014 ±  51378.543  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap      10     colliding  [2.1,""]  thrpt    5   231874.324 ±  72944.082  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap     100        simple  [2.1,""]  thrpt    5     5100.454 ±   4046.161  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap     100     colliding  [2.1,""]  thrpt    5     6434.174 ±    817.272  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap    1000        simple  [2.1,""]  thrpt    5       69.499 ±      4.403  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap    1000     colliding  [2.1,""]  thrpt    5       73.571 ±      2.256  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap   10000        simple  [2.1,""]  thrpt    5        0.952 ±      0.608  ops/s
[info] ExtractFieldsBenchmark.readSpray   ListMap   10000     colliding  [2.1,""]  thrpt    5        0.972 ±      0.293  ops/s

Here's another run just comparing HashMap and TreeMap on non-colliding input:

[info] Benchmark                         (_size)  (parser)   Mode  Cnt        Score       Error  Units
[info] ExtractFieldsBenchmark.readSpray        1   HashMap  thrpt    5  1195832.262 ± 64366.605  ops/s
[info] ExtractFieldsBenchmark.readSpray        1   TreeMap  thrpt    5  1342009.641 ± 17307.555  ops/s
[info] ExtractFieldsBenchmark.readSpray       10   HashMap  thrpt    5   237173.327 ± 70341.742  ops/s
[info] ExtractFieldsBenchmark.readSpray       10   TreeMap  thrpt    5   233510.618 ± 69638.750  ops/s
[info] ExtractFieldsBenchmark.readSpray      100   HashMap  thrpt    5    23202.016 ±  1514.763  ops/s
[info] ExtractFieldsBenchmark.readSpray      100   TreeMap  thrpt    5    21899.072 ±   823.225  ops/s
[info] ExtractFieldsBenchmark.readSpray     1000   HashMap  thrpt    5     2073.754 ±    66.093  ops/s
[info] ExtractFieldsBenchmark.readSpray     1000   TreeMap  thrpt    5     1793.329 ±    43.603  ops/s
[info] ExtractFieldsBenchmark.readSpray    10000   HashMap  thrpt    5      208.160 ±     7.466  ops/s
[info] ExtractFieldsBenchmark.readSpray    10000   TreeMap  thrpt    5      160.349 ±     5.809  ops/s

That would mean that for reasonably sized objects with a size < 100, a TreeMap is only ~6% slower than a HashMap. So changing to a TreeMap for now seems to be a good and simple solution for this particular case.

@jrudolph jrudolph added the Bug label Oct 30, 2018
jrudolph added a commit to jrudolph/spray-json that referenced this issue Oct 30, 2018
…ray#277

The problem is that with String's hashCode implementation it is too simple to
create synthetic collisions. This allows an attacker to create an object with
keys that all collide which leads to a performance drop for the HashMap
just for creating the map in the first place. See scala/bug#11203
for more information about the underlying HashMap issue.

For the time being, it seems safer to use a TreeMap which uses String ordering.
Benchmarks suggest that using a TreeMap is only ~6% slower for reasonably sized JSON objects
up to 100 keys.

Benchmark                         (_size)  (parser)   Mode  Cnt        Score       Error  Units
ExtractFieldsBenchmark.readSpray        1   HashMap  thrpt    5  1195832.262 ± 64366.605  ops/s
ExtractFieldsBenchmark.readSpray        1   TreeMap  thrpt    5  1342009.641 ± 17307.555  ops/s
ExtractFieldsBenchmark.readSpray       10   HashMap  thrpt    5   237173.327 ± 70341.742  ops/s
ExtractFieldsBenchmark.readSpray       10   TreeMap  thrpt    5   233510.618 ± 69638.750  ops/s
ExtractFieldsBenchmark.readSpray      100   HashMap  thrpt    5    23202.016 ±  1514.763  ops/s
ExtractFieldsBenchmark.readSpray      100   TreeMap  thrpt    5    21899.072 ±   823.225  ops/s
ExtractFieldsBenchmark.readSpray     1000   HashMap  thrpt    5     2073.754 ±    66.093  ops/s
ExtractFieldsBenchmark.readSpray     1000   TreeMap  thrpt    5     1793.329 ±    43.603  ops/s
ExtractFieldsBenchmark.readSpray    10000   HashMap  thrpt    5      208.160 ±     7.466  ops/s
ExtractFieldsBenchmark.readSpray    10000   TreeMap  thrpt    5      160.349 ±     5.809  ops/s
jrudolph added a commit to jrudolph/spray-json that referenced this issue Oct 30, 2018
…ray#277

The problem is that with String's hashCode implementation it is too simple to
create synthetic collisions. This allows an attacker to create an object with
keys that all collide which leads to a performance drop for the HashMap
just for creating the map in the first place. See scala/bug#11203
for more information about the underlying HashMap issue.

For the time being, it seems safer to use a TreeMap which uses String ordering.
Benchmarks suggest that using a TreeMap is only ~6% slower for reasonably sized JSON objects
up to 100 keys.

Benchmark for non-colliding keys:

Benchmark                         (_size)  (parser)   Mode  Cnt        Score       Error  Units
ExtractFieldsBenchmark.readSpray        1   HashMap  thrpt    5  1195832.262 ± 64366.605  ops/s
ExtractFieldsBenchmark.readSpray        1   TreeMap  thrpt    5  1342009.641 ± 17307.555  ops/s
ExtractFieldsBenchmark.readSpray       10   HashMap  thrpt    5   237173.327 ± 70341.742  ops/s
ExtractFieldsBenchmark.readSpray       10   TreeMap  thrpt    5   233510.618 ± 69638.750  ops/s
ExtractFieldsBenchmark.readSpray      100   HashMap  thrpt    5    23202.016 ±  1514.763  ops/s
ExtractFieldsBenchmark.readSpray      100   TreeMap  thrpt    5    21899.072 ±   823.225  ops/s
ExtractFieldsBenchmark.readSpray     1000   HashMap  thrpt    5     2073.754 ±    66.093  ops/s
ExtractFieldsBenchmark.readSpray     1000   TreeMap  thrpt    5     1793.329 ±    43.603  ops/s
ExtractFieldsBenchmark.readSpray    10000   HashMap  thrpt    5      208.160 ±     7.466  ops/s
ExtractFieldsBenchmark.readSpray    10000   TreeMap  thrpt    5      160.349 ±     5.809  ops/s
jrudolph added a commit to jrudolph/spray-json that referenced this issue Oct 30, 2018
jrudolph added a commit to jrudolph/spray-json that referenced this issue Oct 30, 2018
…ray#277

The problem is that with String's hashCode implementation it is too simple to
create synthetic collisions. This allows an attacker to create an object with
keys that all collide which leads to a performance drop for the HashMap
just for creating the map in the first place. See scala/bug#11203
for more information about the underlying HashMap issue.

For the time being, it seems safer to use a TreeMap which uses String ordering.
Benchmarks suggest that using a TreeMap is only ~6% slower for reasonably sized JSON objects
up to 100 keys.

Benchmark for non-colliding keys:

Benchmark                         (_size)  (parser)   Mode  Cnt        Score       Error  Units
ExtractFieldsBenchmark.readSpray        1   HashMap  thrpt    5  1195832.262 ± 64366.605  ops/s
ExtractFieldsBenchmark.readSpray        1   TreeMap  thrpt    5  1342009.641 ± 17307.555  ops/s
ExtractFieldsBenchmark.readSpray       10   HashMap  thrpt    5   237173.327 ± 70341.742  ops/s
ExtractFieldsBenchmark.readSpray       10   TreeMap  thrpt    5   233510.618 ± 69638.750  ops/s
ExtractFieldsBenchmark.readSpray      100   HashMap  thrpt    5    23202.016 ±  1514.763  ops/s
ExtractFieldsBenchmark.readSpray      100   TreeMap  thrpt    5    21899.072 ±   823.225  ops/s
ExtractFieldsBenchmark.readSpray     1000   HashMap  thrpt    5     2073.754 ±    66.093  ops/s
ExtractFieldsBenchmark.readSpray     1000   TreeMap  thrpt    5     1793.329 ±    43.603  ops/s
ExtractFieldsBenchmark.readSpray    10000   HashMap  thrpt    5      208.160 ±     7.466  ops/s
ExtractFieldsBenchmark.readSpray    10000   TreeMap  thrpt    5      160.349 ±     5.809  ops/s
jrudolph added a commit that referenced this issue Nov 7, 2018
CVE-2018-18854 Use TreeMap instead of HashMap for JsObject key/value pairs, fixes #277
@jrudolph jrudolph added this to the 1.3.5 milestone Nov 8, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants