Denial of service when parsing a JSON object with an unexpected field that has a big number #187

Closed
plokhotnyuk opened this issue Oct 12, 2018 · 5 comments

Comments


plokhotnyuk commented Oct 12, 2018

Play JSON Version (2.5.x / etc)

2.7.0-M1

API (Scala / Java / Neither / Both)

Scala 2.12.7

Operating System (Ubuntu 15.10 / MacOS 10.10 / Windows 10)

Ubuntu 16.04

JDK (Oracle 1.8.0_72, OpenJDK 1.8.x, Azul Zing)

Oracle JDK 11

Library Dependencies

none

Expected Behavior

Sub-linear decrease of throughput as the length of the JSON object increases

Actual Behavior

Sub-quadratic decrease of throughput as the length of the JSON object increases

On contemporary CPUs, parsing such a JSON object with an additional field that has 1000000 decimal digits (~1 MB) can take more than 13 seconds:

[info] REMEMBER: The numbers below are just data. To gain reusable insights, you need to follow up on
[info] why the numbers are the way they are. Use profilers (see -prof, -lprof), design factorial
[info] experiments, perform baseline and negative tests that provide experimental control, make sure
[info] the benchmarking environment is safe on JVM/OS/HW level, ask for reviews from the domain experts.
[info] Do not assume the numbers tell you what you want them to tell.
[info] Benchmark                                     (size)   Mode  Cnt         Score        Error  Units
[info] ExtractFieldsBenchmark.readAVSystemGenCodec        1  thrpt    5   3415309.675 ± 358732.424  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec       10  thrpt    5   3429891.871 ± 135960.100  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec      100  thrpt    5    686235.263 ± 832769.728  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec     1000  thrpt    5    194588.852 ±   8165.939  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec    10000  thrpt    5     28335.193 ±    911.581  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec   100000  thrpt    5      2948.038 ±    128.163  ops/s
[info] ExtractFieldsBenchmark.readAVSystemGenCodec  1000000  thrpt    5       649.088 ±    199.346  ops/s
[info] ExtractFieldsBenchmark.readCirce                   1  thrpt    5   1181495.148 ± 302987.993  ops/s
[info] ExtractFieldsBenchmark.readCirce                  10  thrpt    5   1277915.025 ± 179880.016  ops/s
[info] ExtractFieldsBenchmark.readCirce                 100  thrpt    5   1277950.564 ± 256709.663  ops/s
[info] ExtractFieldsBenchmark.readCirce                1000  thrpt    5    816515.741 ±  15529.900  ops/s
[info] ExtractFieldsBenchmark.readCirce               10000  thrpt    5    146038.446 ±   2585.134  ops/s
[info] ExtractFieldsBenchmark.readCirce              100000  thrpt    5     16825.855 ±    468.669  ops/s
[info] ExtractFieldsBenchmark.readCirce             1000000  thrpt    5      1693.840 ±     59.649  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava             1  thrpt    5  11703558.471 ± 196574.764  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava            10  thrpt    5  10418348.204 ± 125349.933  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava           100  thrpt    5   4854474.847 ± 335999.431  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava          1000  thrpt    5    833174.664 ±  22787.464  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava         10000  thrpt    5     88047.329 ±    894.533  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava        100000  thrpt    5      9037.421 ±     97.407  ops/s
[info] ExtractFieldsBenchmark.readDslJsonJava       1000000  thrpt    5       918.420 ±     13.473  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala            1  thrpt    5   2533509.752 ±  75854.375  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala           10  thrpt    5   2521299.318 ±  37344.857  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala          100  thrpt    5    868736.640 ±  19590.367  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala         1000  thrpt    5     54637.764 ±   1922.976  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala        10000  thrpt    5       723.644 ±     14.444  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala       100000  thrpt    5         7.254 ±      0.414  ops/s
[info] ExtractFieldsBenchmark.readJacksonScala      1000000  thrpt    5         0.077 ±      0.001  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala           1  thrpt    5  17357927.186 ± 105168.663  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala          10  thrpt    5  15509884.192 ± 599007.176  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala         100  thrpt    5  10557719.687 ±  82797.425  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala        1000  thrpt    5   2306588.382 ±  15014.663  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala       10000  thrpt    5    252999.473 ±   2013.190  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala      100000  thrpt    5     24022.123 ±    490.780  ops/s
[info] ExtractFieldsBenchmark.readJsoniterScala     1000000  thrpt    5      2042.339 ±    118.757  ops/s
[info] ExtractFieldsBenchmark.readPlayJson                1  thrpt    5    928062.700 ±  35964.755  ops/s
[info] ExtractFieldsBenchmark.readPlayJson               10  thrpt    5    908324.771 ±  41278.052  ops/s
[info] ExtractFieldsBenchmark.readPlayJson              100  thrpt    5    538588.245 ±  58035.196  ops/s
[info] ExtractFieldsBenchmark.readPlayJson             1000  thrpt    5     52739.058 ±   5124.015  ops/s
[info] ExtractFieldsBenchmark.readPlayJson            10000  thrpt    5       743.426 ±      6.226  ops/s
[info] ExtractFieldsBenchmark.readPlayJson           100000  thrpt    5         7.351 ±      0.030  ops/s
[info] ExtractFieldsBenchmark.readPlayJson          1000000  thrpt    5         0.067 ±      0.018  ops/s
[info] ExtractFieldsBenchmark.readUPickle                 1  thrpt    5   3340922.046 ± 246892.139  ops/s
[info] ExtractFieldsBenchmark.readUPickle                10  thrpt    5   3483490.433 ±  39971.435  ops/s
[info] ExtractFieldsBenchmark.readUPickle               100  thrpt    5   2494567.445 ±  71404.382  ops/s
[info] ExtractFieldsBenchmark.readUPickle              1000  thrpt    5    814753.180 ±  30787.779  ops/s
[info] ExtractFieldsBenchmark.readUPickle             10000  thrpt    5    101384.553 ±   1049.347  ops/s
[info] ExtractFieldsBenchmark.readUPickle            100000  thrpt    5     10380.287 ±     43.464  ops/s
[info] ExtractFieldsBenchmark.readUPickle           1000000  thrpt    5       991.119 ±     60.797  ops/s
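
For a quick check outside of JMH, a minimal sketch along the following lines reproduces the slowdown; the field names and payload shape below are hypothetical, only the oversized number matters:

import play.api.libs.json.Json

object BigNumberParseDemo extends App {
  // Hypothetical payload: one expected field plus an unexpected field whose
  // value is a 1,000,000-digit integer literal (~1 MB of JSON text).
  val bigNumber = "9" * 1000000
  val payload   = s"""{"name":"test","unexpected":$bigNumber}"""

  val start = System.nanoTime()
  Json.parse(payload) // every JSON number becomes a BigDecimal-backed JsNumber
  println(s"Parsing took ${(System.nanoTime() - start) / 1000000} ms")
}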

Reproducible Test Case

To run these benchmarks on your JDK:

  1. Install the latest version of sbt and/or ensure that it is already installed properly:
sbt about
  2. Clone the jsoniter-scala repo:
git clone https://github.com/plokhotnyuk/jsoniter-scala.git
  3. Enter the cloned directory and check out the specific branch:
cd jsoniter-scala
git checkout play-json-DoS-using-big-number
  4. Run the benchmarks, passing a path parameter to your JDK:
sbt -no-colors 'jsoniter-scala-benchmark/jmh:run -jvm /usr/lib/jvm/jdk-11/bin/java -wi 5 -i 5 .*ExtractFieldsBench.*'
marcospereira (Member) commented

Hi @plokhotnyuk, is ExtractFieldsBenchmark the correct benchmark here? Or should we instead run com.github.plokhotnyuk.jsoniter_scala.macros.BigIntBenchmark?

plokhotnyuk (Author) commented Oct 12, 2018

@marcospereira those are different cases: ExtractFieldsBenchmark shows that parsing of any JSON object is affected, while BigIntBenchmark covers only bindings to BigInt, BigDecimal, or other types that use big numbers, as in the initial case for timestamps:

#180

marcospereira (Member) commented

Hi @plokhotnyuk,

Thanks for the additional information. I was able to reproduce the problem using ExtractFieldsBenchmark in the play-json-DoS-using-big-number branch. I'm currently investigating a possible fix for Play JSON, but it is possible that we will need to wait for a fix in Jackson, since it is the underlying library we use for parsing (see Json.parse(String), StaticBinding and finally JacksonJson).

marcospereira (Member) commented

Hum, actually, I think we can impose some limits here:

case JsonTokenId.ID_NUMBER_INT | JsonTokenId.ID_NUMBER_FLOAT => (Some(JsNumber(jp.getDecimalValue)), parserContext)

Getting the text value is a (way) cheaper operation. We still need to load all the data into memory (not sure if Jackson offers another way to check the value length), but we can check whether the number of "digits" is beyond an acceptable limit (I see jsoniter is using these limits by default).

It looks like a workaround to me, but at least we can move forward without entirely depending on Jackson.

WDYT?
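
A rough sketch of the limit described above, assuming a hand-rolled check around Jackson's JsonParser; MaxNumberDigits is a hypothetical name and value, not an existing play-json setting:

import com.fasterxml.jackson.core.JsonParser
import play.api.libs.json.JsNumber

object LimitedNumberReader {
  // Hypothetical bound on the textual length of a JSON number; the real fix
  // may pick a different limit (jsoniter-scala ships with similar limits by default).
  val MaxNumberDigits = 310

  def readLimitedNumber(jp: JsonParser): JsNumber = {
    val text = jp.getText // cheap compared to building a BigDecimal
    if (text.length > MaxNumberDigits)
      throw new IllegalArgumentException(
        s"JSON number of ${text.length} characters exceeds the $MaxNumberDigits-character limit")
    JsNumber(BigDecimal(text)) // only now pay the conversion cost, on bounded input
  }
}

In the match arm quoted above, JsNumber(jp.getDecimalValue) would then be replaced by something like readLimitedNumber(jp), so the expensive conversion only runs for numbers below the limit.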

plokhotnyuk (Author) commented Oct 13, 2018

@marcospereira IMHO, patches for the #180, #186, and #187 issues of Play-JSON should be released without waiting for patched Jackson, Scala, or Java libraries.

Also, we should avoid returning a parsed BigDecimal with a too-large exponent or with MathContext.UNLIMITED.

Just try how this code works with different Scala versions:

scala> val f = (x: BigDecimal) => x + 1
f: BigDecimal => scala.math.BigDecimal = $$Lambda$1210/93981118@790ac3e0

scala> f(BigDecimal("1e1000000000"))

or

scala> val g = (x: BigDecimal) => 1 + x
g: BigDecimal => scala.math.BigDecimal = $$Lambda$1091/1954133542@e8ea697

scala> g(BigDecimal("1e-1000000000", java.math.MathContext.UNLIMITED))

Most users (who usually do their financial calculations with big decimals) are not aware that such pure functions have a side effect that cannot be easily ignored on contemporary hardware.
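
A hedged illustration of that point: before doing arithmetic on externally supplied decimals, an application can reject values whose scale (exponent) is far outside any sensible range. The bound and names below are arbitrary choices for the example, not play-json or Scala library constants:

object SafeDecimalOps {
  // Illustrative bound; MathContext.DECIMAL128 itself allows exponents up to about 6144.
  val MaxAbsScale = 6144

  def safeAdd(x: BigDecimal, y: BigDecimal): BigDecimal = {
    require(x.scale.abs <= MaxAbsScale && y.scale.abs <= MaxAbsScale,
      s"decimal exponent outside the ±$MaxAbsScale range, refusing to compute")
    x + y // bounded scales prevent results with billions of digits
  }
}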

dwijnand pushed a commit that referenced this issue Nov 14, 2018
## Fixes

Fixes #187 

## Purpose

Parsing large big decimals (think tens of hundreds of digits) and operating on these numbers
can be very CPU demanding. While play-json currently supports handling large numbers, it
is not practical in real-world applications and can expose them to denial-of-service attacks.

This changes the way parsing happens to limit the size of such numbers based on
MathContext.DECIMAL128.
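
For reference, a small sketch of what limiting parsing to MathContext.DECIMAL128 means in practice: values are rounded to at most 34 significant digits instead of being kept exactly (illustrative only, not the code from the PR):

import java.math.MathContext

object Decimal128Demo extends App {
  val digits = "1" * 50 // a 50-digit integer literal

  val exact  = BigDecimal(digits)                         // kept exactly
  val capped = BigDecimal(digits, MathContext.DECIMAL128) // rounded on construction

  println(exact.precision)  // 50 on recent Scala versions
  println(capped.precision) // 34, the DECIMAL128 precision
}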
dwijnand added a commit that referenced this issue Nov 27, 2018
* Avoid parsing large big decimals (#200)

* Avoid parsing large big decimals

Parsing large big decimals (think tens of hundreds of digits) and operating on these numbers
can be very CPU demanding. While play-json currently supports handling large numbers, it
is not practical in real-world applications and can expose them to denial-of-service attacks.

This changes the way parsing happens to limit the size of such numbers based on
MathContext.DECIMAL128.

* Format details

* Fix typo

* Remove tests duplication

* Add breadcrumbs detailing where precision is defined

* Improve parsing readability

* Improve test readability

* Make it possible to configure the parsing for large big decimals (#191)

Fixes #187

Parsing large big decimals (think tens of hundreds of digits) and operating on these numbers
can be very CPU demanding. While play-json currently supports handling large numbers, it
is not practical in real-world applications and can expose them to denial-of-service attacks.

This changes the way parsing happens to limit the size of such numbers based on
MathContext.DECIMAL128.

* Fix binary compatibility issues

* Codec for BigInt (#122)

* Codec for BigInt

* MiMa

* More tests

* Add small comment about bincompat filter

* Fix Scala 2.10 compatibility issue