Inline implementation of GToJSON #652

Merged 1 commit on Aug 6, 2018

remyoudompheng

This work aims to solve a couple of performance issues with the Generic ToJSON instances, where inlining is incomplete. As a result, the generated code contains a number of references to Generic Rep types and to Aeson's generic helpers, which breaks the structure of the code and slows down both compilation and execution, especially for large types.

Commit message

Generic implementation should use inlining when possible
in order to eliminate Generic sum/product combinators
in final generated code.

The simplifications can actually lead to smaller code and faster
compilation.

Compilation time (GHC 8.4):

  • G/BigProduct.hs is 25% faster
  • G/BigRecord.hs is 2x faster

Runtime performance:

  • BigRecord/toJSON/generic is more than 2x faster
    (same as BigRecord/toJSON/th)
  • BigProduct/encode/generic is more than 2x faster
    (still almost 2x slower than BigProduct/encode/th)

The same approach does not improve GFromJSON due to the presence
of unsaturated applications.
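
As an illustration of the idea only, here is a minimal sketch of a `GHC.Generics` class whose instance methods all carry `INLINE` pragmas, so that GHC has the chance to specialise away the `Rep` combinators at each use site. The class `GFields`, the function `encodeFields`, and the type `Point` are hypothetical names invented for this sketch; this is not aeson's `GToJSON` code, sum types are left out, and whether the combinators actually disappear from the generated Core depends on the GHC version and optimisation flags.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}

module Main (main) where

import Data.List (intercalate)
import GHC.Generics

-- Hypothetical class (not aeson's GToJSON): collect the shown
-- field values of a single-constructor type.
class GFields f where
  gfields :: f p -> [String]

-- Constructor without fields.
instance GFields U1 where
  gfields U1 = []
  {-# INLINE gfields #-}

-- Product of fields: this is the combinator we hope inlining removes.
instance (GFields a, GFields b) => GFields (a :*: b) where
  gfields (a :*: b) = gfields a ++ gfields b
  {-# INLINE gfields #-}

-- A single field.
instance Show c => GFields (K1 i c) where
  gfields (K1 x) = [show x]
  {-# INLINE gfields #-}

-- Metadata wrappers (datatype, constructor, selector) are skipped.
instance GFields f => GFields (M1 i t f) where
  gfields (M1 x) = gfields x
  {-# INLINE gfields #-}

-- Hypothetical entry point, also marked INLINE so the generic
-- traversal can be specialised where the concrete type is known.
encodeFields :: (Generic a, GFields (Rep a)) => a -> String
encodeFields x = "[" ++ intercalate "," (gfields (from x)) ++ "]"
{-# INLINE encodeFields #-}

data Point = Point { px :: Int, py :: Int, label :: String }
  deriving (Generic)

main :: IO ()
main = putStrLn (encodeFields (Point 1 2 "a"))
```

Running `main` prints `[1,2,"a"]`. Comparing the output of `-ddump-simpl` with and without the pragmas is one way to check whether references to `:*:` and `M1` survive in the optimised code.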

@bergmark
Collaborator

Thanks!

These were actually removed in #335 by @RyanGlScott, for reasons similar to the ones you have for adding them again! Is this due to GHC improvements since then?

It would be nice to know how e.g. pandoc-types compilation speed is affected (#296).

@remyoudompheng
Author

Regarding pandoc-types 1.16.0.1 (as mentioned in #335): this patch on its own does not seem very harmful, but #653 badly hurts compile times on GHC 7.10 and 8.0.

Here is the performance of "stack ghc -- -O -Rghc-timing -fforce-recomp Text/Pandoc/Definition.hs"

With GHC 8.4.3 (stackage 12.1):
aeson 1.3.1.1 :
18.22s user 0.27s system 100% cpu 18.440 total
ghc: 17498472560 bytes, 701 GCs, 98573026/301512776 avg/max bytes residency
763M in use, 0.000 INIT (0.000 elapsed), 11.182 MUT (11.498 elapsed), 6.566 GC (6.702 elapsed)
After #652 and #653 :
18.52s user 0.29s system 100% cpu 18.750 total
ghc: 17537839128 bytes, 736 GCs, 97226693/290310088 avg/max bytes residency
755M in use, 0.000 INIT (0.000 elapsed), 11.455 MUT (11.758 elapsed), 6.603 GC (6.761 elapsed)

With GHC 8.2.2 (stackage 11.12)
aeson 1.3.1.1 :
22.78s user 0.35s system 100% cpu 23.092 total
21431095176 bytes, 954 GCs, 120479146/386631320 avg/max bytes residency (17 samples)
1063M in use, 0.000 INIT (0.000 elapsed), 15.024 MUT (15.331 elapsed), 7.991 GC (7.983 elapsed)
After patches:
20.23s user 0.33s system 100% cpu 20.518 total
20003651600 bytes, 712 GCs, 102471383/299572968 avg/max bytes residency (17 samples)
822M in use, 0.000 INIT (0.000 elapsed), 12.933 MUT (13.268 elapsed), 6.977 GC (6.971 elapsed)

With GHC 8.0.2 (Stackage 9.21)
aeson 1.3.1.1 :
25.67s user 0.24s system 100% cpu 25.879 total
25214921488 bytes, 1264 GCs, 102389430/222448424 avg/max bytes residency (20 samples)
617M in use, 0.001 INIT (0.001 elapsed), 16.466 MUT (16.781 elapsed), 8.852 GC (8.852 elapsed)
After #652 only :
28.18s user 0.26s system 100% cpu 28.410 total
27233708376 bytes, 1483 GCs, 105435725/229250760 avg/max bytes residency (21 samples)
634M in use, 0.001 INIT (0.001 elapsed), 18.157 MUT (18.508 elapsed), 9.648 GC (9.648 elapsed)
After #652 and #653 :
91.55s user 0.59s system 100% cpu 1:32.10 total
84091717360 bytes, 3337 GCs, 276969184/722336008 avg/max bytes residency (29 samples)
1947M in use, 0.001 INIT (0.001 elapsed), 55.237 MUT (55.634 elapsed), 36.182 GC (36.183 elapsed)

With GHC 7.10 (Stackage 6.35)
aeson 1.3.1.1 :
25.20s user 0.23s system 100% cpu 25.400 total
24633413576 bytes, 1224 GCs, 105821571/227120080 avg/max bytes residency (22 samples)
588M in use, 0.001 INIT (0.001 elapsed), 14.660 MUT (14.987 elapsed), 10.176 GC (10.177 elapsed)
After #652 only :
31.64s user 0.28s system 100% cpu 31.885 total
29638486576 bytes, 1416 GCs, 119468812/291798008 avg/max bytes residency (24 samples)
785M in use, 0.001 INIT (0.001 elapsed), 18.434 MUT (18.839 elapsed), 12.816 GC (12.816 elapsed)
After #652 and #653 :
59.52s user 0.44s system 100% cpu 59.925 total
54921778664 bytes, 2273 GCs, 209009077/584430368 avg/max bytes residency (28 samples)
1467M in use, 0.000 INIT (0.000 elapsed), 33.013 MUT (33.490 elapsed), 26.176 GC (26.176 elapsed)
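
To make the figures above easier to compare, the following small sketch computes the ratio of patched to baseline "user" time for each configuration. The numbers are copied from the timings quoted above; the names `runs` and `ratio` are introduced here purely for illustration, and the "After patches" run on GHC 8.2.2 is assumed to include both #652 and #653.

```haskell
module Main (main) where

import Text.Printf (printf)

-- (GHC version, patch set, baseline user seconds, patched user seconds),
-- all taken from the "user" column of the timings quoted above.
runs :: [(String, String, Double, Double)]
runs =
  [ ("8.4.3", "#652 and #653", 18.22, 18.52)
  , ("8.2.2", "patches",       22.78, 20.23)
  , ("8.0.2", "#652 only",     25.67, 28.18)
  , ("8.0.2", "#652 and #653", 25.67, 91.55)
  , ("7.10",  "#652 only",     25.20, 31.64)
  , ("7.10",  "#652 and #653", 25.20, 59.52)
  ]

-- Patched time relative to baseline: above 1.0 means slower compilation.
ratio :: Double -> Double -> Double
ratio base patched = patched / base

main :: IO ()
main = mapM_ report runs
  where
    report (ghc, patchSet, base, patched) =
      printf "GHC %-6s %-14s %.2fx\n" ghc patchSet (ratio base patched)
```

With both patches applied, compilation takes about 3.6x as long on GHC 8.0.2 and about 2.4x as long on GHC 7.10, while GHC 8.4.3 is essentially unchanged and GHC 8.2.2 gets somewhat faster.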

About #653, I actually see a NOINLINE trick to make it less horrible. I will try it and comment separately.

@bergmark
Collaborator

bergmark commented Aug 6, 2018

Let's merge this!

@bergmark bergmark merged commit 1a322a1 into haskell:master Aug 6, 2018