
Improve performance of mapstructure in HTTP API #6147

Closed
pearkes opened this issue Jul 16, 2019 · 3 comments · Fixed by #6680
Labels: theme/performance (Performance benchmarking or potential improvement), type/enhancement (Proposed improvement or new feature)
@pearkes (Contributor) commented Jul 16, 2019

We should either cache the mapstructure conversion results in the API (see below), or do the conversion once before any field processing happens and reuse those results everywhere. The decodeBody function is used by more than just the TXN API, but the cost is most noticeable there given the types of requests typically made.

Quoting @mkeeler:

Every time DecodeHookExec is called it first runs typedDecodeHook on the ComposeDecodeHookFunc. Then, after conversion etc., it executes the composed function, which recursively invokes DecodeHookExec on each of those two functions. That in turn does the same thing, calling typedDecodeHook again. So for every field we call typedDecodeHook 3 times. As you can see in the flamegraph, calling that function accounts for most of the time spent in DecodeHookExec, and overall it accounts for a very large portion of the total CPU time.

So what does typedDecodeHook do? Very little, in fact. In essence it just checks whether the passed-in value is convertible to one of two interfaces, converts it, and returns the converted value. This "conversion" is just checking that a provided function pointer has the right signature and "converting" it to the named function type.

The answer here is that the reflection-based checking of whether the fn pointers are convertible, and then converting them, is not cheap, and we are doing it far more often than we need to. For every field in every request being decoded by the HTTP API we rerun the exact same conversion routines. They always produce the same result, yet we keep rerunning them. In some cases a Txn request could contain many operations to be performed, and each of those operations has at least 4 fields internally. So we will do the 3 conversions 5X + 1 times, where X is the number of operations in the txn (so we are calling typedDecodeHook at least 16 times for every Txn op). While the HTTP API enforces a max of 64 ops in a Txn request, that limit is enforced AFTER decoding, so it is possible to send huge requests that have no chance of actually being applied but can easily consume very large amounts of CPU.
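
To make the hot path concrete, here is a minimal, hypothetical sketch of the kind of decode helper described above. The package name, helper name, and hook choices are illustrative, not Consul's actual decodeBody; the point is that every field of every decoded request goes through the composed hook, which is where the repeated typedDecodeHook work happens:

```go
// Package agent is a hypothetical home for this sketch.
package agent

import (
	"encoding/json"
	"io"
	"time"

	"github.com/mitchellh/mapstructure"
)

// decodeBody is a hypothetical stand-in for the HTTP API's decode helper.
// mapstructure runs the composed DecodeHook for every field of the request,
// and unpacking that composed hook (typedDecodeHook) is the cost described above.
func decodeBody(req io.Reader, out interface{}) error {
	var raw map[string]interface{}
	if err := json.NewDecoder(req).Decode(&raw); err != nil {
		return err
	}

	decoder, err := mapstructure.NewDecoder(&mapstructure.DecoderConfig{
		// Two hooks composed together: each decoded field pays for
		// re-checking and re-converting both of them.
		DecodeHook: mapstructure.ComposeDecodeHookFunc(
			mapstructure.StringToTimeDurationHookFunc(),
			mapstructure.StringToTimeHookFunc(time.RFC3339),
		),
		Result: out,
	})
	if err != nil {
		return err
	}
	return decoder.Decode(raw)
}
```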

@pearkes added the type/enhancement (Proposed improvement or new feature) and theme/performance (Performance benchmarking or potential improvement) labels on Jul 16, 2019
@pearkes added this to the Upcoming milestone on Jul 16, 2019
@s-mang removed their assignment on Aug 1, 2019
@s-mang self-assigned this on Aug 22, 2019
@s-mang (Contributor) commented Aug 22, 2019

I'm going to paste some of the context I gain on this issue as it comes to me, in case I drop off or someone else has ideas and wants to chime in.

Wrote a more strenuous benchmark for mapstructure, which decodes a semi-complex, 75KB JSON blob into a Go struct. The new benchmark alone, especially compared to the output of the existing benchmarks, seemed noteworthy (the new, more strenuous benchmark is at the bottom; a rough sketch of its shape follows the output) 🏒 :

Sarahs-MacBook-Pro:mapstructure sadams$ go test -bench=. -benchmem -memprofile memprofile.out -cpuprofile cpuprofile.out
goos: darwin
goarch: amd64
pkg: github.com/mitchellh/mapstructure
Benchmark_Decode-8                   	  300000	      4342 ns/op	    1993 B/op	      51 allocs/op
Benchmark_DecodeViaJSON-8            	  500000	      3998 ns/op	    1232 B/op	      34 allocs/op
Benchmark_JSONUnmarshal-8            	 1000000	      1881 ns/op	     288 B/op	      13 allocs/op
Benchmark_DecodeBasic-8              	  200000	      9671 ns/op	    6327 B/op	      76 allocs/op
Benchmark_DecodeEmbedded-8           	  200000	      6396 ns/op	    5827 B/op	      75 allocs/op
Benchmark_DecodeTypeConversion-8     	   50000	     31449 ns/op	   16402 B/op	     240 allocs/op
Benchmark_DecodeMap-8                	  500000	      3057 ns/op	    1208 B/op	      34 allocs/op
Benchmark_DecodeMapOfStruct-8        	  100000	     12206 ns/op	   11031 B/op	     139 allocs/op
Benchmark_DecodeSlice-8              	 1000000	      2333 ns/op	     944 B/op	      29 allocs/op
Benchmark_DecodeSliceOfStruct-8      	  200000	     11264 ns/op	   10430 B/op	     128 allocs/op
Benchmark_DecodeWeaklyTypedInput-8   	  500000	      2728 ns/op	    1464 B/op	      28 allocs/op
Benchmark_DecodeMetadata-8           	  500000	      2762 ns/op	    1707 B/op	      26 allocs/op
Benchmark_DecodeMetadataEmbedded-8   	  300000	      6242 ns/op	    5276 B/op	      62 allocs/op
Benchmark_DecodeTagged-8             	 1000000	      1601 ns/op	     840 B/op	      18 allocs/op
Benchmark_DecodeLargeJson-8          	     500	   2893028 ns/op	 1373138 B/op	   33971 allocs/op

(gist of the benchmark is here just in case: https://gist.github.com/adams-sarah/44cb61871a1057da8fde05b059c0478a)
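
For context, this is roughly the shape such a benchmark takes; the target type and fixture path below are placeholders, and the actual benchmark code is in the gist above:

```go
package mapstructure_test

import (
	"encoding/json"
	"os"
	"testing"

	"github.com/mitchellh/mapstructure"
)

// largeStruct stands in for the semi-complex target type; the real benchmark
// (see the gist above) decodes a ~75KB JSON fixture.
type largeStruct struct {
	Name  string
	Items []struct {
		ID    int
		Attrs map[string]string
	}
}

func Benchmark_DecodeLargeJson(b *testing.B) {
	blob, err := os.ReadFile("testdata/large.json") // hypothetical fixture path
	if err != nil {
		b.Fatal(err)
	}

	// Unmarshal once into a generic map, then time the mapstructure decode.
	var raw map[string]interface{}
	if err := json.Unmarshal(blob, &raw); err != nil {
		b.Fatal(err)
	}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		var out largeStruct
		if err := mapstructure.Decode(raw, &out); err != nil {
			b.Fatal(err)
		}
	}
}
```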

@s-mang (Contributor) commented Aug 23, 2019

Alright I've got some more benchmarks and some charts. I decided to compare the growth of these benchmarks against similar benchmarks that I wrote for encoding/json.

(RED = mitchellh/mapstructure, BLUE = encoding/json)
[Chart: benchmark growth comparison — Screen Shot 2019-08-23 at 12 29 47 PM]

The last two benchmarks, where you see the growth, decode 75KB and 1MB JSON blobs respectively.

This is pretty much what I expected given that mapstructure uses reflect, and uses it recursively. Using reflect recursively does not scale well in my experience, especially in terms of memory usage.

I'm worried about our use of this package; these benchmarks are an initial red flag in my head. But I'm going to dive into some application profiles next. Perhaps this is not the underlying issue. We shall see.
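
For reference, a minimal sketch of the two decode paths these benchmarks compare, assuming the mapstructure path goes through a generic map the way the package's own DecodeViaJSON-style benchmarks do (names and types here are illustrative):

```go
package mapstructure_test

import (
	"encoding/json"

	"github.com/mitchellh/mapstructure"
)

// decodeBoth contrasts the two paths: encoding/json decodes the bytes straight
// into the target struct, while the mapstructure path first unmarshals into a
// generic map and then walks that map reflectively into the struct.
func decodeBoth(blob []byte, out interface{}) error {
	// encoding/json path: one pass, straight into the struct.
	if err := json.Unmarshal(blob, out); err != nil {
		return err
	}

	// mapstructure path: unmarshal into map[string]interface{}, then a
	// recursive, reflection-heavy walk of that map into the struct.
	var raw map[string]interface{}
	if err := json.Unmarshal(blob, &raw); err != nil {
		return err
	}
	return mapstructure.Decode(raw, out)
}
```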

Benchmark values pasted here:

encoding/json

Benchmark_Sarah_Decode-8                	 1000000	      2213 ns/op	     320 B/op	      14 allocs/op
Benchmark_Sarah_DecodeViaJSON-8         	 1000000	      1781 ns/op	     656 B/op	       5 allocs/op
Benchmark_Sarah_DecodeBasic-8           	  500000	      3259 ns/op	     368 B/op	      13 allocs/op
Benchmark_Sarah_DecodeMapOfStruct-8     	 1000000	      2111 ns/op	     648 B/op	      16 allocs/op
Benchmark_Sarah_DecodeSlice-8           	 1000000	      1122 ns/op	     232 B/op	       8 allocs/op
Benchmark_Sarah_DecodeSliceOfStruct-8   	 1000000	      1388 ns/op	     288 B/op	       8 allocs/op
Benchmark_Sarah_DecodeTagged-8          	 3000000	       471 ns/op	     184 B/op	       2 allocs/op
Benchmark_Sarah_DecodeLargeJson75kb-8   	    2000	   1026076 ns/op	  104284 B/op	    7335 allocs/op
Benchmark_Sarah_DecodeLargeJson1mb-8    	     100	  14790997 ns/op	 2039177 B/op	  135597 allocs/op


mitchellh/mapstructure

Benchmark_Decode-8                   	  300000	      4670 ns/op	    1993 B/op	      51 allocs/op
Benchmark_DecodeViaJSON-8            	  300000	      4222 ns/op	    1232 B/op	      34 allocs/op
Benchmark_DecodeBasic-8              	  200000	     10160 ns/op	    6327 B/op	      76 allocs/op
Benchmark_DecodeMapOfStruct-8        	  100000	     12954 ns/op	   11031 B/op	     139 allocs/op
Benchmark_DecodeSlice-8              	  500000	      2432 ns/op	     944 B/op	      29 allocs/op
Benchmark_DecodeSliceOfStruct-8      	  100000	     12137 ns/op	   10431 B/op	     128 allocs/op
Benchmark_DecodeTagged-8             	 1000000	      1655 ns/op	     840 B/op	      18 allocs/op
Benchmark_DecodeLargeJson75kb-8      	     500	   2983173 ns/op	 1373150 B/op	   33971 allocs/op
Benchmark_DecodeLargeJson1mb-8       	     100	  37533688 ns/op	19567845 B/op	  460321 allocs/op

@s-mang (Contributor) commented Aug 23, 2019

Attaching a screenshot of the CPU flame graph mentioned in the issue description (or, rather, an instance of one).
[Screenshot: CPU flame graph — Screen Shot 2019-08-23 at 1 58 04 PM]
