Switch Thrift with Jaeger's fork #3050

jpkrohling · 2021-06-03T08:25:33Z

Closes #2638 by using the Jaeger fork of Thrift, which contains the unreleased fix to the memory consumption issue that affects the Jaeger Agent.
Once Apache Thrift 0.15.0 or 0.14.2 is released, the replace directive should be removed.

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

jpkrohling · 2021-06-03T08:27:45Z

go.mod

@@ -73,4 +73,4 @@ require (
 	honnef.co/go/tools v0.1.4
 )

-replace github.com/gogo/protobuf => github.com/gogo/protobuf v1.3.2


The protobuf replace directive has been removed, since the version it pointed to is the same one already required in the main section.

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

jpkrohling · 2021-06-03T10:36:14Z

cmd/collector/app/zipkin/http_handler_test.go

@@ -225,7 +226,7 @@ func TestFormatBadBody(t *testing.T) {
 	statusCode, resBodyStr, err := postBytes(server.URL+`/api/v1/spans`, []byte("not good"), createHeader("application/x-thrift"))
 	assert.NoError(t, err)
 	assert.EqualValues(t, http.StatusBadRequest, statusCode)
-	assert.EqualValues(t, "Unable to process request body: Unknown data type 111\n", resBodyStr)


This change was interesting: the new error is "Unable to process request body: size exceeded max allowed: 1869881447". That suggests the fix is being picked up, but it did raise a warning in my head: are we really allocating more than 1 GiB for this? Apparently not. I used runtime.MemStats to measure memory consumption before and after the HTTP call, and the usage was minimal (2 Alloc vs. 3 Alloc).
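For readers who want to repeat that kind of before/after measurement, here is a minimal sketch. It is not the code used in this PR: the helper name allocatedDuring and the package-level sink are made up for illustration, and it reads the cumulative TotalAlloc counter rather than the Alloc field mentioned above, because TotalAlloc never decreases when the garbage collector runs.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the allocations reachable so the compiler cannot drop them.
var sink []byte

// allocatedDuring is a hypothetical helper: it runs fn and returns how many
// bytes of heap objects were allocated while it ran, based on the cumulative
// runtime.MemStats.TotalAlloc counter.
func allocatedDuring(fn func()) uint64 {
	var before, after runtime.MemStats
	runtime.GC() // settle the heap before taking the first snapshot
	runtime.ReadMemStats(&before)
	fn()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc
}

func main() {
	small := allocatedDuring(func() { sink = make([]byte, 1<<10) })
	large := allocatedDuring(func() { sink = make([]byte, 64<<20) })
	fmt.Printf("small call: %d bytes, large call: %d bytes\n", small, large)
}
```

A call that really allocated more than 1 GiB would show up as a delta of at least that size, so a small delta is a reasonable sign that nothing of that magnitude was allocated on the heap.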

This is indeed interesting.

The old error message "Unknown data type" comes from the Skip function, which is called when either the field ID is not defined in the struct, or the type of the field doesn't match what's defined in the struct.

The new error message "size exceeded max allowed" comes from the size sanity checks, which can be triggered by many TProtocol functions (for example, ReadString, ReadBinary, ReadMapBegin, ReadListBegin, ReadSetBegin). This means the new code actually passed the field type check and the field is no longer skipped. (Or maybe previously the first field happened to match and it was the second field that failed the field type check and caused the error, while in the new code the first field passes the field type check but fails the size sanity check.)
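To make the idea of a size sanity check concrete, here is a rough sketch of such a guard. It is not Thrift's implementation: checkDeclaredSize, errSizeExceeded and the 100 MiB limit are all assumptions picked for illustration, and only the wording of the error is borrowed from the message quoted above.

```go
package main

import (
	"errors"
	"fmt"
)

// maxAllowedSize is a hypothetical limit chosen for this sketch.
const maxAllowedSize = 100 * 1024 * 1024

// errSizeExceeded is a hypothetical sentinel error for oversized declarations.
var errSizeExceeded = errors.New("size exceeded max allowed")

// checkDeclaredSize rejects a size read from the wire before any code gets a
// chance to allocate memory based on it.
func checkDeclaredSize(size int32) error {
	if size < 0 {
		return fmt.Errorf("negative size: %d", size)
	}
	if int64(size) > maxAllowedSize {
		return fmt.Errorf("%w: %d", errSizeExceeded, size)
	}
	return nil
}

func main() {
	fmt.Println(checkDeclaredSize(3))          // small sizes pass
	fmt.Println(checkDeclaredSize(1869881447)) // the size from the failing test is rejected
}
```

The point of a guard like this is that the declared size is rejected up front, so the error can appear without a large allocation ever being attempted.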

runtime.MemStats might be misleading. I believe it reports the same thing as ReportAllocs in benchmark tests, and this benchmark test reported 0 allocs and 0 bytes, which is certainly a lie (I had to use a much smaller size because the original one is too slow):

package main

import (
	"testing"
)

// const size = 1869881447
const size = 1869

func BenchmarkAlloc(b *testing.B) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		_ = make([]int, size)
	}
}

$ go test -bench . -benchmem
goos: linux
goarch: amd64
pkg: foo
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
BenchmarkAlloc-12    1000000000    0.2487 ns/op    0 B/op    0 allocs/op
PASS
ok    foo    0.278s
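One plausible explanation for the zero in the benchmark above, offered as an assumption rather than something established in this thread, is that the slice is discarded and its size is a small constant, so the compiler can keep it off the heap, where allocation counters never see it. The sketch below compares that case with one where the slice is stored in a package-level variable. The names sink, allocsDiscarded and allocsKept are made up for illustration; testing.AllocsPerRun is the standard library helper for counting heap allocations per call.

```go
package main

import (
	"fmt"
	"testing"
)

const size = 1869

// sink forces the slice to escape to the heap when it is assigned.
var sink []int

// allocsDiscarded counts heap allocations when the slice is thrown away,
// as in the benchmark above.
func allocsDiscarded() float64 {
	return testing.AllocsPerRun(100, func() {
		_ = make([]int, size)
	})
}

// allocsKept counts heap allocations when the slice outlives the call.
func allocsKept() float64 {
	return testing.AllocsPerRun(100, func() {
		sink = make([]int, size)
	})
}

func main() {
	fmt.Printf("discarded: %v allocs/run, kept: %v allocs/run\n",
		allocsDiscarded(), allocsKept())
}
```

If the kept variant reports an allocation while the discarded one does not, the counters are accurate and the zero only reflects what the compiler did with the benchmark body.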

@fishy, do you think this change raises flags? I wouldn't expect the payload for this test to cause such a huge message.

> do you think this change raises flags?

Not necessarily.

If you dig into what this code does, the very first read from the Thrift payload is a call to ReadListBegin, which is one of the functions that can trigger the new error.

Let's first convert the payload ([]byte("not good")) into raw bytes: [6e 6f 74 20 67 6f 6f 64].

In the ReadListBegin implementation, it first reads one byte, 6e, and then tries to read an int32 as the size of the list. To read the int32, it takes the next 4 bytes (6f 74 20 67) and decodes them as big-endian, which results in 1869881447.

So this is totally expected behavior.
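The arithmetic is easy to check with a few lines of Go. This is only a sketch of the layout described above (one byte, then a big-endian int32), not the Thrift code itself, and decodeListHeader is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeListHeader reads one byte followed by a big-endian int32 from the
// start of payload, following the layout described in the comment above.
func decodeListHeader(payload []byte) (elemType byte, size int32, err error) {
	if len(payload) < 5 {
		return 0, 0, fmt.Errorf("need at least 5 bytes, got %d", len(payload))
	}
	elemType = payload[0]
	size = int32(binary.BigEndian.Uint32(payload[1:5]))
	return elemType, size, nil
}

func main() {
	elemType, size, err := decodeListHeader([]byte("not good"))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Prints: first byte: 0x6e, declared size: 1869881447
	fmt.Printf("first byte: 0x%x, declared size: %d\n", elemType, size)
}
```

The bytes 6f 74 20 67 are just the ASCII text "ot g", which is why a plain-text body turns into such a large declared size.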

And after ReadListBegin, Jaeger's code doesn't actually do the pre-allocation. It just appends spans to the slice and relies on append to do the allocation and grow the slice.

y'all even already have a comment on that :)

jaeger/model/converter/thrift/zipkin/deserialize.go

Lines 51 to 52 in ff32436

// We don't depend on the size returned by ReadListBegin to preallocate the array because it

// sometimes returns a nil error on bad input and provides an unreasonably large int for size
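The pattern that comment describes can be sketched as follows. This is an illustration rather than Jaeger's deserializer: readAll and sliceSource are hypothetical names, and strings stand in for spans.

```go
package main

import "fmt"

// sliceSource is a hypothetical stand-in for the decoder: it hands out items
// one at a time and reports false once the input is exhausted.
func sliceSource(items []string) func() (string, bool) {
	pos := 0
	return func() (string, bool) {
		if pos >= len(items) {
			return "", false
		}
		item := items[pos]
		pos++
		return item, true
	}
}

// readAll collects items without trusting declaredSize for allocation.
// The slice starts empty and append grows it only as real items arrive.
func readAll(declaredSize int32, next func() (string, bool)) []string {
	var items []string // deliberately not make([]string, 0, declaredSize)
	for i := int32(0); i < declaredSize; i++ {
		item, ok := next()
		if !ok {
			break // input ended long before the declared size was reached
		}
		items = append(items, item)
	}
	return items
}

func main() {
	spans := readAll(1869881447, sliceSource([]string{"a", "b", "c"}))
	fmt.Println("items read:", len(spans))
}
```

With this shape, memory use follows the amount of data actually decoded, so a bogus declared size costs nothing beyond the failed read.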

codecov · 2021-06-03T10:39:30Z

Codecov Report

Merging #3050 (61d1790) into master (d052d34) will increase coverage by 0.03%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #3050      +/-   ##
==========================================
+ Coverage   96.00%   96.03%   +0.03%     
==========================================
  Files         229      229              
  Lines        9937     9937              
==========================================
+ Hits         9540     9543       +3     
+ Misses        327      325       -2     
+ Partials       70       69       -1

| Impacted Files | Coverage Δ | |
|---|---|---|
| pkg/config/tlscfg/cert_watcher.go | `92.20% <0.00%> (-2.60%)` | ⬇️ |
| cmd/query/app/static_handler.go | `96.77% <0.00%> (ø)` | |
| plugin/storage/integration/integration.go | `77.90% <0.00%> (+0.55%)` | ⬆️ |
| ...lugin/sampling/strategystore/adaptive/processor.go | `100.00% <0.00%> (+0.92%)` | ⬆️ |
| cmd/query/app/server.go | `97.08% <0.00%> (+1.45%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Switched Thrift with Jaeger's fork

056942b

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

jpkrohling requested a review from a team as a code owner June 3, 2021 08:25

jpkrohling requested a review from albertteoh June 3, 2021 08:25

Changed assertion on error

af78a7e

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

jpkrohling changed the title ~~Switched Thrift with Jaeger's fork~~ Switch Thrift with Jaeger's fork Jun 3, 2021

pavolloffay previously approved these changes Jun 3, 2021

Changes based on the review

700c95d

Signed-off-by: Juraci Paixão Kröhling <juraci@kroehling.de>

jpkrohling dismissed pavolloffay’s stale review via 700c95d June 3, 2021 14:33

yurishkuro approved these changes Jun 3, 2021

yurishkuro enabled auto-merge (squash) June 3, 2021 14:55

Merge branch 'master' into jpkrohling/patched-thrift

61d1790

yurishkuro merged commit 394ec23 into jaegertracing:master Jun 3, 2021

jpkrohling added this to the Release 1.23.0 milestone Jun 4, 2021

jpkrohling deleted the jpkrohling/patched-thrift branch July 28, 2021 19:20
