Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

performance: Agreement state serialization using msgp #4644

Merged
merged 48 commits into from
Oct 31, 2022

Conversation

iansuvak
Copy link
Contributor

@iansuvak iansuvak commented Oct 14, 2022

Summary

This change will replace agreement state serialization used for crash recovery purposes from using reflection to using auto-generated code from the msgp library. It currently supports router and player structs, but not actions since they are polymorphic.

Currently I'm using unbound allocbounds since this isn't exposed to network and isn't an attack vector but happy to take suggestions on

TODO

Implement handling for messageHandle field which is of type interface{} and is present on messageEvents
Handle randomized generation for messageHandle which is of interface{} type and isn't currently supported.
Above weren't implemented. messageHandle field was explicitly unexported instead.

  • Either add allocbounds or handle codec_tester warnings

Test Plan

  • Pass autogenerated tests in codec_tester.go that compare encodings/decodings of randomly generated objects using msgp and go-codec
  • Benchmark test on randomized data shows improvement over the current reflection
  • Do manual test of starting a node, getting the state serialized to crash.db using the old encoder, shutting it down and successfully deserializing using the new code.

Testing done

New and existing tests pass
Benchmarks show improvement using both new randomized encoding benchmark as well as using a crash.db collected from a testnet node instrumented to insert new rows for each persist call instead of overwriting it.

Result stats over 20 benchmark tests of decode:
Screen Shot 2022-10-20 at 3 57 00 PM

Same data created using master branch (reflection) was successfully decoded and reencoded using both libraries and shown to be equal. Test not included since it depends on having access to a binary DB file which shouldn't be committed to the repo:
image

Tests over Scenario1 with tps-multiplier of 6 show the following improvements on pprof CPU flamegraph:
Reflect CPU
Encode: 3.19% 12.52s
Decode: 5.06% 19.88s
msgp CPU
Encode: 1.36% 2.26s
Decode: 0.93% 1.53s

Reflect alloc objects
Encode: 3.91% 36323960
Decode: 4.07% 37835267

msgp alloc objects
Encode: 0.002% 43941
Decode: 0.67% 14930739

Performance Testing post removing omitempty

Result stats over 20 benchmark tests of decode:
image

Msgp time stayed about the same while reflect time went up a lot with removal of omitempty. Scenario1 testing is similar to above

@codecov
Copy link

codecov bot commented Oct 17, 2022

Codecov Report

Merging #4644 (fa48132) into master (7666348) will increase coverage by 0.05%.
The diff coverage is 83.49%.

@@            Coverage Diff             @@
##           master    #4644      +/-   ##
==========================================
+ Coverage   54.42%   54.47%   +0.05%     
==========================================
  Files         403      404       +1     
  Lines       51931    51994      +63     
==========================================
+ Hits        28264    28325      +61     
- Misses      21288    21292       +4     
+ Partials     2379     2377       -2     
Impacted Files Coverage Δ
agreement/actiontype_string.go 33.33% <0.00%> (ø)
agreement/asyncVoteVerifier.go 89.83% <ø> (ø)
agreement/cryptoVerifier.go 67.60% <ø> (ø)
agreement/events.go 61.62% <ø> (ø)
agreement/eventtype_string.go 33.33% <0.00%> (ø)
agreement/message.go 66.66% <ø> (ø)
agreement/proposalManager.go 96.07% <ø> (ø)
agreement/proposalStore.go 100.00% <ø> (ø)
agreement/proposalTracker.go 95.16% <ø> (ø)
agreement/proposalTrackerContract.go 40.84% <ø> (ø)
... and 22 more

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

@iansuvak iansuvak changed the title WIP: perf: Agreement state serialization using msgp perf: Agreement state serialization using msgp Oct 19, 2022
@iansuvak iansuvak marked this pull request as ready for review October 20, 2022 20:16
Copy link
Contributor

@brianolson brianolson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I skimmed the whole thing, there's a lot. One oddity popped out to me.

algorandskiy
algorandskiy previously approved these changes Oct 31, 2022
@@ -43,7 +43,9 @@ type event interface {
// A ConsensusVersionView is a view of the consensus version as read from a
// LedgerReader, associated with some round.
type ConsensusVersionView struct {
Err serializableError
_struct struct{} `codec:","`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does the comma do?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing other than telling msgp to generate marshaller methods for it. In our fork we treat omitempty as the logical default so you have to include the tag that explicitly leaves out omitempty it if you don't want the behavior.

Nothing before the comma means that field isn't renamed and nothing after it means no special directives like omitempty and others

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so it can't be empty I take it, or no tag at all ... OK

@cce cce requested a review from yossigi October 31, 2022 17:25
Copy link
Contributor

@jannotti jannotti left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems good, nice testing of before and after.

I notice we rarely (never?) use omitempty. Is that in order to make the encodings the same before and after? I wonder how much space it would save and whether it's worth turning it on once we're confident enough that we don't care about the before/after tests.

@@ -43,7 +43,9 @@ type event interface {
// A ConsensusVersionView is a view of the consensus version as read from a
// LedgerReader, associated with some round.
type ConsensusVersionView struct {
Err serializableError
_struct struct{} `codec:","`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not omitempty? Does this serialize the (usually nil, I assume) Err as a 0 length string?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We aren't using omitempty to make encodings as close as possible to what they were before. The one exception to where they wouldn't be identical is message struct since we are using msgp.Raw there.

The main reason @cce asked me to remove the omitempty hints is for the structs that contain maps since we could accidentally try to insert into a nil map at runtime which would cause a panic.

I'm happy to add omitemptys on all structs that don't export maps if we want to do that.

I wasn't planning on removing them since there is no noticeable CPU impact as I mentioned and the persistence.go implementation only ever stores a single row in the DB and with WAL enabled on the DB the small amount of IO increase here shouldn't be a concern either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea was (and @iansuvak this is how it works right?) that the old reflection code will encode and decode a nil map differently from an empty map (right?) and if we use omitempty, the msgp-generated code will keep encoding and decoding this the same way.

} else {
s.Router = protocol.Encode(&rr)
s.Player = protocol.Encode(&p)
}
s.Clock = t.Encode()
for _, act := range a {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we preallocate s.ActionTypes and s.Actions to be len(a)?


func TestRandomizedEncodingFullDiskState(t *testing.T) {
partitiontest.PartitionTest(t)
router, player := randomizeDiskState()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

usually randomized tests run a bunch of iterations, maybe wrap this up a loop and make 10K iterations or something?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did 5k since 10k was taking ~ 20s. I would be pro removing this or shrinking it down post-consensus release

algorandskiy
algorandskiy previously approved these changes Oct 31, 2022
@@ -82,37 +82,41 @@ type (
Quit()
}

//msgp:ignore cryptoVoteRequest
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

out of curiosity, how did you figure out which types to ignore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anything that embeds message really and is not messageEvent. The process of adding the fields was adding the _struct field to the top levels embedded things and then seeing which structs didn't get their interfaces implemented after running make msgp. anything that embeds message got accidentally added and ignoring them didn't introduce any unimplemented method errors.

@@ -270,6 +277,10 @@ func randomizeValue(v reflect.Value, datapath string, tag string, remainingChang
// unexported
continue
}
if st.Name() == "messageEvent" && f.Name == "Tail" {
// Don't try and set the Tail field since it's recursive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

well, you could have set just one Tail, but this is fine

Copy link
Contributor

@cce cce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you try using the "autopsy" tool to analyze the new format of cadaver files?

@iansuvak
Copy link
Contributor Author

iansuvak commented Nov 3, 2022

No I didn't touch cadavers with this change. We should take another look at them at some point though since we are still using them on significant portion of stake

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants