-
Notifications
You must be signed in to change notification settings - Fork 493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
performance: Agreement state serialization using msgp #4644
Conversation
But I also broke consistent sorting for ledger and others
All tests passing assuming that messageHandle doesn't need to be serialized
2222265
to
e7cfbec
Compare
Codecov Report
@@ Coverage Diff @@
## master #4644 +/- ##
==========================================
+ Coverage 54.42% 54.47% +0.05%
==========================================
Files 403 404 +1
Lines 51931 51994 +63
==========================================
+ Hits 28264 28325 +61
- Misses 21288 21292 +4
+ Partials 2379 2377 -2
📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I skimmed the whole thing, there's a lot. One oddity popped out to me.
@@ -43,7 +43,9 @@ type event interface { | |||
// A ConsensusVersionView is a view of the consensus version as read from a | |||
// LedgerReader, associated with some round. | |||
type ConsensusVersionView struct { | |||
Err serializableError | |||
_struct struct{} `codec:","` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does the comma do?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nothing other than telling msgp to generate marshaller methods for it. In our fork we treat omitempty
as the logical default so you have to include the tag that explicitly leaves out omitempty it if you don't want the behavior.
Nothing before the comma means that field isn't renamed and nothing after it means no special directives like omitempty
and others
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so it can't be empty I take it, or no tag at all ... OK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems good, nice testing of before and after.
I notice we rarely (never?) use omitempty. Is that in order to make the encodings the same before and after? I wonder how much space it would save and whether it's worth turning it on once we're confident enough that we don't care about the before/after tests.
@@ -43,7 +43,9 @@ type event interface { | |||
// A ConsensusVersionView is a view of the consensus version as read from a | |||
// LedgerReader, associated with some round. | |||
type ConsensusVersionView struct { | |||
Err serializableError | |||
_struct struct{} `codec:","` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not omitempty? Does this serialize the (usually nil, I assume) Err
as a 0 length string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We aren't using omitempty to make encodings as close as possible to what they were before. The one exception to where they wouldn't be identical is message
struct since we are using msgp.Raw
there.
The main reason @cce asked me to remove the omitempty
hints is for the structs that contain maps since we could accidentally try to insert into a nil map at runtime which would cause a panic.
I'm happy to add omitemptys on all structs that don't export maps if we want to do that.
I wasn't planning on removing them since there is no noticeable CPU impact as I mentioned and the persistence.go
implementation only ever stores a single row in the DB and with WAL enabled on the DB the small amount of IO increase here shouldn't be a concern either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was (and @iansuvak this is how it works right?) that the old reflection code will encode and decode a nil map differently from an empty map (right?) and if we use omitempty, the msgp-generated code will keep encoding and decoding this the same way.
agreement/persistence.go
Outdated
} else { | ||
s.Router = protocol.Encode(&rr) | ||
s.Player = protocol.Encode(&p) | ||
} | ||
s.Clock = t.Encode() | ||
for _, act := range a { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we preallocate s.ActionTypes
and s.Actions
to be len(a)
?
agreement/persistence_test.go
Outdated
|
||
func TestRandomizedEncodingFullDiskState(t *testing.T) { | ||
partitiontest.PartitionTest(t) | ||
router, player := randomizeDiskState() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
usually randomized tests run a bunch of iterations, maybe wrap this up a loop and make 10K iterations or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did 5k since 10k was taking ~ 20s. I would be pro removing this or shrinking it down post-consensus release
@@ -82,37 +82,41 @@ type ( | |||
Quit() | |||
} | |||
|
|||
//msgp:ignore cryptoVoteRequest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
out of curiosity, how did you figure out which types to ignore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anything that embeds message
really and is not messageEvent
. The process of adding the fields was adding the _struct
field to the top levels embedded things and then seeing which structs didn't get their interfaces implemented after running make msgp
. anything that embeds message
got accidentally added and ignoring them didn't introduce any unimplemented method errors.
@@ -270,6 +277,10 @@ func randomizeValue(v reflect.Value, datapath string, tag string, remainingChang | |||
// unexported | |||
continue | |||
} | |||
if st.Name() == "messageEvent" && f.Name == "Tail" { | |||
// Don't try and set the Tail field since it's recursive |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well, you could have set just one Tail, but this is fine
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did you try using the "autopsy" tool to analyze the new format of cadaver files?
No I didn't touch cadavers with this change. We should take another look at them at some point though since we are still using them on significant portion of stake |
Summary
This change will replace agreement state serialization used for crash recovery purposes from using reflection to using auto-generated code from the msgp library. It currently supports router and player structs, but not actions since they are polymorphic.
Currently I'm using unbound allocbounds since this isn't exposed to network and isn't an attack vector but happy to take suggestions on
TODO
Implement handling for messageHandle field which is of typeinterface{}
and is present onmessageEvent
sHandle randomized generation for messageHandle which is ofinterface{}
type and isn't currently supported.Above weren't implemented. messageHandle field was explicitly unexported instead.
Either add allocbounds or handle codec_tester warningsTest Plan
Pass autogenerated tests incodec_tester.go
that compare encodings/decodings of randomly generated objects usingmsgp
andgo-codec
Testing done
New and existing tests pass
Benchmarks show improvement using both new randomized encoding benchmark as well as using a crash.db collected from a testnet node instrumented to insert new rows for each persist call instead of overwriting it.
Result stats over 20 benchmark tests of decode:
data:image/s3,"s3://crabby-images/b24e3/b24e3bc79d89dc4a5bf1c103c6d42cea19c74584" alt="Screen Shot 2022-10-20 at 3 57 00 PM"
Same data created using master branch (reflection) was successfully decoded and reencoded using both libraries and shown to be equal. Test not included since it depends on having access to a binary DB file which shouldn't be committed to the repo:
data:image/s3,"s3://crabby-images/04c9f/04c9fdf4f764a6c32cb318f358fd20c561677e29" alt="image"
Tests over Scenario1 with tps-multiplier of 6 show the following improvements on pprof CPU flamegraph:
Reflect CPU
Encode: 3.19% 12.52s
Decode: 5.06% 19.88s
msgp CPU
Encode: 1.36% 2.26s
Decode: 0.93% 1.53s
Reflect alloc objects
Encode: 3.91% 36323960
Decode: 4.07% 37835267
msgp alloc objects
Encode: 0.002% 43941
Decode: 0.67% 14930739
Performance Testing post removing omitempty
Result stats over 20 benchmark tests of decode:
data:image/s3,"s3://crabby-images/cb2ec/cb2ec132b6636cb17a894589a0a24534e525c6cb" alt="image"
Msgp time stayed about the same while reflect time went up a lot with removal of omitempty. Scenario1 testing is similar to above