simulation: Add benchmarking #2187
Conversation
Codecov Report
@@            Coverage Diff             @@
##             develop     #2187   +/-   ##
===========================================
  Coverage          ?      64.2%
===========================================
  Files             ?        140
  Lines             ?       8579
  Branches          ?          0
===========================================
  Hits              ?       5508
  Misses            ?       2694
  Partials          ?        377
Interestingly, tx processing is only 2% of the entire time here (no IAVL writes, no signature verification); the rest of the time is just slashing logic in EndBlock, and most of the hold-up is in amino unmarshalling. (This translates to about 5.25 out of 7 seconds spent on amino unmarshalling, with only 453 msgs and 10 blocks; that is a lot of time spent on unmarshalling.) I highly recommend looking through a profiler's view of this.
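For anyone who wants to reproduce this, a minimal sketch of capturing a CPU profile around the simulation run (the test name and runSimulation helper here are placeholders, not code from this PR; plain `go test -cpuprofile cpu.out` plus `go tool pprof` gives the same view without code changes):

```go
package app

import (
	"os"
	"runtime/pprof"
	"testing"
)

// Hypothetical wrapper: capture a CPU profile around the simulation so the
// amino unmarshalling hot spots show up in `go tool pprof cpu.out`.
func TestFullGaiaSimulationProfiled(t *testing.T) {
	f, err := os.Create("cpu.out")
	if err != nil {
		t.Fatal(err)
	}
	defer f.Close()

	if err := pprof.StartCPUProfile(f); err != nil {
		t.Fatal(err)
	}
	defer pprof.StopCPUProfile()

	runSimulation(t) // placeholder for the actual simulation entry point
}
```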
	ctx, write := ctx.CacheContext()
-	result := gov.NewHandler(k)(ctx, msg)
+	result := handler(ctx, msg)
I switched from creating a new handler on each message to reusing a single handler, just to help avoid skewing the memory write stats.
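Roughly, the shape of the change (illustrative fragment; names follow the diff above, and the loop is only for illustration):

```go
// Construct the governance handler once, outside the message loop, so per-message
// handler construction doesn't skew the benchmark's allocation numbers.
handler := gov.NewHandler(k)

for _, msg := range msgs {
	ctx, write := ctx.CacheContext()
	if result := handler(ctx, msg); result.IsOK() {
		write()
	}
}
```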
👍
	DisplayEvents(events)
}
func updateLog(testingmode bool, log string, update string, args ...interface{}) (updatedLog string) {
	if testingmode == true {
== true can be dropped
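For clarity, the suggested simplification is just (sketch):

```go
// Before: redundant comparison against true
if testingmode == true {
	// ...
}

// After: the bool can be used directly
if testingmode {
	// ...
}
```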
Use
to profile with GoLevelDB and commits. Even with both of these, amino dominates the runtime (65%!). Note that most of these calls are in keeper.UpdateValidator; perhaps we can reduce the number of state reads there.
Also, I just want to note that the time signature verification is going to add is negligible. It will add 0.5ms per operation, which translates to about 25ms, which is nothing compared to the time spent in amino.
I looked into this a bit more; a lot of the problem is that in the simple 10 block example we have, there are 2258 calls to the governance slash function. We should optimize the usage of governance slashing, and the slash function's amino calls. Alternatively we could just cache the
We also should prioritize getting future ops implemented so we can have voting proceed properly. However I am slightly suspicious about the governance req'd number of blocks; the proposals should be lasting longer than this. For reference though, there were 160876 calls to the keeper.GetValidator function (not even keeper.GetValidatorByPubkey).
This could be better separated, but since it's not modifying the state machine it's a bit less critical. See comments.
cmd/gaia/app/sim_test.go (Outdated)
	blockSize    int
	enabled      bool
	verbose      bool
	usegoleveldb bool
camelCase please
cmd/gaia/app/sim_test.go (Outdated)
@@ -36,6 +39,8 @@ func init() {
 	flag.IntVar(&blockSize, "SimulationBlockSize", 200, "Operations per block")
 	flag.BoolVar(&enabled, "SimulationEnabled", false, "Enable the simulation")
 	flag.BoolVar(&verbose, "SimulationVerbose", false, "Verbose log output")
+	flag.BoolVar(&usegoleveldb, "SimulationGoLevelDB", false, "Use GoLevelDB instead of memdb")
Can we just always do this with the benchmark? Looks like this flag doesn't apply to non-benchmarking simulation anyways
Sure!
	ctx, write := ctx.CacheContext()
-	result := gov.NewHandler(k)(ctx, msg)
+	result := handler(ctx, msg)
👍
	// Log the header time for future lookup
	pastTimes = append(pastTimes, header.Time)
	if !testingmode {
camelCase
	evidence := make([]abci.Evidence, 0)
	for r.Float64() < evidenceFraction {
		height := header.Height
		time := header.Time
		if r.Float64() < pastEvidenceFraction {
			height = int64(r.Intn(int(header.Height)))
-			time = pastTimes[height]
+			time = lastHeaderTime
Can you explain this change? Isn't this supposed to be the timestamp when the infraction was committed (which was the previous header)?
You're right, my bad. I misread height as header.Height; not sure how I missed the random.
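So, restating the original (correct) branch from the diff above, with a note on why pastTimes is kept around:

```go
if r.Float64() < pastEvidenceFraction {
	// Evidence for a past infraction: pick a random past height and use the
	// timestamp of the header at that height (the block in which the infraction
	// was committed), which is why every header time is appended to pastTimes.
	height = int64(r.Intn(int(header.Height)))
	time = pastTimes[height]
}
```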
I actually think we should consider removing gocyclo. I think it does more harm than good. To fix gocyclo errors, I had to refactor some of the code, which made this PR hard to review unless you were going commit by commit.
force-pushed from a43a7f8 to 5643c08
In simulation, this was shown to cause a 4x speedup. There are no safety concerns here, as amino encoding is deterministic.
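For context, a hedged sketch of the kind of caching that determinism makes safe (illustrative names, not the PR's actual implementation): since decoding the same bytes always yields the same value, and the encode direction is deterministic as noted above, decoded results can be keyed by the raw store bytes and reused instead of going through amino again.

```go
// Validator stands in here for the staking validator type; decode would be the
// amino unmarshal step that the profile shows as the hot spot.
type Validator struct{ /* fields elided */ }

type decodeCache struct {
	decode func(bz []byte) Validator
	cache  map[string]Validator
}

func (c *decodeCache) get(bz []byte) Validator {
	if v, ok := c.cache[string(bz)]; ok {
		return v // same bytes always decode to the same value, so reuse is safe
	}
	v := c.decode(bz)
	c.cache[string(bz)] = v
	return v
}
```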
After properly initializing governance proposal time (thanks Sunny!), the simulator can quickly get to much larger block sizes. (Currently set to 210 blocks; this has 10 blocks of governance slashing.) This now shows that GetValidator, and the way we do sorting in the iterators, are bottlenecks. The PR for GetValidator still eliminates that part of the time, which is quite significant. Jae and I discussed a bit about what the cause of the iterator being slow is and how to speed it up; I'll write that up in another issue. (However we could make that post-launch, as it's probably not going to be a simple change like the GetValidator change was.)
Improve GetValidator speed
simulation: display db size at end of simulation, add makefile entries
utACK
This PR adds benchmarking to the simulation.
The implementation is slightly suboptimal, as it involves a lot of code duplication. However, we do want to avoid invariant checking and logging (as logs are a significant time increase), amongst other things. I think including this as a separate function is preferable to a single mega-function. Once we implement something like an "OperationContext" I will combine these two functions more. (That will make it easier to do cleanly.)

Doing this without code duplication was super easy; not sure what I was thinking when I wrote the above.
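A rough sketch of what the separate benchmark entry point might look like (runSimulation, simulationParams, numBlocks, and the exact NewGaiaApp call shape are placeholders and assumptions, not the code in this PR):

```go
// The benchmark path uses GoLevelDB so commits hit a real backend, and it skips
// invariant checking and log accumulation, since both add significant time that
// would drown out what we actually want to measure.
func BenchmarkFullGaiaSimulation(b *testing.B) {
	db, err := dbm.NewGoLevelDB("GaiaSimBench", os.TempDir())
	if err != nil {
		b.Fatal(err)
	}
	defer db.Close()

	app := NewGaiaApp(log.NewNopLogger(), db, nil)

	runSimulation(b, app, simulationParams{
		numBlocks:       numBlocks,
		blockSize:       blockSize,
		checkInvariants: false, // skipped in the benchmark
		verbose:         false, // no log building
	})
}
```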
Sample benchmark output:
Running this through a profiler, we see that amino unmarshalling takes 75% of the time! We should consider using protobuf where applicable, as its implementation has been optimized to enable inlining. (They've fought the Go compiler.)
In the process I discovered that our current "test validator exists when deleting it" check is broken. I will open a separate PR to demonstrate that; this PR should be good to merge.
- Targeted PR against correct branch (see CONTRIBUTING.md)
- Wrote tests
- Added entries in PENDING.md with issue #
- Re-reviewed Files changed in the GitHub PR explorer