
add benchmark test #62

Draft · akhilkumarpilli wants to merge 3 commits into main
Conversation

akhilkumarpilli (Contributor) commented Mar 3, 2022

Initially, I did load testing on tracelistener with multiple traces and here are the stats:

| Traces sent | Time taken |
|-------------|------------|
| 100K        | 42 secs    |
| 1M          | 391 secs   |
| 10M         | 4535 secs  |
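Throughput stays roughly flat across the three sizes (about 2,200–2,600 traces/s), which suggests ingestion scales linearly. A quick sketch computing traces per second from the numbers above:

```go
package main

import "fmt"

// tracesPerSecond derives ingestion throughput from the load-test table.
func tracesPerSecond(traces, seconds float64) float64 {
	return traces / seconds
}

func main() {
	fmt.Printf("100K: %.0f traces/s\n", tracesPerSecond(100_000, 42))
	fmt.Printf("1M:   %.0f traces/s\n", tracesPerSecond(1_000_000, 391))
	fmt.Printf("10M:  %.0f traces/s\n", tracesPerSecond(10_000_000, 4535))
}
```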

After that, I ran a benchmark test; here is the result:

goos: linux
goarch: amd64
pkg: github.com/allinbits/tracelistener/tracelistener
cpu: Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz
BenchmarkTraceListener
BenchmarkTraceListener-4           25924             44233 ns/op            1899 B/op         27 allocs/op

Please run `make benchmark-<sdk-version>` to run the benchmark test locally.

	if err != nil {
		return err
	}
	trace := tracelistener.TraceOperation{
Contributor:

You cannot use this object to marshal trace operations; you must use the (non-exported) `traceOperationInter` object.

This one is just a high-level representation of a trace, not the wire-level format the chain outputs.

Also, what key/value are you sending? What's the aim of this benchmark? Speed of processing or just ingestion speed?

akhilkumarpilli (author):

We are just checking ingestion speed and not considering processing speed.

		Operation:   string(tracelistener.WriteOp),
		Key:         []byte{0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xa},
		Value:       []byte{0x68, 0x65, 0x6c, 0x6c, 0x6f, 0xa},
		BlockHeight: uint64(height),
Contributor:

Leaving the same block height for all traces will never trigger a database write, since tracelistener will only write back to database when the block height changes.

akhilkumarpilli (author):

I am passing the block height from the benchmark function, so it is dynamic and different for each trace.

Contributor:

Oh right! 👯
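The point settled above is that each iteration must use a distinct block height, or the height-change write path never fires. A minimal sketch of that benchmark shape, using a hypothetical simplified `loadTest` signature (the real helper also takes a file path):

```go
package main

import (
	"fmt"
	"testing"
)

// loadTest stands in for the PR's helper (hypothetical simplified
// signature); what matters is that each call receives a distinct height.
func loadTest(height int) error {
	_ = height // real code would serialize and send one trace at this height
	return nil
}

// runBench drives loadTest with the loop counter as a dynamic block height,
// so tracelistener's "write on height change" logic triggers every iteration.
func runBench() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if err := loadTest(i); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	fmt.Println(runBench().N > 0)
}
```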

}
}

func loadTest(height int, file string) error {
Contributor:

Suggested change:

-func loadTest(height int, file string) error {
+func loadTest(b *testing.B, height int, file string) error {
+	b.Helper()

This function should be marked as a helper, so the testing framework skips it when attributing file and line information in test output.
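A self-contained sketch of the suggested pattern, with a hypothetical `writeTrace` helper standing in for the real load-test body:

```go
package main

import (
	"fmt"
	"testing"
)

// writeTrace is a hypothetical helper. b.Helper() marks it so any failure
// or log line is attributed to the calling benchmark, not to this function.
func writeTrace(b *testing.B, height int) {
	b.Helper()
	_ = height // real code would write one serialized trace here
}

func helperBench() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			writeTrace(b, i)
		}
	})
}

func main() {
	fmt.Println(helperBench().N > 0)
}
```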

}

func loadTest(height int, file string) error {
ff, err := fifo.OpenFifo(context.Background(), file, syscall.O_WRONLY, 0655)
Contributor:

Do not open and close the FIFO on every call; just open it at the beginning of the test and close it when done.

This way it more closely resembles how real execution behaves.

Contributor:

Even just doing this dramatically improves the performance of the test.

akhilkumarpilli (author):

Yeah, I tried opening the file only once at the beginning, but then tracelistener treats all the traces as one single trace and parses just that.

Contributor:

What do you mean?

akhilkumarpilli (author):

Can you please check these logs:

FIFO opened and closed each time: https://pastebin.com/22SSVnaq

FIFO opened only once at the beginning: https://pastebin.com/RQjeRXSg

When the FIFO is opened once at the beginning, each new trace is appended to the previous trace.

Contributor:

Just use `fmt.Fprintf` instead of writing directly, and append a newline at the end of each serialized trace.

@@ -56,6 +57,11 @@ $(TEST_VERSIONS):
go test -v -failfast -race -count=1 \
-tags $(shell echo $@ | sed -e 's/test-/sdk_/g' -e 's/-/_/g'),muslc \
./...

$(BENCHMARK_VERSIONS):
Contributor:

It would be useful to enable CPU and memory profiling while running benchmarks.

akhilkumarpilli (author):

If I am not wrong, the `-benchmem` flag will give you the same, right?

Contributor:

`-benchmem` only reports allocation counts; you also have to supply flags like these to create the profiling files:

`-cpuprofile=cpu.out -memprofile=mem.out -trace=trace.out`
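Folded into the Makefile, the target could look something like this (a hedged sketch: the `$(BENCHMARK_VERSIONS)` target name is taken from the diff above, the flags are the standard `go test` profiling flags, and `-run='^$$'` just skips the unit tests while benchmarking):

```make
$(BENCHMARK_VERSIONS):
	go test -run='^$$' -bench=. -benchmem \
		-cpuprofile=cpu.out -memprofile=mem.out -trace=trace.out \
		./...
```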

gsora (Contributor) commented Mar 3, 2022

Hmm, I got wildly different numbers.

The execution times measured by go test (mind you, with the suggested improvements) for the two biggest sizes are the following:

BenchmarkTraceListener1M-10      	       1	3498110750 ns/op	501296464 B/op	 6201905 allocs/op
BenchmarkTraceListener10M-10     	       1	41465110541 ns/op	5070649768 B/op	62435038 allocs/op

So for 10M traces processed, I have a runtime of about 41s.

I ran those tests on an ARM M1 Pro machine.

gsora commented Mar 3, 2022

From a quick test, it looks like the `json.Marshal` method is actually bottlenecking the whole thing... incredible :|
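A standalone micro-benchmark makes the cost visible: marshaling even a trace-sized payload allocates on every call. This sketch uses a simplified stand-in struct, not the real wire type:

```go
package main

import (
	"encoding/json"
	"fmt"
	"testing"
)

// trace is a simplified stand-in for the real trace payload.
type trace struct {
	Operation string `json:"operation"`
	Key       []byte `json:"key"`
	Value     []byte `json:"value"`
}

// benchMarshal measures encoding/json on a trace-sized payload;
// AllocsPerOp shows Marshal allocating on every call.
func benchMarshal() testing.BenchmarkResult {
	t := trace{Operation: "write", Key: []byte("hello"), Value: []byte("hello")}
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			if _, err := json.Marshal(t); err != nil {
				b.Fatal(err)
			}
		}
	})
}

func main() {
	res := benchMarshal()
	fmt.Println(res.AllocsPerOp() > 0)
}
```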

gsora commented Mar 3, 2022

[Screenshot: memory allocation profile, 2022-03-03 12:42]

Memory-wise, we could greatly improve this fast path by relying on other JSON libraries, which I think avoid many of those allocations.
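Even within the standard library there is some headroom: reusing one buffer and `json.Encoder` across traces avoids `json.Marshal` allocating a fresh output slice per call (it does not remove the encoder's internal allocations entirely; third-party libraries go further). A sketch, again with a simplified stand-in struct:

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
)

// trace is a simplified stand-in for the real trace payload.
type trace struct {
	Operation   string `json:"operation"`
	BlockHeight uint64 `json:"block_height"`
}

// encodeReusing reuses one buffer and one Encoder across n traces,
// resetting the buffer between encodes instead of allocating a new
// output slice each time as json.Marshal does. Returns the byte length
// of the last encoded trace (Encode appends a trailing newline).
func encodeReusing(n int) int {
	var buf bytes.Buffer
	enc := json.NewEncoder(&buf)
	for h := 0; h < n; h++ {
		buf.Reset()
		if err := enc.Encode(trace{Operation: "write", BlockHeight: uint64(h)}); err != nil {
			panic(err)
		}
	}
	return buf.Len()
}

func main() {
	fmt.Println(encodeReusing(3) > 0)
}
```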

gsora commented Mar 3, 2022

[Screenshot: CPU flame graph, 2022-03-03 12:43]

This is the CPU flame graph from the hot path in tracelistener ingestion routines. The JSON unmarshaling is killing us on the CPU side as well.

@akhilkumarpilli akhilkumarpilli marked this pull request as ready for review March 4, 2022 05:04
@akhilkumarpilli akhilkumarpilli marked this pull request as draft March 4, 2022 10:43