
Improve performance of XDR codec #2689

Closed
2opremio opened this issue Jun 11, 2020 · 4 comments
Labels
performance (issues aimed at improving performance)

Comments

@2opremio
Contributor

As we can see in #2552, the encoding and decoding performance of the current codec (https://github.com/stellar/go-xdr) could be greatly improved by:

  1. Avoiding reflection (i.e. pre-generating the codec)
  2. Reducing allocations

Horizon would greatly benefit from this.

As an alternative, there is https://github.com/xdrpp/goxdr (and https://github.com/calmh/xdr, but it's unmaintained).

@ire-and-curses
Member

Moving to xdrpp is a medium-term goal. It would be great to evaluate to what extent that can be done while minimising API breakage.

@bartekn added the performance label Nov 10, 2020
@leighmcculloch
Member

leighmcculloch commented Sep 20, 2021

I'm looking into improving the xdr package, and the Go code generated by xdrgen, for stellar-deprecated/starlight#318. Specifically, I'm looking at replacing the reflection-based marshal/unmarshal.

Out of curiosity I did some quick benchmarks of the xdr and gxdr packages in this repo that are generated respectively by xdrgen and goxdr. My hope was to get a sense of how removing reflection might benefit the xdr package.

However, after benchmarking, it is not clear to me that switching to goxdr will yield immediately better results; the design of the code generated by goxdr appears to impose other costs that negate the benefits of avoiding reflection.

A quick benchmark using an average-sized transaction from pubnet showed that our existing xdr package is faster, and produces fewer allocations, when unmarshaling, even though it uses reflection. The gxdr package is a little faster and produces fewer allocations when marshaling.

This definitely needs more analysis; with some profiling there may be low-hanging fruit that makes either implementation much better.

$ go test -run=- -benchmem -bench=.
goos: darwin
goarch: amd64
pkg: github.com/stellar/go
cpu: Intel(R) Core(TM) i7-8569U CPU @ 2.80GHz
BenchmarkXDRUnmarshal-8           119038              9605 ns/op            3408 B/op        143 allocs/op
BenchmarkGXDRUnmarshal-8           66351             15661 ns/op           54153 B/op        170 allocs/op
BenchmarkXDRMarshal-8             189118              6288 ns/op            3184 B/op        113 allocs/op
BenchmarkGXDRMarshal-8            291147              4159 ns/op            1280 B/op         93 allocs/op
PASS
ok      github.com/stellar/go   7.068s
benchmark_test.go, testing xdr (xdrgen) vs gxdr (goxdr):
package benchmarks

import (
	"bytes"
	"encoding/base64"
	"testing"

	"github.com/stellar/go/gxdr"
	"github.com/stellar/go/xdr"
	"github.com/stretchr/testify/require"
	goxdr "github.com/xdrpp/goxdr/xdr"
)

const input64 = "AAAAAgAAAADy2f6v1nv9lXdvl5iZvWKywlPQYsZ1JGmmAfewflnbUAAABLACG4bdAADOYQAAAAEAAAAAAAAAAAAAAABhSLZ9AAAAAAAAAAEAAAABAAAAAF8wDgs7+R5R2uftMvvhHliZOyhZOQWsWr18/Fu6S+g0AAAAAwAAAAJHRE9HRQAAAAAAAAAAAAAAUwsPRQlK+jECWsJLURlsP0qsbA/aIaB/z50U79VSRYsAAAAAAAAAAAAAAYMAAA5xAvrwgAAAAAAAAAAAAAAAAAAAAAJ+WdtQAAAAQCTonAxUHyuVsmaSeGYuVsGRXgxs+wXvKgSa+dapZWN4U9sxGPuApjiv/UWb47SwuFQ+q40bfkPYT1Tff4RfLQe6S+g0AAAAQBlFjwF/wpGr+DWbjCyuolgM1VP/e4ubfUlVnDAdFjJUIIzVakZcr5omRSnr7ClrwEoPj49h+vcLusagC4xFJgg="

var input = func() []byte {
	input, err := base64.StdEncoding.DecodeString(input64)
	if err != nil {
		panic(err)
	}
	return input
}()

func BenchmarkXDRUnmarshal(b *testing.B) {
	te := xdr.TransactionEnvelope{}

	// Make sure the input is valid.
	err := te.UnmarshalBinary(input)
	require.NoError(b, err)

	// Benchmark.
	for i := 0; i < b.N; i++ {
		_ = te.UnmarshalBinary(input)
	}
}

func BenchmarkGXDRUnmarshal(b *testing.B) {
	te := gxdr.TransactionEnvelope{}

	// Make sure the input is valid, note goxdr will panic if there's a
	// marshaling error.
	te.XdrMarshal(&goxdr.XdrIn{In: bytes.NewReader(input)}, "")

	// Benchmark.
	r := bytes.NewReader(input)
	for i := 0; i < b.N; i++ {
		r.Reset(input)
		te.XdrMarshal(&goxdr.XdrIn{In: r}, "")
	}
}

func BenchmarkXDRMarshal(b *testing.B) {
	te := xdr.TransactionEnvelope{}

	// Make sure the input is valid.
	err := te.UnmarshalBinary(input)
	require.NoError(b, err)
	output, err := te.MarshalBinary()
	require.NoError(b, err)
	require.Equal(b, input, output)

	// Benchmark.
	for i := 0; i < b.N; i++ {
		_, _ = te.MarshalBinary()
	}
}

func BenchmarkGXDRMarshal(b *testing.B) {
	te := gxdr.TransactionEnvelope{}

	// Make sure the input is valid, note goxdr will panic if there's a
	// marshaling error.
	te.XdrMarshal(&goxdr.XdrIn{In: bytes.NewReader(input)}, "")
	output := bytes.Buffer{}
	te.XdrMarshal(&goxdr.XdrOut{Out: &output}, "")

	// Benchmark.
	for i := 0; i < b.N; i++ {
		output.Reset()
		te.XdrMarshal(&goxdr.XdrOut{Out: &output}, "")
	}
}

@2opremio
Contributor Author

2opremio commented Dec 20, 2021

For context, after all the improvements we've got:

goos: darwin
goarch: amd64
pkg: github.com/stellar/go/benchmarks
cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkXDRUnmarshalWithReflection
BenchmarkXDRUnmarshalWithReflection-8                 	   74677	     16845 ns/op	    5528 B/op	     185 allocs/op
BenchmarkXDRUnmarshal
BenchmarkXDRUnmarshal-8                               	  657950	      1615 ns/op	    1344 B/op	      17 allocs/op
BenchmarkGXDRUnmarshal
BenchmarkGXDRUnmarshal-8                              	   36546	     29646 ns/op	   86736 B/op	     278 allocs/op
BenchmarkXDRMarshalWithReflection
BenchmarkXDRMarshalWithReflection-8                   	  100500	     11342 ns/op	    5128 B/op	     156 allocs/op
BenchmarkXDRMarshal
BenchmarkXDRMarshal-8                                 	 1213089	      1007 ns/op	    1208 B/op	       6 allocs/op
BenchmarkXDRMarshalWithEncodingBuffer
BenchmarkXDRMarshalWithEncodingBuffer-8               	 2153434	       554.8 ns/op	      32 B/op	       1 allocs/op
BenchmarkGXDRMarshal
BenchmarkGXDRMarshal-8                                	  147602	      7548 ns/op	    2152 B/op	     157 allocs/op
BenchmarkXDRMarshalHex
BenchmarkXDRMarshalHex-8                              	  520485	      2040 ns/op	    3496 B/op	      11 allocs/op
BenchmarkXDRMarshalHexWithEncodingBuffer
BenchmarkXDRMarshalHexWithEncodingBuffer-8            	  906595	      1244 ns/op	     928 B/op	       2 allocs/op
BenchmarkXDRUnsafeMarshalHexWithEncodingBuffer
BenchmarkXDRUnsafeMarshalHexWithEncodingBuffer-8      	 1221874	       984.4 ns/op	      32 B/op	       1 allocs/op
BenchmarkXDRMarshalBase64
BenchmarkXDRMarshalBase64-8                           	  572611	      1901 ns/op	    2856 B/op	      11 allocs/op
BenchmarkXDRMarshalBase64WithEncodingBuffer
BenchmarkXDRMarshalBase64WithEncodingBuffer-8         	 1000000	      1130 ns/op	     608 B/op	       2 allocs/op
BenchmarkXDRUnsafeMarshalBase64WithEncodingBuffer
BenchmarkXDRUnsafeMarshalBase64WithEncodingBuffer-8   	 1284129	       944.5 ns/op	      32 B/op	       1 allocs/op
BenchmarkXDRMarshalCompress
BenchmarkXDRMarshalCompress-8                         	 8519488	       131.1 ns/op	       0 B/op	       0 allocs/op
PASS

For marshalling (old = BenchmarkXDRMarshalWithReflection, new = BenchmarkXDRMarshalWithEncodingBuffer), we have about a 20x speedup, 160x less memory allocated, and 156x fewer allocations.

For unmarshalling (old = BenchmarkXDRUnmarshalWithReflection, new = BenchmarkXDRUnmarshal), we have about a 10x speedup, 4x less memory allocated, and 11x fewer allocations.
