Parallelise data extension and pre-repair sanity check #116

Merged (16 commits) on Sep 20, 2022

@musalbas (Member) commented Sep 14, 2022

Notes:

  • Although we add some mutexes to dataSquare, we do not claim that dataSquare itself is now thread-safe: we only added the minimum number of locks needed to make parallel extension work.
  • In all the following benchmarks, extension has been benchmarked without computing the row and column roots, which remain the primary bottleneck.
  • cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
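The notes above describe extending rows in parallel while adding only the locks that the parallel work actually needs. The sketch below illustrates that pattern for the row phase only, with one goroutine per row. It is not the PR's code: toyParity is a hypothetical placeholder "encoder" (a neighbour XOR, not Reed-Solomon), and newSquare and extendRowsParallel are illustrative names.

```go
package main

import (
	"fmt"
	"sync"
)

// toyParity is a placeholder "encoder": it derives k parity shares from k
// equal-length data shares by XOR-ing each share with its neighbour. It is
// NOT Reed-Solomon and only stands in for a real codec's Encode call.
func toyParity(data [][]byte) [][]byte {
	k := len(data)
	parity := make([][]byte, k)
	for i := range data {
		next := data[(i+1)%k]
		p := make([]byte, len(data[i]))
		for j := range p {
			p[j] = data[i][j] ^ next[j]
		}
		parity[i] = p
	}
	return parity
}

// newSquare builds a k x k square of one-byte shares numbered 0..k*k-1.
func newSquare(k int) [][][]byte {
	sq := make([][][]byte, k)
	for i := range sq {
		sq[i] = make([][]byte, k)
		for j := range sq[i] {
			sq[i][j] = []byte{byte(i*k + j)}
		}
	}
	return sq
}

// extendRowsParallel extends each of the k rows to 2k shares, one goroutine
// per row. Every goroutine writes only out[i], so the rows need no lock; the
// mutex guards the one piece of shared state (a counter), mirroring the idea
// of adding only the locks that the parallel work actually requires.
func extendRowsParallel(square [][][]byte) ([][][]byte, int) {
	k := len(square)
	out := make([][][]byte, k)
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		encoded int
	)
	for i := 0; i < k; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			row := make([][]byte, 0, 2*k)
			row = append(row, square[i]...)
			row = append(row, toyParity(square[i])...)
			out[i] = row
			mu.Lock()
			encoded++
			mu.Unlock()
		}(i)
	}
	wg.Wait()
	return out, encoded
}

func main() {
	ext, n := extendRowsParallel(newSquare(4))
	fmt.Println(len(ext), len(ext[0]), n) // 4 8 4
}
```

Because each goroutine owns a distinct output slot, the only synchronisation needed is the WaitGroup plus the lock around genuinely shared state; a real codec would replace toyParity.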

go-leopard + parallelisation

BenchmarkExtension/LeopardFF8_size_4-16         	   21063	     58295 ns/op
BenchmarkExtension/LeopardFF16_size_4-16        	   20356	     60095 ns/op
BenchmarkExtension/RSGF8_size_4-16              	   46251	     25855 ns/op
BenchmarkExtension/RSGF8_size_8-16              	   16380	     73586 ns/op
BenchmarkExtension/LeopardFF8_size_8-16         	    8278	    145906 ns/op
BenchmarkExtension/LeopardFF16_size_8-16        	    8599	    145683 ns/op
BenchmarkExtension/RSGF8_size_16-16             	    5766	    211363 ns/op
BenchmarkExtension/LeopardFF8_size_16-16        	    2546	    413941 ns/op
BenchmarkExtension/LeopardFF16_size_16-16       	    2623	    396130 ns/op
BenchmarkExtension/RSGF8_size_32-16             	    1718	    742880 ns/op
BenchmarkExtension/LeopardFF8_size_32-16        	     949	   1280569 ns/op
BenchmarkExtension/LeopardFF16_size_32-16       	     910	   1330538 ns/op
BenchmarkExtension/RSGF8_size_64-16             	     416	   2743251 ns/op
BenchmarkExtension/LeopardFF8_size_64-16        	     384	   3193578 ns/op
BenchmarkExtension/LeopardFF16_size_64-16       	     381	   3094233 ns/op
BenchmarkExtension/RSGF8_size_128-16            	      88	  13574625 ns/op
BenchmarkExtension/LeopardFF8_size_128-16       	     100	  10233559 ns/op
BenchmarkExtension/LeopardFF16_size_128-16      	     100	  10620378 ns/op

go-leopard + no parallelisation

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkExtension/RSGF8_size_4-16         	  117916	     11580 ns/op
BenchmarkExtension/LeopardFF8_size_4-16    	   15774	     92830 ns/op
BenchmarkExtension/LeopardFF16_size_4-16   	   15693	     90473 ns/op
BenchmarkExtension/RSGF8_size_8-16         	   28026	     58276 ns/op
BenchmarkExtension/LeopardFF8_size_8-16    	    6709	    282830 ns/op
BenchmarkExtension/LeopardFF16_size_8-16   	    6716	    203269 ns/op
BenchmarkExtension/RSGF8_size_16-16        	    4826	    347929 ns/op
BenchmarkExtension/LeopardFF8_size_16-16   	    1660	    965310 ns/op
BenchmarkExtension/LeopardFF16_size_16-16  	    1723	   1063590 ns/op
BenchmarkExtension/LeopardFF8_size_32-16   	     378	   5191836 ns/op
BenchmarkExtension/LeopardFF16_size_32-16  	     336	   4367580 ns/op
BenchmarkExtension/RSGF8_size_32-16        	     625	   2098900 ns/op
BenchmarkExtension/RSGF8_size_64-16        	     103	  11749111 ns/op
BenchmarkExtension/LeopardFF8_size_64-16   	      93	  12668505 ns/op
BenchmarkExtension/LeopardFF16_size_64-16  	     100	  13156762 ns/op
BenchmarkExtension/RSGF8_size_128-16       	      15	  71221686 ns/op
BenchmarkExtension/LeopardFF8_size_128-16  	      25	  44580184 ns/op
BenchmarkExtension/LeopardFF16_size_128-16 	      25	  43923193 ns/op

klauspost + parallelisation

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkExtension/RSGF8_size_4-16         	   62449	     18151 ns/op
BenchmarkExtension/LeopardFF8_size_4-16    	   37773	     31655 ns/op
BenchmarkExtension/LeopardFF16_size_4-16   	   37440	     32032 ns/op
BenchmarkExtension/LeopardFF16_size_8-16   	   20168	     57223 ns/op
BenchmarkExtension/RSGF8_size_8-16         	   25146	     46824 ns/op
BenchmarkExtension/LeopardFF8_size_8-16    	   20571	     57741 ns/op
BenchmarkExtension/RSGF8_size_16-16        	   10000	    116269 ns/op
BenchmarkExtension/LeopardFF8_size_16-16   	   10000	    104035 ns/op
BenchmarkExtension/LeopardFF16_size_16-16  	   10000	    102806 ns/op
BenchmarkExtension/RSGF8_size_32-16        	    2890	    394711 ns/op
BenchmarkExtension/LeopardFF8_size_32-16   	    4833	    255505 ns/op
BenchmarkExtension/LeopardFF16_size_32-16  	    4370	    253097 ns/op
BenchmarkExtension/RSGF8_size_64-16        	     610	   2031755 ns/op
BenchmarkExtension/LeopardFF8_size_64-16   	    1654	    692710 ns/op
BenchmarkExtension/LeopardFF16_size_64-16  	    1682	    706555 ns/op
BenchmarkExtension/LeopardFF8_size_128-16  	     450	   2744351 ns/op
BenchmarkExtension/LeopardFF16_size_128-16 	     418	   2755398 ns/op
BenchmarkExtension/RSGF8_size_128-16       	      87	  13653183 ns/op

klauspost + no parallelisation

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkExtension/RSGF8_size_4-16         	  135792	      8126 ns/op
BenchmarkExtension/LeopardFF8_size_4-16    	  172027	      6770 ns/op
BenchmarkExtension/LeopardFF16_size_4-16   	  166657	      6446 ns/op
BenchmarkExtension/LeopardFF8_size_8-16    	   50572	     23574 ns/op
BenchmarkExtension/LeopardFF16_size_8-16   	   45315	     23324 ns/op
BenchmarkExtension/RSGF8_size_8-16         	   33289	     34886 ns/op
BenchmarkExtension/RSGF8_size_16-16        	    5365	    202124 ns/op
BenchmarkExtension/LeopardFF8_size_16-16   	   13940	     87224 ns/op
BenchmarkExtension/LeopardFF16_size_16-16  	   13310	     87464 ns/op
BenchmarkExtension/RSGF8_size_32-16        	     831	   1344954 ns/op
BenchmarkExtension/LeopardFF8_size_32-16   	    3138	    384758 ns/op
BenchmarkExtension/LeopardFF16_size_32-16  	    2631	    390911 ns/op
BenchmarkExtension/LeopardFF16_size_64-16  	     705	   1656914 ns/op
BenchmarkExtension/RSGF8_size_64-16        	     124	   9772882 ns/op
BenchmarkExtension/LeopardFF8_size_64-16   	     687	   1618216 ns/op
BenchmarkExtension/RSGF8_size_128-16       	      15	  72573958 ns/op
BenchmarkExtension/LeopardFF8_size_128-16  	     152	   7759006 ns/op
BenchmarkExtension/LeopardFF16_size_128-16 	     152	   7793278 ns/op

@musalbas marked this pull request as draft September 14, 2022 12:20

codecov bot commented Sep 14, 2022

Codecov Report

Merging #116 (136c5a9) into master (e2ae439) will increase coverage by 2.33%.
The diff coverage is 82.92%.

@@            Coverage Diff             @@
##           master     #116      +/-   ##
==========================================
+ Coverage   80.96%   83.29%   +2.33%     
==========================================
  Files           8        8              
  Lines         457      497      +40     
==========================================
+ Hits          370      414      +44     
+ Misses         52       50       -2     
+ Partials       35       33       -2     
Impacted Files              Coverage Δ
extendeddatasquare.go       71.76% <68.75%> (+8.90%) ⬆️
extendeddatacrossword.go    78.16% <86.20%> (+2.55%) ⬆️
datasquare.go               93.33% <100.00%> (+0.63%) ⬆️
infectiousRSGF8.go          84.78% <100.00%> (+0.69%) ⬆️
tree.go                     100.00% <100.00%> (ø)


@musalbas changed the title from "Parallelise data encoding" to "Parallelise data extension" Sep 14, 2022

@Wondertan (Member) left a comment

Nice

datasquare.go (outdated, resolved)
datasquare.go (outdated, resolved)
extendeddatasquare.go (resolved)
extendeddatasquare.go (outdated, resolved)

@musalbas (Member Author) commented Sep 15, 2022

When I parallelised the crossword loop, it worked fine with infectious but failed with both go-leopard and klauspost's Leopard implementation, which suggests a potential issue with running decodings in parallel. Otherwise, it gave a 4x+ performance increase for 128x128 blocks with infectious.
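The comment above only speculates that parallel decodings are the problem. If the failures do come from a codec instance that is not safe for concurrent use, one possible workaround (not what this PR does) is to keep the crossword loop concurrent but serialise calls into the decoder. In the sketch below, the decoder interface, scratchDecoder, lockedDecoder, and decodeAllParallel are all hypothetical; scratchDecoder merely mimics a codec that reuses an internal buffer and performs no real erasure recovery.

```go
package main

import (
	"fmt"
	"sync"
)

// decoder is a hypothetical interface standing in for a codec's Decode
// method; it is not rsmt2d's Codec interface.
type decoder interface {
	Decode(shares [][]byte) ([][]byte, error)
}

// scratchDecoder is a toy decoder that reuses one internal scratch slice
// across calls, so two concurrent calls would race on it. Its "decoding"
// simply returns the input shares; no erasure recovery is performed.
type scratchDecoder struct {
	scratch [][]byte
}

func (d *scratchDecoder) Decode(shares [][]byte) ([][]byte, error) {
	d.scratch = append(d.scratch[:0], shares...)
	out := make([][]byte, len(d.scratch))
	copy(out, d.scratch)
	return out, nil
}

// lockedDecoder serialises calls to a decoder that may not be safe for
// concurrent use, so the surrounding loop can still run in goroutines.
type lockedDecoder struct {
	mu    sync.Mutex
	inner decoder
}

func (l *lockedDecoder) Decode(shares [][]byte) ([][]byte, error) {
	l.mu.Lock()
	defer l.mu.Unlock()
	return l.inner.Decode(shares)
}

// decodeAllParallel decodes n one-share "rows" concurrently via dec.
func decodeAllParallel(dec decoder, n int) [][][]byte {
	results := make([][][]byte, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			out, err := dec.Decode([][]byte{{byte(i)}})
			if err == nil {
				results[i] = out
			}
		}(i)
	}
	wg.Wait()
	return results
}

func main() {
	res := decodeAllParallel(&lockedDecoder{inner: &scratchDecoder{}}, 64)
	fmt.Println(res[10][0][0], res[63][0][0]) // 10 63
}
```

Serialising decodes gives up the speedup of the decode step itself; giving each goroutine its own codec instance (for example via sync.Pool) is an alternative that keeps decodes parallel, at the cost of extra memory.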

Branch with parallel decoding (klauspost): https://github.com/celestiaorg/rsmt2d/blob/parallelisation_klauspost/extendeddatacrossword.go#L73

klauspost leopard:

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkRepair/RSGF8_4x4x256_ODS-16         	--- FAIL: BenchmarkRepair/RSGF8_4x4x256_ODS-16
    extendeddatacrossword_test.go:217: byzantine row: 3
BenchmarkRepair/LeopardFF8_4x4x256_ODS-16    	     518	   2273441 ns/op
BenchmarkRepair/LeopardFF16_4x4x256_ODS-16   	     499	   2318601 ns/op
BenchmarkRepair/RSGF8_8x8x256_ODS-16         	--- FAIL: BenchmarkRepair/RSGF8_8x8x256_ODS-16
    extendeddatacrossword_test.go:217: byzantine row: 7
BenchmarkRepair/LeopardFF8_8x8x256_ODS-16    	     300	   3929318 ns/op
BenchmarkRepair/LeopardFF16_8x8x256_ODS-16   	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5781d4]

goroutine 436131 [running]:
github.com/klauspost/reedsolomon.mulgf16_avx2({0xc0092c6d00, 0x100, 0x100}, {0x0, 0x100, 0x100}, 0xc0048ccc80)
	/home/mus/go/pkg/mod/github.com/klauspost/reedsolomon@v1.11.0/galois_gen_amd64.s:63559 +0x54
github.com/klauspost/reedsolomon.mulgf16({0xc0092c6d00?, 0x10?, 0x10?}, {0x0?, 0x100?, 0xc000280000?}, 0xffff?, 0x0?)
	/home/mus/go/pkg/mod/github.com/klauspost/reedsolomon@v1.11.0/galois_amd64.go:336 +0x6d
github.com/klauspost/reedsolomon.(*leopardFF16).reconstruct(0xc00011c000, {0xc006300300, 0x10?, 0x10?}, 0x1)
	/home/mus/go/pkg/mod/github.com/klauspost/reedsolomon@v1.11.0/leopard.go:456 +0x7fa
github.com/klauspost/reedsolomon.(*leopardFF16).Reconstruct(0x203001?, {0xc006300300?, 0x203001?, 0xc006300300?})
	/home/mus/go/pkg/mod/github.com/klauspost/reedsolomon@v1.11.0/leopard.go:315 +0x25
github.com/celestiaorg/rsmt2d.decode({0xc006300300, 0x10, 0x10})
	/home/mus/Code/rsmt2d/leopard.go:62 +0xf8
github.com/celestiaorg/rsmt2d.leoRSFF16Codec.Decode(...)
	/home/mus/Code/rsmt2d/leopard.go:81
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).rebuildShares(0xc000078000, 0x1, {0xc006300300?, 0x10, 0x10})
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:259 +0x58
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).solveCrosswordCol(0xc000078000, 0x2, {0xc006300600, 0x10, 0x10}, {0xc006300780, 0x10, 0x10})
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:218 +0x1f3
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).solveCrossword.func2()
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:102 +0x4f
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/home/mus/go/pkg/mod/golang.org/x/sync@v0.0.0-20220907140024-f12130a52804/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/home/mus/go/pkg/mod/golang.org/x/sync@v0.0.0-20220907140024-f12130a52804/errgroup/errgroup.go:72 +0xa5
exit status 2
FAIL	github.com/celestiaorg/rsmt2d	6.506s

go-leopard:

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkRepair/Repairing_16x16_ODS_using_LeopardFF16-16         	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x46ae98]

goroutine 6166 [running]:
github.com/celestiaorg/go-leopard._Cfunc_CBytes(...)
	_cgo_gotypes.go:46
github.com/celestiaorg/go-leopard.copyByteBuffer.func1({0x0, 0x100, 0xc000500001?})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:215 +0x85
github.com/celestiaorg/go-leopard.copyByteBuffer({0x0?, 0xc00022dbc8?, 0x44f9d2?})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:215 +0x25
github.com/celestiaorg/go-leopard.copyToCmallocedPtrs({0xc0003c0600, 0x10, 0x0?})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:208 +0x85
github.com/celestiaorg/go-leopard.Recover({0xc0003c0600?, 0x10, 0x20}, {0xc0003c0780?, 0x10, 0x10})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:127 +0x36a
github.com/celestiaorg/go-leopard.Decode({0xc0003c0600?, 0x10, 0x4205e0?}, {0xc0003c0780?, 0x10, 0xc0006ef670?})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:160 +0x3b
github.com/celestiaorg/rsmt2d.leoRSFF16Codec.Decode({}, {0xc0003c0600?, 0x300?, 0x0?})
	/home/mus/Code/rsmt2d/leopard.go:45 +0x54
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).rebuildShares(0xc0000e8060, 0x1, {0xc0003c0600?, 0x20, 0x20})
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:259 +0x58
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).solveCrosswordRow(0xc0000e8060, 0x1a, {0xc000582000, 0x20, 0x20}, {0xc000582300, 0x20, 0x20})
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:156 +0x1d4
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).solveCrossword.func1()
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:91 +0x4f
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/home/mus/go/pkg/mod/golang.org/x/sync@v0.0.0-20220907140024-f12130a52804/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/home/mus/go/pkg/mod/golang.org/x/sync@v0.0.0-20220907140024-f12130a52804/errgroup/errgroup.go:72 +0xa5
exit status 2
FAIL	github.com/celestiaorg/rsmt2d	0.101s

@musalbas (Member Author) commented:

Marking this as ready for review as we can implement decoder parallelization later when the above issue is resolved.

@musalbas marked this pull request as ready for review September 15, 2022 14:58
@musalbas changed the title from "Parallelise data extension" to "Parallelise data extension and pre-repair sanity check" Sep 15, 2022

@evan-forbes (Member) left a comment

Nice! I only had a single question, and will approve once anyone else who wants to review has had a chance.

While this implementation doesn't allow configuring a specific number of workers, I really like the increased simplicity we get by not doing so.

on a side note, it's interesting to see that the new leopard codec seems to benefit less from parallelization than the infectious one, at least for extension.
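The observation can be checked against the klauspost-based runs earlier in the thread by dividing the unparallelised ns/op by the parallelised ns/op. RSGF8 is the infectious-based codec (see infectiousRSGF8.go in the coverage report). The sketch copies four rows verbatim from those logs; the type and function names are illustrative.

```go
package main

import "fmt"

// bench pairs ns/op figures copied from the klauspost benchmark runs above.
type bench struct {
	name     string
	serial   float64 // ns/op, "klauspost + no parallelisation"
	parallel float64 // ns/op, "klauspost + parallelisation"
}

// speedup is serial time divided by parallel time; a value below 1 means the
// parallel version was slower for that case.
func speedup(b bench) float64 { return b.serial / b.parallel }

var klauspostRuns = []bench{
	{"RSGF8_size_4", 8126, 18151},
	{"LeopardFF16_size_4", 6446, 32032},
	{"RSGF8_size_128", 72573958, 13653183},
	{"LeopardFF16_size_128", 7793278, 2755398},
}

func main() {
	for _, b := range klauspostRuns {
		fmt.Printf("%-22s %.2fx\n", b.name, speedup(b))
	}
}
```

This prints roughly 0.45x, 0.20x, 5.32x and 2.83x: at size 128 parallelisation helps RSGF8 about 5.3x but LeopardFF16 only about 2.8x, and at size 4 both run slower in parallel.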

datasquare.go (resolved)

@musalbas (Member Author) commented:

> While this implementation doesn't allow configuring a specific number of workers, I really like the increased simplicity we get by not doing so.

I think it would be easy to add this later by calling SetLimit() on the errgroup and adding a ThreadCount field to the EDS struct, or something similar.

@musalbas (Member Author) commented:

Do you think not allowing the worker count to be configurable may cause performance issues for users of the library?

@evan-forbes (Member) commented:

> Do you think not allowing the worker count to be configurable may cause performance issues for users of the library?

not meaningfully, no

@evan-forbes (Member) commented Sep 15, 2022

With a different implementation, a different version of Go, and a single thread, there was some overhead. I bet that with more than one thread the difference is super super small or completely gone. tbh, even if there is some, the simplicity of not having to configure it based on the max procs is worth it unless we're really trying to penny-pinch.
[Image: overhead_erasure]

edit: exposing this param to users is definitely not worth it, as that's one more thing that would raise questions we have to answer lol

@musalbas (Member Author) commented Sep 15, 2022

Hmm, a 2x performance difference between 16 and 128 goroutines seems significant; with a 256x256 EDS we would have something like 512 goroutines. But it's not super high priority as long as it doesn't cause problems for users.
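The 512 figure is consistent with one goroutine per row plus one per column of a 256-wide square. That scheduling model is an assumption made here for illustration, not a description of how rsmt2d actually splits its work. The "-16" suffix on the benchmark names above is the GOMAXPROCS value, so the sketch below (hypothetical helper names) also shows how many goroutines each of those 16 slots would have to multiplex.

```go
package main

import (
	"fmt"
	"runtime"
)

// goroutinesPerPass assumes one goroutine per row plus one per column of a
// width x width square. This is an illustrative assumption, not a statement
// of how rsmt2d actually schedules its work.
func goroutinesPerPass(width int) int { return 2 * width }

// perProc is how many of those goroutines each GOMAXPROCS slot would have
// to multiplex if all were runnable at once.
func perProc(width, procs int) float64 {
	return float64(goroutinesPerPass(width)) / float64(procs)
}

func main() {
	fmt.Println(goroutinesPerPass(256)) // 512
	fmt.Println(perProc(256, 16))       // 32
	fmt.Println(perProc(256, runtime.GOMAXPROCS(0)))
}
```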

@rahulghangas (Contributor) left a comment

LGTM. I agree that limiting the number of goroutines via workers can be done in another PR. Just a small question below:

Comment on lines +366 to 369
func (eds *ExtendedDataSquare) computeSharesRoot(shares [][]byte, axis Axis, i uint) []byte {
	tree := eds.createTreeFn(axis, i)
	for cell, d := range shares {
		tree.Push(d, SquareIndex{Cell: uint(cell), Axis: i})

@rahulghangas (Contributor) commented:
I might be reading this a bit wrong, but isn't the dual axis/Axis terminology a bit confusing?

@musalbas (Member Author) commented:

It's more descriptive than calling the variable something like "a", imo.

@musalbas merged commit 58bebde into master Sep 20, 2022
@musalbas deleted the parallelisation branch September 20, 2022 12:02

Issues this pull request may close:
  • Pass the axis and axis index when create a tree
  • Investigate performance of parallelization

4 participants