Parallelise data extension and pre-repair sanity check #116

musalbas · 2022-09-14T12:20:06Z

Notes:

Although we implement some mutexes in dataSquare, we do not claim that dataSquare itself is now thread-safe: we only added the minimum number of locks needed to make parallel extension work.
In all of the following benchmarks, extension has been benchmarked without computing the row and column roots, which remain the primary bottleneck.
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
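The first note describes parallel extension that adds only the locks strictly needed. Below is a minimal sketch of that pattern. It is not rsmt2d's actual code: `Codec`, `xorCodec`, `square` and `extend` are hypothetical names, and a toy XOR "codec" stands in for RSGF8/Leopard. The first k rows are extended in parallel, then all 2k columns. Each goroutine writes only to its own row or column, so the only mutex guards a shared counter.

```go
package main

import (
	"fmt"
	"sync"
)

// Codec is a hypothetical stand-in for an erasure codec such as RSGF8 or
// LeopardFF16: Encode returns len(data) parity shares for the data shares.
type Codec interface {
	Encode(data [][]byte) [][]byte
}

// xorCodec is a toy codec, NOT Reed-Solomon: every parity share is the XOR
// of all data shares. It exists only so this sketch runs.
type xorCodec struct{}

func (xorCodec) Encode(data [][]byte) [][]byte {
	parity := make([][]byte, len(data))
	for i := range parity {
		p := make([]byte, len(data[0]))
		for _, d := range data {
			for j := range p {
				p[j] ^= d[j]
			}
		}
		parity[i] = p
	}
	return parity
}

// square holds a 2k x 2k grid of shares. Goroutines write disjoint cells,
// so only the shared counter needs the mutex.
type square struct {
	mu      sync.Mutex
	cells   [][][]byte
	encoded int // number of rows/columns encoded so far
}

func (s *square) markEncoded() {
	s.mu.Lock()
	s.encoded++
	s.mu.Unlock()
}

// extend builds a 2k x 2k square from a k x k original square: the k
// original rows are extended in parallel, then all 2k columns in parallel.
func extend(ods [][][]byte, c Codec) *square {
	k := len(ods)
	s := &square{cells: make([][][]byte, 2*k)}
	for i := range s.cells {
		s.cells[i] = make([][]byte, 2*k)
		if i < k {
			copy(s.cells[i], ods[i])
		}
	}

	var wg sync.WaitGroup
	for i := 0; i < k; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			copy(s.cells[i][k:], c.Encode(s.cells[i][:k])) // writes row i only
			s.markEncoded()
		}(i)
	}
	wg.Wait() // columns read the extended rows, so finish the rows first

	for j := 0; j < 2*k; j++ {
		wg.Add(1)
		go func(j int) {
			defer wg.Done()
			col := make([][]byte, k)
			for i := 0; i < k; i++ {
				col[i] = s.cells[i][j]
			}
			for i, p := range c.Encode(col) {
				s.cells[k+i][j] = p // writes column j only
			}
			s.markEncoded()
		}(j)
	}
	wg.Wait()
	return s
}

func main() {
	ods := [][][]byte{{{1}, {2}}, {{4}, {8}}}
	s := extend(ods, xorCodec{})
	fmt.Println(s.cells, s.encoded)
}
```

As in the note, this gives no thread-safety guarantee for other concurrent users of the square. Only the extension itself is synchronised.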

go-leopard + parallelisation

BenchmarkExtension/LeopardFF8_size_4-16         	   21063	     58295 ns/op
BenchmarkExtension/LeopardFF16_size_4-16        	   20356	     60095 ns/op
BenchmarkExtension/RSGF8_size_4-16              	   46251	     25855 ns/op
BenchmarkExtension/RSGF8_size_8-16              	   16380	     73586 ns/op
BenchmarkExtension/LeopardFF8_size_8-16         	    8278	    145906 ns/op
BenchmarkExtension/LeopardFF16_size_8-16        	    8599	    145683 ns/op
BenchmarkExtension/RSGF8_size_16-16             	    5766	    211363 ns/op
BenchmarkExtension/LeopardFF8_size_16-16        	    2546	    413941 ns/op
BenchmarkExtension/LeopardFF16_size_16-16       	    2623	    396130 ns/op
BenchmarkExtension/RSGF8_size_32-16             	    1718	    742880 ns/op
BenchmarkExtension/LeopardFF8_size_32-16        	     949	   1280569 ns/op
BenchmarkExtension/LeopardFF16_size_32-16       	     910	   1330538 ns/op
BenchmarkExtension/RSGF8_size_64-16             	     416	   2743251 ns/op
BenchmarkExtension/LeopardFF8_size_64-16        	     384	   3193578 ns/op
BenchmarkExtension/LeopardFF16_size_64-16       	     381	   3094233 ns/op
BenchmarkExtension/RSGF8_size_128-16            	      88	  13574625 ns/op
BenchmarkExtension/LeopardFF8_size_128-16       	     100	  10233559 ns/op
BenchmarkExtension/LeopardFF16_size_128-16      	     100	  10620378 ns/op

go-leopard + no parallelisation

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkExtension/RSGF8_size_4-16         	  117916	     11580 ns/op
BenchmarkExtension/LeopardFF8_size_4-16    	   15774	     92830 ns/op
BenchmarkExtension/LeopardFF16_size_4-16   	   15693	     90473 ns/op
BenchmarkExtension/RSGF8_size_8-16         	   28026	     58276 ns/op
BenchmarkExtension/LeopardFF8_size_8-16    	    6709	    282830 ns/op
BenchmarkExtension/LeopardFF16_size_8-16   	    6716	    203269 ns/op
BenchmarkExtension/RSGF8_size_16-16        	    4826	    347929 ns/op
BenchmarkExtension/LeopardFF8_size_16-16   	    1660	    965310 ns/op
BenchmarkExtension/LeopardFF16_size_16-16  	    1723	   1063590 ns/op
BenchmarkExtension/LeopardFF8_size_32-16   	     378	   5191836 ns/op
BenchmarkExtension/LeopardFF16_size_32-16  	     336	   4367580 ns/op
BenchmarkExtension/RSGF8_size_32-16        	     625	   2098900 ns/op
BenchmarkExtension/RSGF8_size_64-16        	     103	  11749111 ns/op
BenchmarkExtension/LeopardFF8_size_64-16   	      93	  12668505 ns/op
BenchmarkExtension/LeopardFF16_size_64-16  	     100	  13156762 ns/op
BenchmarkExtension/RSGF8_size_128-16       	      15	  71221686 ns/op
BenchmarkExtension/LeopardFF8_size_128-16  	      25	  44580184 ns/op
BenchmarkExtension/LeopardFF16_size_128-16 	      25	  43923193 ns/op

klauspost + parallelisation

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkExtension/RSGF8_size_4-16         	   62449	     18151 ns/op
BenchmarkExtension/LeopardFF8_size_4-16    	   37773	     31655 ns/op
BenchmarkExtension/LeopardFF16_size_4-16   	   37440	     32032 ns/op
BenchmarkExtension/LeopardFF16_size_8-16   	   20168	     57223 ns/op
BenchmarkExtension/RSGF8_size_8-16         	   25146	     46824 ns/op
BenchmarkExtension/LeopardFF8_size_8-16    	   20571	     57741 ns/op
BenchmarkExtension/RSGF8_size_16-16        	   10000	    116269 ns/op
BenchmarkExtension/LeopardFF8_size_16-16   	   10000	    104035 ns/op
BenchmarkExtension/LeopardFF16_size_16-16  	   10000	    102806 ns/op
BenchmarkExtension/RSGF8_size_32-16        	    2890	    394711 ns/op
BenchmarkExtension/LeopardFF8_size_32-16   	    4833	    255505 ns/op
BenchmarkExtension/LeopardFF16_size_32-16  	    4370	    253097 ns/op
BenchmarkExtension/RSGF8_size_64-16        	     610	   2031755 ns/op
BenchmarkExtension/LeopardFF8_size_64-16   	    1654	    692710 ns/op
BenchmarkExtension/LeopardFF16_size_64-16  	    1682	    706555 ns/op
BenchmarkExtension/LeopardFF8_size_128-16  	     450	   2744351 ns/op
BenchmarkExtension/LeopardFF16_size_128-16 	     418	   2755398 ns/op
BenchmarkExtension/RSGF8_size_128-16       	      87	  13653183 ns/op

klauspost + no parallelisation

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkExtension/RSGF8_size_4-16         	  135792	      8126 ns/op
BenchmarkExtension/LeopardFF8_size_4-16    	  172027	      6770 ns/op
BenchmarkExtension/LeopardFF16_size_4-16   	  166657	      6446 ns/op
BenchmarkExtension/LeopardFF8_size_8-16    	   50572	     23574 ns/op
BenchmarkExtension/LeopardFF16_size_8-16   	   45315	     23324 ns/op
BenchmarkExtension/RSGF8_size_8-16         	   33289	     34886 ns/op
BenchmarkExtension/RSGF8_size_16-16        	    5365	    202124 ns/op
BenchmarkExtension/LeopardFF8_size_16-16   	   13940	     87224 ns/op
BenchmarkExtension/LeopardFF16_size_16-16  	   13310	     87464 ns/op
BenchmarkExtension/RSGF8_size_32-16        	     831	   1344954 ns/op
BenchmarkExtension/LeopardFF8_size_32-16   	    3138	    384758 ns/op
BenchmarkExtension/LeopardFF16_size_32-16  	    2631	    390911 ns/op
BenchmarkExtension/LeopardFF16_size_64-16  	     705	   1656914 ns/op
BenchmarkExtension/RSGF8_size_64-16        	     124	   9772882 ns/op
BenchmarkExtension/LeopardFF8_size_64-16   	     687	   1618216 ns/op
BenchmarkExtension/RSGF8_size_128-16       	      15	  72573958 ns/op
BenchmarkExtension/LeopardFF8_size_128-16  	     152	   7759006 ns/op
BenchmarkExtension/LeopardFF16_size_128-16 	     152	   7793278 ns/op

codecov · 2022-09-14T12:24:39Z

Codecov Report

Merging #116 (136c5a9) into master (e2ae439) will increase coverage by 2.33%.
The diff coverage is 82.92%.

@@            Coverage Diff             @@
##           master     #116      +/-   ##
==========================================
+ Coverage   80.96%   83.29%   +2.33%     
==========================================
  Files           8        8              
  Lines         457      497      +40     
==========================================
+ Hits          370      414      +44     
+ Misses         52       50       -2     
+ Partials       35       33       -2

Impacted Files	Coverage Δ
extendeddatasquare.go	`71.76% <68.75%> (+8.90%)`	⬆️
extendeddatacrossword.go	`78.16% <86.20%> (+2.55%)`	⬆️
datasquare.go	`93.33% <100.00%> (+0.63%)`	⬆️
infectiousRSGF8.go	`84.78% <100.00%> (+0.69%)`	⬆️
tree.go	`100.00% <100.00%> (ø)`


Wondertan

Nice

datasquare.go

extendeddatasquare.go


musalbas · 2022-09-15T13:40:32Z

Parallelising the crossword loop works fine with infectious, but it errors on both go-leopard and klauspost's leopard. This looks like a potential issue with running decodings in parallel. Otherwise, it gave a 4x+ performance increase for 128x128 blocks with infectious.

Branch with parallel decoding (klauspost): https://github.com/celestiaorg/rsmt2d/blob/parallelisation_klauspost/extendeddatacrossword.go#L73

klauspost leopard:

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkRepair/RSGF8_4x4x256_ODS-16         	--- FAIL: BenchmarkRepair/RSGF8_4x4x256_ODS-16
    extendeddatacrossword_test.go:217: byzantine row: 3
BenchmarkRepair/LeopardFF8_4x4x256_ODS-16    	     518	   2273441 ns/op
BenchmarkRepair/LeopardFF16_4x4x256_ODS-16   	     499	   2318601 ns/op
BenchmarkRepair/RSGF8_8x8x256_ODS-16         	--- FAIL: BenchmarkRepair/RSGF8_8x8x256_ODS-16
    extendeddatacrossword_test.go:217: byzantine row: 7
BenchmarkRepair/LeopardFF8_8x8x256_ODS-16    	     300	   3929318 ns/op
BenchmarkRepair/LeopardFF16_8x8x256_ODS-16   	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x5781d4]

goroutine 436131 [running]:
github.com/klauspost/reedsolomon.mulgf16_avx2({0xc0092c6d00, 0x100, 0x100}, {0x0, 0x100, 0x100}, 0xc0048ccc80)
	/home/mus/go/pkg/mod/github.com/klauspost/reedsolomon@v1.11.0/galois_gen_amd64.s:63559 +0x54
github.com/klauspost/reedsolomon.mulgf16({0xc0092c6d00?, 0x10?, 0x10?}, {0x0?, 0x100?, 0xc000280000?}, 0xffff?, 0x0?)
	/home/mus/go/pkg/mod/github.com/klauspost/reedsolomon@v1.11.0/galois_amd64.go:336 +0x6d
github.com/klauspost/reedsolomon.(*leopardFF16).reconstruct(0xc00011c000, {0xc006300300, 0x10?, 0x10?}, 0x1)
	/home/mus/go/pkg/mod/github.com/klauspost/reedsolomon@v1.11.0/leopard.go:456 +0x7fa
github.com/klauspost/reedsolomon.(*leopardFF16).Reconstruct(0x203001?, {0xc006300300?, 0x203001?, 0xc006300300?})
	/home/mus/go/pkg/mod/github.com/klauspost/reedsolomon@v1.11.0/leopard.go:315 +0x25
github.com/celestiaorg/rsmt2d.decode({0xc006300300, 0x10, 0x10})
	/home/mus/Code/rsmt2d/leopard.go:62 +0xf8
github.com/celestiaorg/rsmt2d.leoRSFF16Codec.Decode(...)
	/home/mus/Code/rsmt2d/leopard.go:81
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).rebuildShares(0xc000078000, 0x1, {0xc006300300?, 0x10, 0x10})
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:259 +0x58
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).solveCrosswordCol(0xc000078000, 0x2, {0xc006300600, 0x10, 0x10}, {0xc006300780, 0x10, 0x10})
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:218 +0x1f3
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).solveCrossword.func2()
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:102 +0x4f
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/home/mus/go/pkg/mod/golang.org/x/sync@v0.0.0-20220907140024-f12130a52804/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/home/mus/go/pkg/mod/golang.org/x/sync@v0.0.0-20220907140024-f12130a52804/errgroup/errgroup.go:72 +0xa5
exit status 2
FAIL	github.com/celestiaorg/rsmt2d	6.506s

go-leopard:

goos: linux
goarch: amd64
pkg: github.com/celestiaorg/rsmt2d
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkRepair/Repairing_16x16_ODS_using_LeopardFF16-16         	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x46ae98]

goroutine 6166 [running]:
github.com/celestiaorg/go-leopard._Cfunc_CBytes(...)
	_cgo_gotypes.go:46
github.com/celestiaorg/go-leopard.copyByteBuffer.func1({0x0, 0x100, 0xc000500001?})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:215 +0x85
github.com/celestiaorg/go-leopard.copyByteBuffer({0x0?, 0xc00022dbc8?, 0x44f9d2?})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:215 +0x25
github.com/celestiaorg/go-leopard.copyToCmallocedPtrs({0xc0003c0600, 0x10, 0x0?})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:208 +0x85
github.com/celestiaorg/go-leopard.Recover({0xc0003c0600?, 0x10, 0x20}, {0xc0003c0780?, 0x10, 0x10})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:127 +0x36a
github.com/celestiaorg/go-leopard.Decode({0xc0003c0600?, 0x10, 0x4205e0?}, {0xc0003c0780?, 0x10, 0xc0006ef670?})
	/home/mus/go/pkg/mod/github.com/celestiaorg/go-leopard@v0.1.0/wrapper.go:160 +0x3b
github.com/celestiaorg/rsmt2d.leoRSFF16Codec.Decode({}, {0xc0003c0600?, 0x300?, 0x0?})
	/home/mus/Code/rsmt2d/leopard.go:45 +0x54
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).rebuildShares(0xc0000e8060, 0x1, {0xc0003c0600?, 0x20, 0x20})
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:259 +0x58
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).solveCrosswordRow(0xc0000e8060, 0x1a, {0xc000582000, 0x20, 0x20}, {0xc000582300, 0x20, 0x20})
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:156 +0x1d4
github.com/celestiaorg/rsmt2d.(*ExtendedDataSquare).solveCrossword.func1()
	/home/mus/Code/rsmt2d/extendeddatacrossword.go:91 +0x4f
golang.org/x/sync/errgroup.(*Group).Go.func1()
	/home/mus/go/pkg/mod/golang.org/x/sync@v0.0.0-20220907140024-f12130a52804/errgroup/errgroup.go:75 +0x64
created by golang.org/x/sync/errgroup.(*Group).Go
	/home/mus/go/pkg/mod/golang.org/x/sync@v0.0.0-20220907140024-f12130a52804/errgroup/errgroup.go:72 +0xa5
exit status 2
FAIL	github.com/celestiaorg/rsmt2d	0.101s


musalbas · 2022-09-15T14:58:23Z

Marking this as ready for review as we can implement decoder parallelization later when the above issue is resolved.

evan-forbes

Nice! I only had a single question, and will approve after anyone else that wants to gets a chance to review.

While this implementation doesn't allow for a specific configurable number of workers, I really like the increased simplicity we get by not.

On a side note, it's interesting to see that the new leopard codec seems to benefit less from parallelization than the infectious one, at least for extension.

datasquare.go

musalbas · 2022-09-15T21:01:21Z

> While this implementation doesn't allow for a specific configurable number of workers, I really like the increased simplicity we get by not.

I think it would be easy to add this later by calling SetLimit() in errgroup, and adding ThreadCount to the EDS struct, or something similar.

musalbas · 2022-09-15T21:25:15Z

Do you think not allowing the worker count to be configurable may cause performance issues for users of the library?

evan-forbes · 2022-09-15T21:26:35Z

> Do you think not allowing the worker count to be configurable may cause performance issues for users of the library?

not meaningfully, no

evan-forbes · 2022-09-15T21:35:36Z

With a different implementation, a different version of Go, and a single thread, there was some. I bet that with more than one thread the difference is super small or completely gone. tbh, even if there is one, the simplicity of not having to configure it based on GOMAXPROCS is worth it unless we're really trying to penny-pinch.

edit: exposing this param to users is definitely not worth it, as that's one more thing to cause questions we have to answer lol

musalbas · 2022-09-15T21:39:45Z

Hmm, a 2x performance difference between 16 and 128 goroutines seems significant; with a 256x256 EDS we would have something like 512 goroutines. But it's not super high priority as long as it doesn't cause problems for users.

rahulghangas

LGTM, I agree that limiting the number of goroutines via workers can be done in another PR. Just a small question below

rahulghangas · 2022-09-16T09:02:27Z

extendeddatacrossword.go

+func (eds *ExtendedDataSquare) computeSharesRoot(shares [][]byte, axis Axis, i uint) []byte {
+	tree := eds.createTreeFn(axis, i)
 	for cell, d := range shares {
 		tree.Push(d, SquareIndex{Cell: uint(cell), Axis: i})


I might be reading this a bit wrong, but isn't the dual axis/Axis terminology a bit confusing?

It's more descriptive than calling the variable something like "a", imo.
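The snippet above (and #119) passes the axis and its index into the tree constructor. The sketch below illustrates why that helps parallel root computation. It uses hypothetical types (`Axis`, `Tree`, `TreeConstructor`, `flatTree`) rather than rsmt2d's API, and a flat SHA-256 hash instead of a real Merkle/NMT tree. Each goroutine builds its own tree for its (axis, index) pair, so no tree state is shared and no lock is needed.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// Axis distinguishes rows from columns (hypothetical, mirroring the snippet).
type Axis int

const (
	Row Axis = iota
	Col
)

// Tree is a hypothetical minimal interface: push leaves, then read a root.
type Tree interface {
	Push(data []byte)
	Root() []byte
}

// TreeConstructor builds a fresh tree for one axis and index.
type TreeConstructor func(axis Axis, index uint) Tree

// flatTree is a toy stand-in for a Merkle/NMT tree: its "root" is
// SHA-256 over a prefix "axis/index/" followed by all leaves concatenated.
type flatTree struct{ buf []byte }

func newFlatTree(axis Axis, index uint) Tree {
	// The prefix makes roots differ per axis and index.
	return &flatTree{buf: []byte(fmt.Sprintf("%d/%d/", axis, index))}
}

func (t *flatTree) Push(data []byte) { t.buf = append(t.buf, data...) }

func (t *flatTree) Root() []byte {
	sum := sha256.Sum256(t.buf)
	return sum[:]
}

// computeRoots computes one root per line (row or column) in parallel.
// Every goroutine owns its tree and writes only its own slot in roots.
func computeRoots(lines [][][]byte, axis Axis, newTree TreeConstructor) [][]byte {
	roots := make([][]byte, len(lines))
	var wg sync.WaitGroup
	for i := range lines {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			tree := newTree(axis, uint(i))
			for _, d := range lines[i] {
				tree.Push(d)
			}
			roots[i] = tree.Root()
		}(i)
	}
	wg.Wait()
	return roots
}

func main() {
	lines := [][][]byte{{[]byte("a"), []byte("b")}, {[]byte("c"), []byte("d")}}
	for i, r := range computeRoots(lines, Row, newFlatTree) {
		fmt.Printf("row %d root %x\n", i, r[:4])
	}
}
```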

musalbas added 6 commits September 13, 2022 23:06

Refactor main extension loop into seperate functions.

3f8f7e0

Parallel extension.

a6dfb9f

Actually make it parallel.

bd66080

Merge branch 'master' into parallelisation

3fa579d

Add relevant mutexes.

866085f

Re-enable row and col computation in benchmarks.

1b2e0a9

musalbas marked this pull request as draft September 14, 2022 12:20

musalbas changed the title ~~Parallelise data encoding~~ Parallelise data extension Sep 14, 2022

musalbas added 2 commits September 14, 2022 22:58

Simplify variable name.

007f6b1

Parallelise data root construction, switch to sha256-simd.

c2c8078

musalbas force-pushed the parallelisation branch from 1796f12 to c2c8078 September 14, 2022 23:05

Wondertan reviewed Sep 15, 2022

datasquare.go (outdated, resolved)

datasquare.go (outdated, resolved)

extendeddatasquare.go (resolved)

extendeddatasquare.go (outdated, resolved)

musalbas added 6 commits September 15, 2022 13:22

Remove mutexes for root generation

82e9498

Simplify further

5bf8791

Return errs.Wait() directly

246091c

Fix getRow/Colroot comments.

7bf4658

Assign data roots when eds is solved

fa3361f

Pass the axis and axis index when create a tree, to allow for parallel nmt caches (#119)

4b3ad26

musalbas mentioned this pull request Sep 15, 2022

Pass the axis and axis index when create a tree #119

Closed

Parallelise prerepair sanity check

59bfd18

Revert "Assign data roots when eds is solved"

136c5a9

This reverts commit fa3361f.

musalbas marked this pull request as ready for review September 15, 2022 14:58

musalbas requested review from rahulghangas and evan-forbes September 15, 2022 14:58

musalbas changed the title ~~Parallelise data extension~~ Parallelise data extension and pre-repair sanity check Sep 15, 2022

This was referenced Sep 15, 2022

Use the new rsmt2d.TreeConstructorFn celestiaorg/celestia-app#712

Closed

Investigate and potentially fix the multi-threading issue with decoding using the klauspost leopard implementation #123

Open

evan-forbes reviewed Sep 15, 2022

datasquare.go (resolved)

This was linked to issues Sep 15, 2022

Investigate performance of parallelization #5

Closed

Pass the axis and axis index when create a tree #119

Closed

rahulghangas approved these changes Sep 16, 2022

evan-forbes approved these changes Sep 19, 2022

musalbas merged commit 58bebde into master Sep 20, 2022

musalbas deleted the parallelisation branch September 20, 2022 12:02
