Add RLE dump code #7691

Merged
merged 3 commits into from
Nov 26, 2021

Conversation

Contributor

@Kubuxu Kubuxu commented Nov 25, 2021

No description provided.

Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
@Kubuxu Kubuxu requested a review from a team as a code owner November 25, 2021 20:40

codecov bot commented Nov 25, 2021

Codecov Report

Merging #7691 (4d8be81) into master (ab55a6f) will decrease coverage by 0.15%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##           master    #7691      +/-   ##
==========================================
- Coverage   39.56%   39.41%   -0.16%     
==========================================
  Files         637      637              
  Lines       67924    67979      +55     
==========================================
- Hits        26877    26796      -81     
- Misses      36445    36556     +111     
- Partials     4602     4627      +25     
| Impacted Files | Coverage Δ |
|---|---|
| cmd/lotus-shed/sectors.go | 0.00% <0.00%> (ø) |
| node/modules/dtypes/mpool.go | 87.50% <0.00%> (-12.50%) ⬇️ |
| chain/events/message_cache.go | 87.50% <0.00%> (-12.50%) ⬇️ |
| blockstore/api.go | 24.00% <0.00%> (-8.00%) ⬇️ |
| blockstore/blockstore.go | 62.96% <0.00%> (-7.41%) ⬇️ |
| chain/events/observer.go | 71.64% <0.00%> (-6.72%) ⬇️ |
| miner/miner.go | 52.31% <0.00%> (-4.64%) ⬇️ |
| chain/stmgr/execute.go | 86.95% <0.00%> (-4.35%) ⬇️ |
| markets/storageadapter/ondealsectorcommitted.go | 77.33% <0.00%> (-4.00%) ⬇️ |
| chain/stmgr/call.go | 71.51% <0.00%> (-3.64%) ⬇️ |
... and 17 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

Contributor

@arajasek arajasek left a comment

Curious what motivated this

cmd/lotus-shed/sectors.go: 2 review comments (outdated, resolved)
Co-authored-by: Aayush Rajasekaran <arajasek94@gmail.com>
Contributor Author

Kubuxu commented Nov 26, 2021

I was analyzing how well our RLE+ encoding performs in practice.
Rough results:
A symbol in our case is a run.
Shannon entropy of RLEs in AllocatedSectors: H = 3.468 bits/symbol
RLE+ encoding: ΔH = 0.7687 bits/symbol
RLE+ with small modification: ΔH = 0.203 bits/symbol
RLE+ with Huffman: ΔH ≈ 0.0178 bits/symbol
RLE+ with Asymmetric Numeral Systems: ΔH ≈ 0.001 bits/symbol

We could save about 18% of RLE+ storage, but RLEs are such a small fraction of chain state that it isn't worth the complexity right now (after collecting data I know that RLEs are only ~10MB of chain state, although the churn is frequent).
The major reason for the high ΔH of our RLE+ coding is the 6-bit encoding of symbols 2 and 3.
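
The 18% figure appears to follow from the numbers above if ΔH is read as overhead on top of the entropy, so the current cost per run is H + ΔH and the best possible saving is ΔH / (H + ΔH). This is an interpretation rather than something stated in the thread. A small Go sketch of the arithmetic, with the hypothetical helper `relativeSaving`:

```go
package main

import "fmt"

// relativeSaving returns the fraction of encoded size that an ideal code
// (cost h bits/symbol) would save over a code costing h + deltaH bits/symbol.
func relativeSaving(h, deltaH float64) float64 {
	return deltaH / (h + deltaH)
}

func main() {
	const h = 3.468         // entropy quoted in the comment, bits/symbol
	const deltaRLE = 0.7687 // RLE+ overhead quoted in the comment
	fmt.Printf("%.1f%%\n", 100*relativeSaving(h, deltaRLE)) // prints "18.1%"
}
```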

If I were designing RLE+ today, I would go with RLE with Huffman coding of small runs and additional data lengths.

Signed-off-by: Jakub Sztandera <kubuxu@protocol.ai>
@jennijuju jennijuju merged commit a4c2a20 into master Nov 26, 2021
@jennijuju jennijuju deleted the misc/rle-dump branch November 26, 2021 22:15