erasure coding test vectors #4
base: master
Conversation
Hi @cheme! We are seeing differences between JAM's graypaper Appendix H and your test vectors / Parity's implementation, and have some questions below.
As the table shows, a "shard" was originally defined as "two octets" (i.e. two bytes) in Appendix H. However, the Parity implementation uses a different size, possibly due to algorithm and library limitations. The "K (original data)" and "N (codewords)" values under both Parity Implementation columns do not follow the definitions given in the paper; they likely refer to the number of original data shards and of encoded data shards, respectively. Can you explain this discrepancy?
It seems the catid/leopard repository uses a "shard" trick to handle this limitation. Therefore, ordian/reed-solomon-simd (Rust) and AndersTrklauspost/reedsolomon are limited by the requirement that "shard size must be a multiple of 64 bytes" (see documentation) rather than by the GP k:n parameters, which are not multiples of 64. Can you explain this discrepancy? What are we missing?
Solved -- see below. |
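(Editor's illustration.) One minimal way to reconcile small GP chunks with the "shard size must be a multiple of 64 bytes" requirement mentioned above is to zero-pad each shard up to the next multiple of 64. This is only a sketch of the general idea, not necessarily the "shard" trick used by catid/leopard or Parity:

```rust
/// Zero-pad a shard up to the next multiple of 64 bytes so it satisfies a
/// "shard size must be a multiple of 64 bytes" requirement. The padded
/// positions carry no payload data.
fn pad_shard_to_64(shard: &[u8]) -> Vec<u8> {
    let padded_len = ((shard.len() + 63) / 64).max(1) * 64;
    let mut padded = vec![0u8; padded_len];
    padded[..shard.len()].copy_from_slice(shard);
    padded
}

fn main() {
    // A 2-byte GP chunk pads to a 64-byte shard.
    assert_eq!(pad_shard_to_64(&[0xAB, 0xCD]).len(), 64);
    // A 12-byte segment chunk (6 points x 2 bytes/point) also pads to 64 bytes.
    assert_eq!(pad_shard_to_64(&[0u8; 12]).len(), 64);
}
```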
For JAM Erasure Coding, this is the first actual test case everyone should use. The Parity Rust library is sufficient to decode it:
The test uses the last 342 2-byte chunks to reconstruct the first 684 2-byte chunks (684+342=1026), and checks that the first 342 chunks match the original data. |
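For concreteness, here is a rough sketch of that check (an editor's illustration, not the actual test code); `recover` is a hypothetical stand-in for whatever Reed-Solomon decode routine is used:

```rust
/// Sketch of the reconstruction check described above. `recover` is a
/// hypothetical decode routine that takes index-tagged 2-byte chunks and
/// returns the reconstructed original chunks.
fn check_first_test_case(
    original: &[[u8; 2]], // the 342 original 2-byte chunks
    encoded: &[[u8; 2]],  // all 1026 encoded 2-byte chunks
    recover: impl Fn(&[(usize, [u8; 2])]) -> Vec<[u8; 2]>,
) {
    assert_eq!(original.len(), 342);
    assert_eq!(encoded.len(), 1026);

    // Keep only the last 342 chunks (indices 684..1026), tagged with their index.
    let last_third: Vec<(usize, [u8; 2])> = (684..1026).map(|i| (i, encoded[i])).collect();

    // Decode from the last third alone, then compare against the original data.
    let restored = recover(&last_third);
    assert_eq!(&restored[..342], original);
}
```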
Thanks (I see this conversation a bit late). The test vectors here use a previous formulation, but I think it is the same (I will check next week, and will also switch from base64 to hex). |
I understand that the 6 points x 2 bytes/point = 12 bytes needs clarification. Probably 14.4's "Since the data for segment chunks is so small at 12 bytes, ..." needs some explanation. Are there other references to 12 bytes you can identify? "W_S x W_C = 4104" is specified in section 14 (684x6 = 342x12 = 1026x4 = 4104) and I believe it's non-negotiable. If you wish to argue for 4096 because of page proofs, please make the case. |
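(Editor's note.) The quoted identity is just the same 4104 bytes factored three ways:

```rust
// The three factorizations of the 4104-byte segment quoted above.
fn main() {
    assert_eq!(684 * 6, 4104);  // W_S x W_C as written in section 14
    assert_eq!(342 * 12, 4104); // 342 chunks of 12 bytes (6 points x 2 bytes/point)
    assert_eq!(1026 * 4, 4104); // 1026 x 4
}
```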
(from the Matrix conversation).
@cheme do you mind adding |
👍 Note that for all binary trees, the $node prefix described in the GP is missing (not sure what it is or what it's meant for); this will probably have to be updated at some point. |
Hi @cheme, I have some questions regarding the paged-proof. In JAM's graypaper, it's written that a node is the hash of its left and right child hashes concatenated. I'm not sure whether your method is intended to compute the hash of the concatenation of the left and right node hashes, but in blake2b_simd, the update() function doesn't appear to concatenate separate inputs; it processes data as a continuous stream. Below is a comparison between using update() and concatenation. Combine:
Concatenate:
|
Yes, as mentioned in the README this part is missing. I think in this case it is not a full prefix(node) but $leaf or $node; I was not too sure what it was, but it likely ends up being just b"leaf" or b"node" hardcoded (I feel it is not strongly needed, likely here for hygiene). The code producing this is likely to be removed/replaced, so in this PR probably the only vectors that are OK are the "ec_"* ones (not too sure the full work-package EC is still correct with the latest GP changes).
Yes, I guess the correct approach is concatenation with the additional b"leaf" / b"node" prefix (I prefer multiple update calls, but prepending b"leaf" / b"node" and hashing the concatenation will likely give the same result). |
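To illustrate the point about update() vs. concatenation (an editor's sketch; the b"node" prefix and the 32-byte placeholder child hashes are only assumptions here), streaming multiple update() calls and hashing the explicit concatenation produce the same blake2b digest:

```rust
use blake2b_simd::{blake2b, State};

fn main() {
    let left = [0x11u8; 32];  // placeholder left child hash
    let right = [0x22u8; 32]; // placeholder right child hash

    // Multiple update() calls: the hasher consumes one continuous byte stream.
    let streamed = State::new()
        .update(b"node")
        .update(&left)
        .update(&right)
        .finalize();

    // Explicit concatenation of the same bytes, hashed in a single call.
    let mut concat = Vec::new();
    concat.extend_from_slice(b"node");
    concat.extend_from_slice(&left);
    concat.extend_from_slice(&right);
    let concatenated = blake2b(&concat);

    // Both approaches hash the same byte stream, so the digests match.
    assert_eq!(streamed.as_bytes(), concatenated.as_bytes());
}
```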
This is a test vector built from paritytech/erasure-coding#15. See the TODO section of the README for a few unknowns.