Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: optimize tendermint zk light client #273

Conversation

i-m-aditya
Copy link
Contributor

@i-m-aditya i-m-aditya commented Feb 19, 2024

fix: optimize tendermint zk light client

progress:

  • modifying the Tendermint-rs library to an early exit
  • utility function to retrieve signatures from top validators
  • remove hardcoding and fetch relevant data from RPC

Refs: #233

@i-m-aditya
Copy link
Contributor Author

i-m-aditya commented Feb 19, 2024

@puma314 What I am thinking is to make an optimization PR to tendermint-rs, and wait for it to merge. Is my understanding correct?

@i-m-aditya i-m-aditya changed the title Optimize tendermint zk light client fix: optimize tendermint zk light client Feb 19, 2024
@puma314
Copy link
Contributor

puma314 commented Feb 19, 2024

@i-m-aditya it would be great if you could get the following TODO merged into tendermint-rs. This ensures that the verifier early terminates after it verifies 2/3 of the stake-weighted voting power. Then it would also be great to modify the existing tendermint program in SP1 to read in the block/signatures (instead of hard-coding it from a fixture) and modify the outside code in the script to pass in those values, making sure that the signed commits are passed in, in-order of validator voting power so the early termination terminates as soon as possible.

@i-m-aditya
Copy link
Contributor Author

@puma314 Sure I will get this done.

@i-m-aditya
Copy link
Contributor Author

@puma314 Have raised PR on tendermint-rs, waiting for merge.

@puma314
Copy link
Contributor

puma314 commented Feb 23, 2024

@puma314 Have raised PR on tendermint-rs, waiting for merge.

That's awesome! That LGTM, so hopefully they merge it. I think working on the next steps makes sense, like adding stuff to the script to get the input and then read it in using io instead of having it hardcoded (let me know if you might need pointers on how to fetch the relevant data from the RPC) and then also sorting the signatures by stake weight so that the early termination gets hit.

Let me know if you need pointers, this is awesome work!

@i-m-aditya
Copy link
Contributor Author

i-m-aditya commented Feb 26, 2024

@puma314 Because of the addition of reqwest crate in Cargo.toml, cargo prove build fails with the message

error: Socket2 doesn't support the compile target

I tried using hyper instead of request, but getting the same issue.
Do you have any idea why is it failing? Can you share your tg, I will share the entire failed message.

@puma314
Copy link
Contributor

puma314 commented Feb 26, 2024

@puma314 Because of the addition of reqwest crate in Cargo.toml, cargo prove build fails with the message

error: Socket2 doesn't support the compile target

I tried using hyper instead of request, but getting the same issue. Do you have any idea why is it failing? Can you share your tg, I will share the entire failed message.

Hey @i-m-aditya, you should join our TG support group and I can help in there! (Don't paste the entire error message in the chat since it pollutes it, just put it in a Github Gist and then put a link to that). The link to our TG support chat is in our docs at the top on the homepage: https://succinctlabs.github.io/sp1/ (I don't want to link directly in here to prevent spam).

I took a look at your PR and it's actually really close to working! But you actually want to have all the stuff in programs/src/utils.rs moved to script/main.rs. Basically your mental model should be that all the data fetching (network requests, sorting, etc.) is done in the script which gathers all the inputs needed for proof generation.

THEN you can feed in the LightBlock to the proof with the stdin of the prover (docs here https://succinctlabs.github.io/sp1/writing-programs/inputs-and-outputs.html and also here for how the script utilizes the stdin: https://succinctlabs.github.io/sp1/generating-proofs/basics.html).

Then you can read the LightBlock inside your program with sp1_zkvm::io::read::<LightBlock>(); and do stuff with it.

So you're really close! Just move all the fetching logic into script and out of program and you should be all good to go! Looking forward to it.

@i-m-aditya
Copy link
Contributor Author

@puma314 after moving fetching logic to the script, the build works. Thanks

@i-m-aditya i-m-aditya force-pushed the i-m-aditya/optimize-tendermint-zk-light-client branch from 88aa1cc to ea04c63 Compare February 27, 2024 19:36
@i-m-aditya i-m-aditya marked this pull request as ready for review February 28, 2024 05:17
@ValarDragon
Copy link

ValarDragon commented Feb 28, 2024

Seeing that your editing tendermint-rs, this issue I made on the go side may apply for tendermint-rs as well! cometbft/cometbft#2365

I've never read the Tendermint-rs code, so not quite sure. Will try looking into the codebase for it

@i-m-aditya i-m-aditya force-pushed the i-m-aditya/optimize-tendermint-zk-light-client branch from bbae024 to aae77fa Compare March 1, 2024 15:42
@puma314
Copy link
Contributor

puma314 commented Mar 2, 2024

@i-m-aditya is this ready for a look? I can take a look over the weekend!

Copy link
Contributor

@puma314 puma314 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks awesome! I'm curious on Tendermint how much this change actually saves. If you run the script with RUST_LOG=info, you should see an output like this:

2024-03-02T07:35:01.965228Z  INFO runtime.run(...):load memory: close time.busy=1.68ms time.idle=416ns
2024-03-02T07:35:02.390567Z  INFO runtime.run(...):postprocess: close time.busy=8.69ms time.idle=708ns
2024-03-02T07:35:02.390582Z  INFO runtime.run(...): close time.busy=427ms time.idle=85.4µs
2024-03-02T07:35:02.400696Z  INFO runtime.prove(...): Sharding the execution record.
2024-03-02T07:35:02.400712Z  INFO runtime.prove(...): Generating trace for each chip.
2024-03-02T07:35:02.400713Z  INFO runtime.prove(...): Record stats before generate_trace (incomplete): ShardStats {
    nb_cpu_events: 7476561,
    nb_add_events: 2126546,
    nb_mul_events: 11116,
    nb_sub_events: 54075,
    nb_bitwise_events: 646940,
    nb_shift_left_events: 142595,
    nb_shift_right_events: 274016,
    nb_divrem_events: 0,
    nb_lt_events: 81862,
    nb_field_events: 0,
    nb_sha_extend_events: 0,
    nb_sha_compress_events: 0,
    nb_keccak_permute_events: 2916,
    nb_ed_add_events: 0,
    nb_ed_decompress_events: 0,
    nb_weierstrass_add_events: 0,
    nb_weierstrass_double_events: 0,
    nb_k256_decompress_events: 0,
}

I'm curious what the number of nb_ed_add_events and nb_ed_decompress_events are. Are you running this with your branch in Tendermint-rs with the fixes for early existing the stake weight?

// Generate proof.
utils::setup_logger();
let stdin = SP1Stdin::new();
let peer_id: [u8; 20] = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this peer_id? can you add a comment/doc for it and what it stands for?

@@ -1,11 +1,30 @@
use sp1_core::{utils, SP1Prover, SP1Stdin, SP1Verifier};

use crate::util::generate_light_block_at_given_block_height;

const ED25519_ELF: &[u8] = include_bytes!("../../program/elf/riscv32im-succinct-zkvm-elf");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm I know this isn't your fault but we should change the name of this :P


stdin.write(&light_block_1);
stdin.write(&light_block_2);

let proof = SP1Prover::prove(ED25519_ELF, stdin).expect("proving failed");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: let's change the name of this to TENDERMINT_ELF

@i-m-aditya i-m-aditya force-pushed the i-m-aditya/optimize-tendermint-zk-light-client branch from aae77fa to 68ee3ce Compare March 2, 2024 15:55
@puma314
Copy link
Contributor

puma314 commented Mar 2, 2024

@i-m-aditya when I ran this locally, I got this error in the runtime, which means that the program cannot deserialize the input that's being passed in.

2024-03-02T18:51:53.584277Z  INFO runtime.run(...):load memory: close time.busy=1.65ms time.idle=626ns
2024-03-02T18:51:53.585035Z  INFO runtime.run(...): stderr: thread '
2024-03-02T18:51:53.585048Z  INFO runtime.run(...): stderr: <unnamed>
2024-03-02T18:51:53.585052Z  INFO runtime.run(...): stderr: ' panicked at
2024-03-02T18:51:53.585071Z  INFO runtime.run(...): stderr: /Users/umaroy/Documents/sp1/zkvm/precompiles/src/io.rs
2024-03-02T18:51:53.585075Z  INFO runtime.run(...): stderr: :
2024-03-02T18:51:53.585086Z  INFO runtime.run(...): stderr: 47
2024-03-02T18:51:53.585090Z  INFO runtime.run(...): stderr: :
2024-03-02T18:51:53.585100Z  INFO runtime.run(...): stderr: 12
2024-03-02T18:51:53.585103Z  INFO runtime.run(...): stderr: :
2024-03-02T18:51:53.585110Z  INFO runtime.run(...): stderr: called `Result::unwrap()` on an `Err` value: InvalidTagEncoding(64)
2024-03-02T18:51:53.585113Z  INFO runtime.run(...): stderr:
2024-03-02T18:51:53.585133Z  INFO runtime.run(...): stderr: stack backtrace:
2024-03-02T18:51:53.585141Z  INFO runtime.run(...): stderr: note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
2024-03-02T18:51:53.585159Z  INFO runtime.run(...): stderr: called `Result::unwrap()` on an `Err` value: InvalidTagEncoding(64)

@i-m-aditya
Copy link
Contributor Author

@puma314 can you check now, there were earlier some dependency resolution issues.
Screenshot 2024-03-03 at 12 52 13 AM

@puma314
Copy link
Contributor

puma314 commented Mar 3, 2024

Hey @i-m-aditya, I think if you scroll up in your terminal, you'll see at the beginning of your program that your program is still panicking. This is why the number of cycles (8571) is so small in your image you shared--because the program is panicking and then early exiting. I actually have a PR out right now that will panic in the runtime if the program itself panics (it's still technical a valid execution of RISCV bytecode to panic, which is why you are still able to generate a proof).

But I would suggest just scrolling up in your terminal to debug your program, or use the execution only mode of SP1 to just debug the execution of your program: https://succinctlabs.github.io/sp1/generating-proofs/advanced.html#execution-only.

Let me know if you have any questions! Also feel free to join the SP1 telegram support group if you want to have more realtime chat support and I can help you out there!

@i-m-aditya i-m-aditya force-pushed the i-m-aditya/optimize-tendermint-zk-light-client branch from ec8e55e to d389efb Compare March 5, 2024 09:32
@i-m-aditya i-m-aditya force-pushed the i-m-aditya/optimize-tendermint-zk-light-client branch from d389efb to 2beee8e Compare March 5, 2024 09:34
@puma314 puma314 merged commit c3f906a into succinctlabs:main Mar 8, 2024
3 checks passed
jtguibas pushed a commit that referenced this pull request Mar 29, 2024
Co-authored-by: Uma Roy <uma.roy.us@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants