continuation optimization #217

Closed
wants to merge 34 commits into from

Contributor

@dajuguan dajuguan commented Dec 8, 2023

Presently, when a zkwasm execution exceeds 2 billion instructions (as observed in zkGo), the generated trace table becomes too large to fit into memory. Moreover, generating the witness table takes a considerable amount of time, for instance up to 7 hours for 2 billion instructions. This pull request aims to optimize several aspects:

  1. Implementing the capability to dump the trace table and reload it to reconstruct the circuit accurately. A specific test case, test_rlp_from_file, will be provided to ensure the outcome aligns with test_rlp_slice, as in the 1st commit.

  2. Introducing a tracer callback mechanism that dumps the tables periodically, with the slice size taken from the output of the compute_slice_capability function, as in the 2nd commit. Note that there is also a related PR in the wasmi repo. Notably, wasm's maximum memory has been hard-coded to 64MB via LINEAR_MEMORY_MAX_PAGES; otherwise, the current implementation runs out of memory because push_init_memory pushes all of the memory into imtable.

  3. Adding support for binary files as private inputs for scenarios involving large input sizes. The new arg is --private <filename>:file, as in the 3rd commit.

  4. Further reducing witness generation time through strategies such as caching, flexbuffer utilization, and pooling, as in this commit.

  5. [WIP] Restructuring the code.
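
To illustrate the idea behind item 2, here is a minimal sketch of a tracer that hands each full slice to a callback so that only one slice is held in memory at a time. It is not the code from this PR or from wasmi: `TraceEntry`, `SlicingTracer`, and `slice_lengths` are hypothetical names, and the `capacity` argument merely stands in for whatever compute_slice_capability returns.

```rust
/// Hypothetical stand-in for one row of the execution trace.
#[derive(Debug, Clone, PartialEq)]
pub struct TraceEntry {
    pub eid: u64,
    pub opcode: u32,
}

/// Buffers trace entries and passes each full slice to a callback,
/// so at most `capacity` entries are kept in memory by the tracer.
pub struct SlicingTracer<F: FnMut(usize, Vec<TraceEntry>)> {
    capacity: usize,
    buffer: Vec<TraceEntry>,
    slice_index: usize,
    on_slice: F,
}

impl<F: FnMut(usize, Vec<TraceEntry>)> SlicingTracer<F> {
    pub fn new(capacity: usize, on_slice: F) -> Self {
        assert!(capacity > 0, "slice capacity must be positive");
        Self {
            capacity,
            buffer: Vec::with_capacity(capacity),
            slice_index: 0,
            on_slice,
        }
    }

    /// Record one executed instruction; emit the slice once it is full.
    pub fn push(&mut self, entry: TraceEntry) {
        self.buffer.push(entry);
        if self.buffer.len() == self.capacity {
            self.flush();
        }
    }

    /// Emit whatever is buffered. Call once after execution ends
    /// to hand over the final, possibly partial, slice.
    pub fn flush(&mut self) {
        if self.buffer.is_empty() {
            return;
        }
        let full = std::mem::replace(&mut self.buffer, Vec::with_capacity(self.capacity));
        (self.on_slice)(self.slice_index, full);
        self.slice_index += 1;
    }
}

/// Feeds `n` fake instructions through the tracer and returns the
/// length of every slice the callback received, in order.
pub fn slice_lengths(n: u64, capacity: usize) -> Vec<usize> {
    let mut lengths = Vec::new();
    {
        let mut tracer = SlicingTracer::new(capacity, |_, slice: Vec<TraceEntry>| {
            // A real callback would write the slice to disk here.
            lengths.push(slice.len());
        });
        for eid in 0..n {
            tracer.push(TraceEntry { eid, opcode: 0 });
        }
        tracer.flush();
    }
    lengths
}
```

In this sketch the callback takes ownership of the slice, so the tracer's buffer is empty again as soon as a slice has been handed over; a callback that writes to disk would then drop the slice right after writing.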

Contributor

@xgaozoyoe xgaozoyoe left a comment

Small adjustments are needed.

crates/specs/src/lib.rs (outdated)
crates/specs/src/lib.rs (outdated)
crates/specs/src/lib.rs (outdated)
crates/specs/src/state.rs (outdated)
crates/zkwasm/src/test/test_rlp_slice.rs (outdated)
@xgaozoyoe
Contributor

Implementing the capability to dump the trace table and reload it to reconstruct the circuit accurately. A specific test case test_rlp_from_file will be provided to ensure the outcome aligns with test_rlp_slice.

I cannot figure out how the outputs of test_rlp_slice and test_rlp_from_file are compared.

Introducing a tracer callback mechanism to dump tables periodically, say, after every 1,000,000 instructions.
Could you please say a bit more about how to use the tracer callback? Is this included in this PR?

Adding support for binary files as private inputs for scenarios involving large input sizes
Is this included in this pr?

Further optimizing witness generation time through strategies such as caching, flexbuffer utilization, and pooling

Is this included in this pr?

If this is a WIP PR, please mark it as WIP.

@dajuguan
Contributor Author

Presently, when zkwasm's instructions exceed 2 billion (as observed in zkGo), the generated trace table becomes too large to fit into memory. Moreover, the generation of the witness table consumes a considerable amount of time, taking, for instance, up to 7 hours for 2 billion instructions. This pull request aims to optimize several aspects:

  1. Implementing the capability to dump the trace table and reload it to reconstruct the circuit accurately. A specific test case test_rlp_from_file will be provided to ensure the outcome aligns with test_rlp_slice, as in the current commit.
  2. Introducing a tracer callback mechanism to dump tables periodically, say, after every 1,000,000 instructions. This approach resembles the methodology outlined in this commit.
  3. Adding support for binary files as private inputs for scenarios involving large input sizes, mirroring the implementation in this commit.
  4. Further optimizing witness generation time through strategies such as caching, flexbuffer utilization, and pooling, similar to the enhancements introduced in this commit

Marked them as WIP.

@dajuguan
Contributor Author

Regarding how test_rlp_from_file compares with test_rlp_slice: test_rlp_from_file extracts the tables from each Slice, reconstructs the circuit from the dumped file, and then validates each slice's integrity via its own mock_test, mirroring the process of test_rlp_slice.
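
As a rough illustration of the dump-and-reload step only (the circuit reconstruction and mock_test are out of scope here), the sketch below writes a slice to a file, reads it back, and checks that the reloaded entries are identical. The `TraceEntry` type, the little-endian layout, and the function names are assumptions made for this example, not the format used in this PR.

```rust
use std::fs;
use std::io::{self, Read, Write};
use std::path::Path;

/// Hypothetical stand-in for one row of the trace table.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceEntry {
    pub eid: u64,
    pub opcode: u32,
}

/// Write a slice as: entry count (u64 LE), then (eid u64 LE, opcode u32 LE) per entry.
pub fn dump_slice(path: &Path, entries: &[TraceEntry]) -> io::Result<()> {
    let mut file = io::BufWriter::new(fs::File::create(path)?);
    file.write_all(&(entries.len() as u64).to_le_bytes())?;
    for entry in entries {
        file.write_all(&entry.eid.to_le_bytes())?;
        file.write_all(&entry.opcode.to_le_bytes())?;
    }
    file.flush()
}

/// Read back a slice written by `dump_slice`.
pub fn load_slice(path: &Path) -> io::Result<Vec<TraceEntry>> {
    let mut file = io::BufReader::new(fs::File::open(path)?);
    let mut len_buf = [0u8; 8];
    file.read_exact(&mut len_buf)?;
    let len = u64::from_le_bytes(len_buf);

    let mut entries = Vec::new();
    for _ in 0..len {
        let mut eid_buf = [0u8; 8];
        let mut opcode_buf = [0u8; 4];
        file.read_exact(&mut eid_buf)?;
        file.read_exact(&mut opcode_buf)?;
        entries.push(TraceEntry {
            eid: u64::from_le_bytes(eid_buf),
            opcode: u32::from_le_bytes(opcode_buf),
        });
    }
    Ok(entries)
}

/// Dump to a temporary file, reload, and report whether the data survived unchanged.
pub fn round_trips(entries: &[TraceEntry]) -> io::Result<bool> {
    let path = std::env::temp_dir().join(format!("trace_slice_{}.bin", std::process::id()));
    dump_slice(&path, entries)?;
    let reloaded = load_slice(&path)?;
    fs::remove_file(&path)?;
    Ok(reloaded.as_slice() == entries)
}
```

A check of this kind only shows that the file holds the same table; running the same mock_test on the reloaded slice, as described above, is what ties the result back to test_rlp_slice.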

Others are marked as WIP.

context_output.clone(),
)?;

write_context_output(&context_out.lock().unwrap(), context_out_path)?;
Collaborator

Replace context_out with context_output and remove the context_out variable?

@@ -41,3 +76,931 @@ impl InitMemoryTable {
self.0.get(&(ltype, offset))
}
}

pub fn memory_event_of_step(event: &EventTableEntry) -> Vec<MemoryTableEntry> {
Collaborator

If it’s necessary, could you move the function to specs/src/mtable.rs? I’m curious why it was moved to the specs crate.

@junyu0312 junyu0312 deleted the branch DelphinusLab:cont_dev April 26, 2024 04:04
@junyu0312 junyu0312 closed this Apr 26, 2024
3 participants