Skip to content
This repository has been archived by the owner on Aug 16, 2024. It is now read-only.

perf: reduce memory use #57

Merged
merged 16 commits into from
Jan 31, 2024
Merged

perf: reduce memory use #57

merged 16 commits into from
Jan 31, 2024

Conversation

joonazan
Copy link
Contributor

@joonazan joonazan commented Dec 15, 2023

What ❔

Enables improving peak memory use by computing circuits incrementally.

The API between this crate and the witness generator changes significantly. Instead of returning circuits, this crate now takes callbacks which it calls whenever a new circuit is ready. It could directly take a channel instead but I feel that is maybe not generic enough.

This PR incrementalizes just one type of circuit to make sure that it is possible.

Also removes unnecessary datastructures, clones etc. which improves memory use as well.

Why ❔

We have to improve memory use to be able to process larger batches on reasonable computers.

The new API allows emitting circuits as soon as possible instead of keeping around circuit inputs till the end and then building all of them. I measured that the vast majority of memory goes into storing the inputs of RAM circuits and MainVM circuits. Fixing those could solve all our memory trouble.

Even with every circuit type incrementalized, an array of all ClosedFormInputCompactFormWitnesses is collected. I don't know how hard it is to get rid of that or if it is big enough to be problematic.

@joonazan joonazan force-pushed the jms-fix-memory-use branch 4 times, most recently from 9b2bcc9 to 997463c Compare January 12, 2024 08:59
@joonazan joonazan force-pushed the jms-fix-memory-use branch 3 times, most recently from a8a88e9 to 70351ec Compare January 24, 2024 18:02
The destination of replace isn't used later, so setting it to an
empty Vec does nothing.
Still needs further cleanup. This version doesn't reduce memory use much,
the focus here is on changing the API.

I had to hardcode GoldilocksField instead of parameterizing everything by field.
The trait bounds required were ridiculous and caused 30 min long typechecking.
I had assumed that InstanceWitness -> circuit type is a proper function. It is not.
@joonazan joonazan changed the base branch from v1.4.0 to v1.4.1 January 25, 2024 10:24
@joonazan joonazan marked this pull request as ready for review January 25, 2024 16:43
@joonazan joonazan requested review from jules and shamatar January 25, 2024 16:43
@joonazan
Copy link
Contributor Author

I get the same failures and timeouts when running the tests before and after this PR, so I guess it passes...

@shamatar
Copy link
Member

We should rollout it on staging, generate witness for the full block, and basically compare them (those are serializable structures, so just compare bytes)

@joonazan
Copy link
Contributor Author

joonazan commented Jan 25, 2024

@shamatar A note about comparing witnesses: simple diffing will indicate they have changed because the circuits are in a valid but different order. I used the following script to diff circuits.

import os, os.path, filecmp

def sorted_files(dir):
    return map(lambda x: os.path.join(dir, x), sorted(os.listdir(dir), key=file_order))

def file_order(filename):
    [a, b, c, *r] = os.path.basename(filename).split("_")
    return (int(a), int(c), int(b))

for a, b in zip(sorted_files("reference_artifacts/prover_jobs_fri"), sorted_files("artifacts/prover_jobs_fri")):
    if not filecmp.cmp(a, b):
        print("mismatch", a, b)

We'll do comprehensive testing next week. I'm unavailable tomorrow.

I am very confident that the output of this version is identical to 1.4.1 in all other respects.

github-merge-queue bot pushed a commit to matter-labs/zksync-era that referenced this pull request Jan 31, 2024
## What ❔

Uses an [updated
zkevm_test_harness](matter-labs/era-zkevm_test_harness#57)
that uses less memory. The updated crate takes callback functions
instead of returning a vector of all results.

We may have to think about how to not keep the database open the whole
time the callbacks are coming in.

## Why ❔

Proving batches currently uses memory proportional to the batch size,
which is not ok for large batches.

---------

Co-authored-by: Fedor Sakharov <fedor.sakharov@gmail.com>
Co-authored-by: AntonD3 <74021421+AntonD3@users.noreply.github.com>
Co-authored-by: Stanislav Breadless <stanislavbezkor@gmail.com>
Co-authored-by: perekopskiy <53865202+perekopskiy@users.noreply.github.com>
Co-authored-by: pompon0 <pompon.pompon@gmail.com>
Co-authored-by: Dustin Brickwood <dustinbrickwood204@gmail.com>
Co-authored-by: Igor Borodin <hatemosphere@protonmail.com>
@joonazan joonazan merged commit badf56d into v1.4.1 Jan 31, 2024
4 checks passed
@mm-zk mm-zk deleted the jms-fix-memory-use branch April 5, 2024 12:54
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants