-
Notifications
You must be signed in to change notification settings - Fork 2.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(proof_data_handler): Unlock jobs on transient errors #2486
Conversation
🤖 I have created a release *beep* *boop* --- ## [24.13.0](core-v24.12.0...core-v24.13.0) (2024-07-31) ### Features * Add recovery tests to zk_supervisor ([#2444](#2444)) ([0c0d10a](0c0d10a)) * Added a JSON RPC to simulating L1 for consensus attestation ([#2480](#2480)) ([c6b3adf](c6b3adf)) * added dropping all attester certificates when doing hard fork ([#2529](#2529)) ([5acd686](5acd686)) * **configs:** Do not panic if config is only partially filled ([#2545](#2545)) ([db13fe3](db13fe3)) * **eth-sender:** Make eth-sender tests use blob txs + refactor of eth-sender tests ([#2316](#2316)) ([c8c8334](c8c8334)) * Introduce more tracing instrumentation ([#2523](#2523)) ([79d407a](79d407a)) * Remove unused VKs, add docs for BWG ([#2468](#2468)) ([2fa6bf0](2fa6bf0)) * Revisit base config values ([#2532](#2532)) ([3fac8ac](3fac8ac)) * Server 10k gwei limit on gas price and 1M limit on pubdata price ([#2460](#2460)) ([be238cc](be238cc)) * **vlog:** Implement otlp guard with force flush on drop ([#2536](#2536)) ([c9f76e5](c9f76e5)) * **vlog:** New vlog interface + opentelemtry improvements ([#2472](#2472)) ([c0815cd](c0815cd)) * **zk_toolbox:** add test upgrade subcommand to zk_toolbox ([#2515](#2515)) ([1a12f5f](1a12f5f)) * **zk_toolbox:** use configs from the main repo ([#2470](#2470)) ([4222d13](4222d13)) ### Bug Fixes * **contract verifier:** Fix config values ([#2510](#2510)) ([3729468](3729468)) * fixed panic propagation ([#2525](#2525)) ([e0fc58b](e0fc58b)) * **proof_data_handler:** Unlock jobs on transient errors ([#2486](#2486)) ([7c336b1](7c336b1)) * **prover:** Parallelize circuit metadata uploading for BWG ([#2520](#2520)) ([f49720f](f49720f)) * VM performance diff: don't show 0 as N/A ([#2276](#2276)) ([2fa2249](2fa2249)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). --------- Co-authored-by: zksync-era-bot <zksync-era-bot@users.noreply.github.com>
@popzxc I think you forgot to update this diagram. I prepared a PR fixing it: #2722. BTW, I might be repeating myself, but unlike @perekopskiy, I think we should use an |
Also, cherry-pick some of the ideas introduced in #2486
Also, cherry-pick some of the ideas introduced in #2486
It should have been updated as part of these 2 PRs: - #2258 - #2486 ## What ❔ Update ProofGenerationDal docs chart. ## Why ❔ We like up-to-date docs. ## Checklist - [x] PR title corresponds to the body of PR (we generate changelog entries from PRs). - [ ] Tests for the changes have been added / updated. - [x] Documentation comments have been added / updated. - [x] Code has been formatted via `zk fmt` and `zk lint`.
@pbeza thanks for looking into it! Regarding enums and stronger typing: I don't think @perekopskiy meant that "enums bad, strong typing bad" -- he raised a valid question at the beginning -- what's the cost of this solution? For the prover DB it's even more important. A few immediate things that come to mind:
On the other hand, what are the drawbacks of having it in text? The main one is the possibility of typo/error/using non-existent status. It is a valid concern, but in that particular case it's easily discoverable (each typo will inevitably cause the prover to deadlock). Moreover, we have DAL as a layer of isolation, so the surface for bugs is pretty narrow. Just in case, I as well don't want to say "enums bad, strong typing bad". What I want to say is that everything is a tradeoff and the answer is usually "it depends". If you have strong opinions -- you are more than welcome to write a design doc, compare pros/cons of both approaches, and present it -- you will be heard, and your proposal will be discussed. But it must come with a good motivation to change something that works and works well (even if not fully idiomatic). |
Thanks for the detailed explnation! Just to clarify my PoV: I think you might be overlooking the human factor in coding. You, @popzxc, and @perekopskiy are likely very familiar and comfortable with the codebase, aware of many design decisions made over the last months or years. However, if someone is new to the code, having One of the key reasons why using That said, I get why you went with Anyway, thanks for the thoughtful discussion! ⚔️ |
What ❔
Currently, proof data handler locks the job for proof generation, and starts fetching required data after that.
If any error happens during fetching, the method will err, and the job will remain locked.
This PR changes it, so that if any error occurs, we unlock the job before we return an error.
Additionally, it reduces the amount of non-necessary panics in the touched code, and adds some docs.
Why ❔
Correctness & efficiency.
Checklist
zk fmt
andzk lint
.