fix(prover): Disallow state changes from successful #2233
This PR fixes the boojnet outage. TL;DR of the outage: a race condition caused by prover jobs moving from the `successful` state to `in_progress`/`in_gpu_proving`.

The PR addresses:
- no job can move out of the `successful` state (considered a final state)
- fix local development (contracts were pointing to 0.24.0 instead of 0.24.1); can be split into a different PR if this is problematic
- add a table constraint; again, can be split into a different PR
- add checks for the number of `recursion_tip` jobs (post-outage check; should never happen, but better to verify)
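The first point above can be illustrated with a minimal sketch. It is not the PR's actual code: the `JobStatus` enum, its variants other than the states named in the description, and the `transition` helper are hypothetical names chosen for illustration. The only rule it models is that `Successful` is final, so any attempt to leave it is rejected.

```rust
/// Hypothetical job statuses; only `Successful`, `InProgress` and
/// `InGpuProving` correspond to states named in the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobStatus {
    Queued,
    InProgress,
    InGpuProving,
    Successful,
    Failed,
}

impl JobStatus {
    /// In this sketch `Successful` is the only final state.
    pub fn is_final(self) -> bool {
        matches!(self, JobStatus::Successful)
    }
}

/// Returns the new status, or an error describing the rejected transition.
/// A job that is already in a final state can never be moved again.
pub fn transition(current: JobStatus, next: JobStatus) -> Result<JobStatus, String> {
    if current.is_final() {
        return Err(format!(
            "refusing to move job from {current:?} to {next:?}"
        ));
    }
    Ok(next)
}
```

With this rule, a late worker that tries to move a finished job back to `InGpuProving` gets an error instead of silently reopening the job, which is the kind of race the description refers to.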
i64::from(block_number.0)
i64::from(block_number.0),
ProofCompressionJobStatus::Successful.to_string(),
ProofCompressionJobStatus::SentToServer.to_string(),
IMO it would be worth logging such occurrences. This would make it easier to debug any further occurrences.
You can make `update` return the number of affected rows and then compare it with `1`.
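The reviewer's suggestion could look roughly like the sketch below. It uses an in-memory stand-in for the jobs table so that it runs without a database; `JobTable`, `update_status` and `mark_in_progress` are hypothetical names, not functions from this repository. With `sqlx`, the count would typically come from `rows_affected()` on the query result instead.

```rust
use std::collections::HashMap;

/// In-memory stand-in for a jobs table: block number -> status string.
/// Hypothetical type, used only to keep the sketch self-contained.
pub struct JobTable {
    rows: HashMap<i64, String>,
}

impl JobTable {
    pub fn new() -> Self {
        Self { rows: HashMap::new() }
    }

    pub fn insert(&mut self, block_number: i64, status: &str) {
        self.rows.insert(block_number, status.to_string());
    }

    pub fn status(&self, block_number: i64) -> Option<&str> {
        self.rows.get(&block_number).map(String::as_str)
    }

    /// Mimics an `UPDATE ... WHERE block = $1 AND status != 'successful'`
    /// statement and returns the number of affected rows (0 or 1).
    pub fn update_status(&mut self, block_number: i64, new_status: &str) -> u64 {
        if let Some(status) = self.rows.get_mut(&block_number) {
            if status.as_str() != "successful" {
                *status = new_status.to_string();
                return 1;
            }
        }
        0
    }
}

/// Compares the affected-row count with 1 and logs when the update was a no-op.
pub fn mark_in_progress(table: &mut JobTable, block_number: i64) -> bool {
    let affected = table.update_status(block_number, "in_progress");
    if affected != 1 {
        eprintln!(
            "status update for block {block_number} affected {affected} rows; \
             the job is missing or already successful"
        );
        return false;
    }
    true
}
```

Logging at the call site keeps the guarded update itself simple while still leaving a trace each time a worker tries to touch a finished job.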
Agreed, but we can already tell this from the number of attempts in the DB. Not opposing the idea.
🤖 I have created a release *beep* *boop*

---

## [15.0.0](prover-v14.5.0...prover-v15.0.0) (2024-06-14)

### ⚠ BREAKING CHANGES

* updated boojum and nightly rust compiler ([#2126](#2126))

### Features

* added debug_proof to prover_cli ([#2052](#2052)) ([b1ad01b](b1ad01b))
* faster & cleaner VK generation ([#2084](#2084)) ([89c8cac](89c8cac))
* **node:** Move some stuff around ([#2151](#2151)) ([bad5a6c](bad5a6c))
* **object-store:** Allow caching object store objects locally ([#2153](#2153)) ([6c6e65c](6c6e65c))
* **proof_data_handler:** add new endpoints to the TEE prover interface API ([#1993](#1993)) ([eca98cc](eca98cc))
* **prover:** Add file based config for fri prover gateway ([#2150](#2150)) ([81ffc6a](81ffc6a))
* **prover:** file based configs for witness generator ([#2161](#2161)) ([24b8f93](24b8f93))
* support debugging of recursive circuits in prover_cli ([#2217](#2217)) ([7d2e12d](7d2e12d))
* updated boojum and nightly rust compiler ([#2126](#2126)) ([9e39f13](9e39f13))
* verification of L1Batch witness (BFT-471) - attempt 2 ([#2232](#2232)) ([dbcf3c6](dbcf3c6))
* verification of L1Batch witness (BFT-471) ([#2019](#2019)) ([6cc5455](6cc5455))

### Bug Fixes

* **config:** Split object stores ([#2187](#2187)) ([9bcdabc](9bcdabc))
* **prover_cli:** Fix delete command ([#2119](#2119)) ([214f981](214f981))
* **prover_cli:** Fix the issues with `home` path ([#2104](#2104)) ([1e18af2](1e18af2))
* **prover:** config ([#2165](#2165)) ([e5daf8e](e5daf8e))
* **prover:** Disallow state changes from successful ([#2233](#2233)) ([2488a76](2488a76))
* Treat 502s and 503s as transient for GCS OS ([#2202](#2202)) ([0a12c52](0a12c52))

### Reverts

* verification of L1Batch witness (BFT-471) ([#2230](#2230)) ([227e101](227e101))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
🤖 I have created a release *beep* *boop*

---

## [24.8.0](core-v24.7.0...core-v24.8.0) (2024-06-24)

### ⚠ BREAKING CHANGES

* updated boojum and nightly rust compiler ([#2126](#2126))

### Features

* Add metrics for transaction execution result in state keeper ([#2021](#2021)) ([dde0fc4](dde0fc4))
* **api:** Add new `l1_committed` block tag ([#2282](#2282)) ([d5e8e9b](d5e8e9b))
* **api:** Rework zks_getProtocolVersion ([#2146](#2146)) ([800b8f4](800b8f4))
* change `zkSync` occurences to `ZKsync` ([#2227](#2227)) ([0b4104d](0b4104d))
* **contract-verifier:** Adjust contract verifier for zksolc 1.5.0 ([#2255](#2255)) ([63efb2e](63efb2e))
* **docs:** Add documentation for subset of wiring layer implementations, used by Main node ([#2292](#2292)) ([06c287b](06c287b))
* **docs:** Pruning and Snapshots recovery basic docs ([#2265](#2265)) ([619a525](619a525))
* **en:** Allow recovery from specific snapshot ([#2137](#2137)) ([ac61fed](ac61fed))
* **eth-sender:** fix for missing eth_txs_history entries ([#2236](#2236)) ([f05b0ae](f05b0ae))
* Expose fair_pubdata_price for blocks and batches ([#2244](#2244)) ([0d51cd6](0d51cd6))
* **merkle-tree:** Rework tree rollback ([#2207](#2207)) ([c3b9c38](c3b9c38))
* **node-framework:** Add Main Node Client layer ([#2132](#2132)) ([927d842](927d842))
* **node:** Move some stuff around ([#2151](#2151)) ([bad5a6c](bad5a6c))
* **node:** Port (most of) Node to the Node Framework ([#2196](#2196)) ([7842bc4](7842bc4))
* **object-store:** Allow caching object store objects locally ([#2153](#2153)) ([6c6e65c](6c6e65c))
* **proof_data_handler:** add new endpoints to the TEE prover interface API ([#1993](#1993)) ([eca98cc](eca98cc))
* **prover:** Add file based config for fri prover gateway ([#2150](#2150)) ([81ffc6a](81ffc6a))
* Remove initialize_components function ([#2284](#2284)) ([0a38891](0a38891))
* **state-keeper:** Add metric for l2 block seal reason ([#2229](#2229)) ([f967e6d](f967e6d))
* **state-keeper:** More state keeper metrics ([#2224](#2224)) ([1e48cd9](1e48cd9))
* **sync-layer:** adapt MiniMerkleTree to manage priority queue ([#2068](#2068)) ([3e72364](3e72364))
* **tee_verifier_input_producer:** use `FactoryDepsDal::get_factory_deps()` ([#2271](#2271)) ([2c0a00a](2c0a00a))
* **toolbox:** add zk_toolbox ci ([#1985](#1985)) ([4ab4922](4ab4922))
* updated boojum and nightly rust compiler ([#2126](#2126)) ([9e39f13](9e39f13))
* upgraded encoding of transactions in consensus Payload. ([#2245](#2245)) ([cb6a6c8](cb6a6c8))
* Use info log level for crates named zksync_* by default ([#2296](#2296)) ([9303142](9303142))
* verification of L1Batch witness (BFT-471) - attempt 2 ([#2232](#2232)) ([dbcf3c6](dbcf3c6))
* verification of L1Batch witness (BFT-471) ([#2019](#2019)) ([6cc5455](6cc5455))
* **vm-runner:** add basic metrics ([#2203](#2203)) ([dd154f3](dd154f3))
* **vm-runner:** add protective reads persistence flag for state keeper ([#2307](#2307)) ([36d2eb6](36d2eb6))
* **vm-runner:** shadow protective reads using VM runner ([#2017](#2017)) ([1402dd0](1402dd0))

### Bug Fixes

* **api:** Fix getting pending block ([#2186](#2186)) ([93315ba](93315ba))
* **api:** Fix transaction methods for pruned transactions ([#2168](#2168)) ([00c4cca](00c4cca))
* **config:** Fix object store ([#2183](#2183)) ([551cdc2](551cdc2))
* **config:** Split object stores ([#2187](#2187)) ([9bcdabc](9bcdabc))
* **db:** Fix `insert_proof_generation_details()` ([#2291](#2291)) ([c2412cf](c2412cf))
* **db:** Optimize `get_l2_blocks_to_execute_for_l1_batch` ([#2199](#2199)) ([06ec5f3](06ec5f3))
* **en:** Fix reorg detection in presence of tree data fetcher ([#2197](#2197)) ([20da566](20da566))
* **en:** Fix transient error detection in consistency checker ([#2140](#2140)) ([38fdfe0](38fdfe0))
* **en:** Remove L1 client health check ([#2136](#2136)) ([49198f6](49198f6))
* **eth-sender:** Don't resend already sent transactions in the same block ([#2208](#2208)) ([3538e9c](3538e9c))
* **eth-sender:** etter error handling in eth-sender ([#2163](#2163)) ([0cad504](0cad504))
* **node_framework:** Run gas adjuster task only if necessary ([#2266](#2266)) ([2dac846](2dac846))
* **object-store:** Consider more GCS errors transient ([#2246](#2246)) ([2f6cd41](2f6cd41))
* **prover_cli:** Remove outdated fix for circuit id in node wg ([#2248](#2248)) ([db8e71b](db8e71b))
* **prover:** Disallow state changes from successful ([#2233](#2233)) ([2488a76](2488a76))
* **pruning:** Check pruning in metadata calculator ([#2286](#2286)) ([7bd8f27](7bd8f27))
* Treat 502s and 503s as transient for GCS OS ([#2202](#2202)) ([0a12c52](0a12c52))
* **vm-runner:** add config value for the first processed batch ([#2158](#2158)) ([f666717](f666717))
* **vm-runner:** make `last_ready_batch` account for `first_processed_batch` ([#2238](#2238)) ([3889794](3889794))
* **vm:** fix insertion to `decommitted_code_hashes` ([#2275](#2275)) ([15bb71e](15bb71e))
* **vm:** Update `decommitted_code_hashes` in `prepare_to_decommit` ([#2253](#2253)) ([6c49a50](6c49a50))

### Performance Improvements

* **db:** Improve storage switching for state keeper cache ([#2234](#2234)) ([7c8e24c](7c8e24c))
* **db:** Try yet another storage log pruning approach ([#2268](#2268)) ([3ee34be](3ee34be))
* **en:** Parallelize persistence and chunk processing during tree recovery ([#2050](#2050)) ([b08a667](b08a667))
* **pruning:** Use more efficient query to delete past storage logs ([#2179](#2179)) ([4c18755](4c18755))

### Reverts

* **pruning:** Revert pruning query ([#2220](#2220)) ([8427cdd](8427cdd))
* verification of L1Batch witness (BFT-471) ([#2230](#2230)) ([227e101](227e101))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

---------

Co-authored-by: zksync-era-bot <zksync-era-bot@users.noreply.github.com>