-
Notifications
You must be signed in to change notification settings - Fork 60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Returning messageVisibilityInterval always from commit roots cache #1155
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
a955fc5
to
4ff6465
Compare
dimkouv
requested changes
Jul 8, 2024
4ff6465
to
96233e8
Compare
4cb1a8b
to
bd9f251
Compare
dimkouv
approved these changes
Jul 8, 2024
connorwstein
approved these changes
Jul 8, 2024
mateusz-sekara
added a commit
that referenced
this pull request
Jul 8, 2024
…1155) Commit's Roots cache stores block_timestamps for the CommitReports, but when reorgs happens block_timestamp can also change because TX could be included in a different block. Example scenario: CommitRoot with 10:30:00 UTC block timestamp is inserted with `AppendUnexecutedRoot` (so it's block_timestamp is persisted in the cache). Then reorg happens on the destination and the same CommitRoot is inserted into the blockchain but let's say with 10:29:00 UTC (different block_time), which is never updated. Commit Roots cache returns persisted 10:30:00 as the oldest block timestamp and we keep searching LogPoller using the wrong lower bound filter - this Commit Root never pops up in the execution because it's never returned from DB
jarnaud
pushed a commit
that referenced
this pull request
Jul 9, 2024
…1155) ## Motivation Commit's Roots cache stores block_timestamps for the CommitReports, but when reorgs happens block_timestamp can also change because TX could be included in a different block. Example scenario: CommitRoot with 10:30:00 UTC block timestamp is inserted with `AppendUnexecutedRoot` (so it's block_timestamp is persisted in the cache). Then reorg happens on the destination and the same CommitRoot is inserted into the blockchain but let's say with 10:29:00 UTC (different block_time), which is never updated. Commit Roots cache returns persisted 10:30:00 as the oldest block timestamp and we keep searching LogPoller using the wrong lower bound filter - this Commit Root never pops up in the execution because it's never returned from DB
This was referenced Jul 10, 2024
mateusz-sekara
added a commit
that referenced
this pull request
Jul 31, 2024
) ## Motivation Detailed context is here #726, but long story short scanning the entire `permissionLessExecThreshold` / `messageIntervalVisibility` from every Commit plugin instance (think of it as a lane) puts a lot of pressure to the database. For most of the cases, we don't need the entire window but only Roots that are not executed yet. Unfortunately, the solution suggested in the mentioned PR was faulty because it relied on `block_timestamp` of unfinalized blocks. Re-orgs on Polygon exposed that problem, details and revert here #1155 ## Solution Divide the cache into two layers * finalized commit roots are kept in memory because we assume these never change * unfinalized commit roots are fetched from LogPoller, and finalized logs are promoted to the finalized part There should be no additional memory footprint because we kept all executed and finalized roots in cache memory anyway. The performance impact of this approach should be similar to the #726 , because we keep scanning only unfinalized chunks of the LogPoller so the number of logs fetched from the database would be very limited. However, there is one caveat here, LogPoller doesn't mark logs as finalized/not finalized by default so we need to ask for the latest finalized block in the Commit Reader directly and enrich logs with that information (similar approach to what we do for Offramp reader). Thankfully, asking the database for the latest log is extremely fast and (if needed) could be optimized by caching latest finalized block in the LogPoller/HeadTracker. There will be load tests conducted here to confirm performance assumptions. Please see the code and tests for more details chainlink-common PR smartcontractkit/chainlink-common#650 ### High level picture ![image](https://github.com/smartcontractkit/ccip/assets/18304098/5c7e04e7-9703-4228-ac10-d264ab6e184a) ### Alternative solutions: [I've tried implementing the solution in which the cache is simplified and keeps only executed/snoozed roots](#1170). Snoozed Roots are passed to the database query and used for filtering out executed and finalized roots. However, it works only for some number of snoozed roots. After adding more than 250 snoozed roots to the LogPoller query, the driver/database started to cut off some of the filters making it unreliable (erroring or not filtering properly). Also, there is a risk of hitting the Postgres query size limit (because every root has to be passed in SQL) which might lead to some limbo state which the plugin can't recover from. ## Tests ### Before - fetching all CommitReports always <img width="1293" alt="image" src="https://github.com/user-attachments/assets/ea32eccb-8d76-468c-9018-01d4989968f7"> ### After - fetching only unfinalized CommitReports and caching finalized ones <img width="1283" alt="image" src="https://github.com/user-attachments/assets/3732f438-063e-4e8f-86a8-849684e88970"> ### Summary Caching roots improved the performance, query is smaller in terms of the objects scanned and returned. Number of roots returned is constant almost during the entire test (around ~5 reports at once) Sightly higher memory footprint (around 800MB higher) but lower CPU utilization during tests (~15% drop) <img width="2500" alt="image" src="https://github.com/user-attachments/assets/027cb379-03ec-419c-a289-7b1df3e25cd8">
asoliman92
pushed a commit
that referenced
this pull request
Jul 31, 2024
…1155) ## Motivation Commit's Roots cache stores block_timestamps for the CommitReports, but when reorgs happens block_timestamp can also change because TX could be included in a different block. Example scenario: CommitRoot with 10:30:00 UTC block timestamp is inserted with `AppendUnexecutedRoot` (so it's block_timestamp is persisted in the cache). Then reorg happens on the destination and the same CommitRoot is inserted into the blockchain but let's say with 10:29:00 UTC (different block_time), which is never updated. Commit Roots cache returns persisted 10:30:00 as the oldest block timestamp and we keep searching LogPoller using the wrong lower bound filter - this Commit Root never pops up in the execution because it's never returned from DB
asoliman92
pushed a commit
that referenced
this pull request
Jul 31, 2024
) ## Motivation Detailed context is here #726, but long story short scanning the entire `permissionLessExecThreshold` / `messageIntervalVisibility` from every Commit plugin instance (think of it as a lane) puts a lot of pressure to the database. For most of the cases, we don't need the entire window but only Roots that are not executed yet. Unfortunately, the solution suggested in the mentioned PR was faulty because it relied on `block_timestamp` of unfinalized blocks. Re-orgs on Polygon exposed that problem, details and revert here #1155 ## Solution Divide the cache into two layers * finalized commit roots are kept in memory because we assume these never change * unfinalized commit roots are fetched from LogPoller, and finalized logs are promoted to the finalized part There should be no additional memory footprint because we kept all executed and finalized roots in cache memory anyway. The performance impact of this approach should be similar to the #726 , because we keep scanning only unfinalized chunks of the LogPoller so the number of logs fetched from the database would be very limited. However, there is one caveat here, LogPoller doesn't mark logs as finalized/not finalized by default so we need to ask for the latest finalized block in the Commit Reader directly and enrich logs with that information (similar approach to what we do for Offramp reader). Thankfully, asking the database for the latest log is extremely fast and (if needed) could be optimized by caching latest finalized block in the LogPoller/HeadTracker. There will be load tests conducted here to confirm performance assumptions. Please see the code and tests for more details chainlink-common PR smartcontractkit/chainlink-common#650 ### High level picture ![image](https://github.com/smartcontractkit/ccip/assets/18304098/5c7e04e7-9703-4228-ac10-d264ab6e184a) ### Alternative solutions: [I've tried implementing the solution in which the cache is simplified and keeps only executed/snoozed roots](#1170). Snoozed Roots are passed to the database query and used for filtering out executed and finalized roots. However, it works only for some number of snoozed roots. After adding more than 250 snoozed roots to the LogPoller query, the driver/database started to cut off some of the filters making it unreliable (erroring or not filtering properly). Also, there is a risk of hitting the Postgres query size limit (because every root has to be passed in SQL) which might lead to some limbo state which the plugin can't recover from. ## Tests ### Before - fetching all CommitReports always <img width="1293" alt="image" src="https://github.com/user-attachments/assets/ea32eccb-8d76-468c-9018-01d4989968f7"> ### After - fetching only unfinalized CommitReports and caching finalized ones <img width="1283" alt="image" src="https://github.com/user-attachments/assets/3732f438-063e-4e8f-86a8-849684e88970"> ### Summary Caching roots improved the performance, query is smaller in terms of the objects scanned and returned. Number of roots returned is constant almost during the entire test (around ~5 reports at once) Sightly higher memory footprint (around 800MB higher) but lower CPU utilization during tests (~15% drop) <img width="2500" alt="image" src="https://github.com/user-attachments/assets/027cb379-03ec-419c-a289-7b1df3e25cd8">
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation
Commit's Roots cache stores block_timestamps for the CommitReports, but when reorgs happens block_timestamp can also change because TX could be included in a different block.
Example scenario:
CommitRoot with 10:30:00 UTC block timestamp is inserted with
AppendUnexecutedRoot
(so it's block_timestamp is persisted in the cache). Then reorg happens on the destination and the same CommitRoot is inserted into the blockchain but let's say with 10:29:00 UTC (different block_time), which is never updated. Commit Roots cache returns persisted 10:30:00 as the oldest block timestamp and we keep searching LogPoller using the wrong lower bound filter - this Commit Root never pops up in the execution because it's never returned from DB