CCIP-2003 Improved fetching Commit Reports from database #726
Conversation
// Now the search filter will be 0xA timestamp -> [2010-10-11, 20-10-15]
// If roots are executed out of order, it's not going to change anything. However, for most of the cases
// we have sequential root execution, and that is a huge improvement because we don't need to fetch
// all the roots from the database in every round.
unexecutedRootsQueue *orderedmap.OrderedMap[string, time.Time]
I would prefer not to use this lib.
We could have our own simple implementation or use a different type.
If we have our own simple implementation, you can also have the rootsQueueMu defined there.
It was already in the deps, so I've decided to reuse it. Do you have any better ideas? I'm not sure about reimplementing; it sounds like reinventing the wheel.
An alternative implementation would be https://github.com/elliotchance/orderedmap (more GH stars).
## Motivation

The current implementation fetches all the CommitReports within the `permissionlessExecThreshold` from the database and filters the result set based on the `snoozedRoots`. This means that once everything is executed, we keep fetching all roots into memory only to discover that we don't need that data. During high traffic spikes, this is an unnecessary waste of resources in the CL node and the database, because the dataset keeps growing over time until it reaches the permissionlessExec window (8 hours of logs for prod).

Example load run showing how it changes over time during a steady traffic of messages:

<img width="1488" alt="image" src="https://github.com/smartcontractkit/ccip/assets/18304098/7b216dd4-1ab3-4943-9797-99a38f4f9eaa">

## Solution

There are at least two ways to make this more performant:

1. Pass executed roots to the database query to fetch only unexpired roots from the database. Unexpired roots can be determined from the snoozedRoots cache. This is probably the most accurate approach because it always fetches the minimal required set from the database. You can think of the query like this:

```sql
select * from evm.logs
where address = :addr
and event_sig = :event_sig
and substring(data, X, Y) not in (:executed_roots_array)
and block_timestamp > :permissionlessThresholdWindow
```

2. Make the permissionlessThreshold filter "sliding". Commit Reports are executed sequentially in most cases, so we can keep track of the timestamp of the oldest fully executed report. Instead of inventing a new query for LogPoller, we can achieve a lot based on the assumption that execution is sequential. Whenever a report is markedAsExecuted, we keep bumping the block_timestamp filter. We already have that information in cache and indirectly rely on it when filtering out snoozedRoots:

```sql
and block_timestamp > :oldest_executed_timestamp
```

Solution 2 is easier to implement, doesn't require any changes to LogPoller (or query perf testing), and can be handled entirely in the plugin. We already cache snoozedRoots, so we can reuse that to move filtering to the database by narrowing the block_timestamp search window. Assuming that reports are executed sequentially, it should be as accurate as solution 1.

### Load testing solution 2 (the same configuration, but shorter test duration)

<img width="1088" alt="image" src="https://github.com/smartcontractkit/ccip/assets/18304098/c445495d-51e7-4138-a00b-662423fa0cdf">
* Adding evm_chain_id, address and event_sig to data and topic indexes smartcontractkit/chainlink#12786
* Improved fetching Commit Reports from database #726
## Motivation

Detailed context is in #726, but long story short: scanning the entire `permissionLessExecThreshold` / `messageIntervalVisibility` window from every Commit plugin instance (think of it as a lane) puts a lot of pressure on the database. For most cases we don't need the entire window, only the Roots that are not yet executed. Unfortunately, the solution suggested in the mentioned PR was faulty because it relied on the `block_timestamp` of unfinalized blocks. Re-orgs on Polygon exposed that problem; details and revert are in #1155.

## Solution

Divide the cache into two layers:

* finalized commit roots are kept in memory, because we assume these never change
* unfinalized commit roots are fetched from LogPoller, and finalized logs are promoted to the finalized part

There should be no additional memory footprint, because we kept all executed and finalized roots in the cache memory anyway. The performance impact of this approach should be similar to #726, because we keep scanning only the unfinalized chunk of the LogPoller, so the number of logs fetched from the database is very limited. However, there is one caveat: LogPoller doesn't mark logs as finalized/unfinalized by default, so we need to ask for the latest finalized block in the Commit Reader directly and enrich the logs with that information (a similar approach to what we do for the Offramp reader). Thankfully, asking the database for the latest log is extremely fast and (if needed) could be optimized by caching the latest finalized block in the LogPoller/HeadTracker. Load tests will be conducted here to confirm the performance assumptions.

Please see the code and tests for more details; chainlink-common PR: smartcontractkit/chainlink-common#650

### High level picture

![image](https://github.com/smartcontractkit/ccip/assets/18304098/5c7e04e7-9703-4228-ac10-d264ab6e184a)

### Alternative solutions

[I've tried implementing a solution in which the cache is simplified and keeps only executed/snoozed roots](#1170). Snoozed Roots are passed to the database query and used for filtering out executed and finalized roots. However, it only works up to some number of snoozed roots. After adding more than 250 snoozed roots to the LogPoller query, the driver/database started to cut off some of the filters, making it unreliable (erroring or not filtering properly). Also, there is a risk of hitting the Postgres query size limit (because every root has to be passed in SQL), which might lead to a limbo state the plugin can't recover from.

## Tests

### Before - fetching all CommitReports always

<img width="1293" alt="image" src="https://github.com/user-attachments/assets/ea32eccb-8d76-468c-9018-01d4989968f7">

### After - fetching only unfinalized CommitReports and caching finalized ones

<img width="1283" alt="image" src="https://github.com/user-attachments/assets/3732f438-063e-4e8f-86a8-849684e88970">

### Summary

Caching roots improved the performance; the query is smaller in terms of the objects scanned and returned. The number of roots returned is almost constant during the entire test (around ~5 reports at once). Slightly higher memory footprint (around 800MB higher) but lower CPU utilization during tests (~15% drop).

<img width="2500" alt="image" src="https://github.com/user-attachments/assets/027cb379-03ec-419c-a289-7b1df3e25cd8">
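The two-layer split can be sketched as follows. This is a hypothetical illustration of the promotion mechanics only (the real implementation lives in the Commit Reader and chainlink-common PR linked above): finalized reports live in memory forever, while each round only the unfinalized window is fetched, and any newly finalized logs are promoted.

```go
package main

import "fmt"

// report is a simplified stand-in for a commit report log enriched with
// finality information.
type report struct {
	Root      string
	Finalized bool
}

// twoLayerCache sketches the finalized/unfinalized split; names and shapes
// are illustrative, not the actual Commit Reader code.
type twoLayerCache struct {
	finalized map[string]report // assumed immutable once finalized
}

// merge takes freshly fetched unfinalized-window logs, promotes newly
// finalized ones into the in-memory layer, and returns the combined view.
func (c *twoLayerCache) merge(fetched []report) []report {
	out := make([]report, 0, len(c.finalized)+len(fetched))
	for _, r := range c.finalized {
		out = append(out, r)
	}
	for _, r := range fetched {
		if r.Finalized {
			if _, ok := c.finalized[r.Root]; !ok {
				c.finalized[r.Root] = r // promote: no need to re-fetch next round
				out = append(out, r)
			}
			continue
		}
		out = append(out, r)
	}
	return out
}

func main() {
	c := &twoLayerCache{finalized: map[string]report{}}

	// Round 1: one finalized and one unfinalized report come from LogPoller.
	v1 := c.merge([]report{{"0xA", true}, {"0xB", false}})
	fmt.Println(len(v1), len(c.finalized)) // 2 1

	// Round 2: only the unfinalized window is fetched; 0xA is served from memory.
	v2 := c.merge([]report{{"0xB", false}})
	fmt.Println(len(v2), len(c.finalized)) // 2 1
}
```

The key property is in round 2: the database query shrinks to the unfinalized window while the plugin still sees the full set of reports, which is what keeps the fetched-rows count roughly constant in the "After" test above.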