Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Returning messageVisibilityInterval always from commit roots cache #1155

Merged
merged 1 commit into from
Jul 8, 2024

Conversation

mateusz-sekara
Copy link
Contributor

@mateusz-sekara mateusz-sekara commented Jul 8, 2024

Motivation

Commit's Roots cache stores block_timestamps for the CommitReports, but when reorgs happens block_timestamp can also change because TX could be included in a different block.

Example scenario:

CommitRoot with 10:30:00 UTC block timestamp is inserted with AppendUnexecutedRoot (so it's block_timestamp is persisted in the cache). Then reorg happens on the destination and the same CommitRoot is inserted into the blockchain but let's say with 10:29:00 UTC (different block_time), which is never updated. Commit Roots cache returns persisted 10:30:00 as the oldest block timestamp and we keep searching LogPoller using the wrong lower bound filter - this Commit Root never pops up in the execution because it's never returned from DB

@mateusz-sekara mateusz-sekara merged commit 053e93a into ccip-develop Jul 8, 2024
106 checks passed
@mateusz-sekara mateusz-sekara deleted the permission-exec-threshold-always branch July 8, 2024 15:21
mateusz-sekara added a commit that referenced this pull request Jul 8, 2024
…1155)

Commit's Roots cache stores block_timestamps for the CommitReports, but
when reorgs happens block_timestamp can also change because TX could be
included in a different block.

Example scenario:

CommitRoot with 10:30:00 UTC block timestamp is inserted with
`AppendUnexecutedRoot` (so it's block_timestamp is persisted in the
cache). Then reorg happens on the destination and the same CommitRoot is
inserted into the blockchain but let's say with 10:29:00 UTC (different
block_time), which is never updated. Commit Roots cache returns
persisted 10:30:00 as the oldest block timestamp and we keep searching
LogPoller using the wrong lower bound filter - this Commit Root never
pops up in the execution because it's never returned from DB
jarnaud pushed a commit that referenced this pull request Jul 9, 2024
…1155)

## Motivation

Commit's Roots cache stores block_timestamps for the CommitReports, but
when reorgs happens block_timestamp can also change because TX could be
included in a different block.

Example scenario:

CommitRoot with 10:30:00 UTC block timestamp is inserted with
`AppendUnexecutedRoot` (so it's block_timestamp is persisted in the
cache). Then reorg happens on the destination and the same CommitRoot is
inserted into the blockchain but let's say with 10:29:00 UTC (different
block_time), which is never updated. Commit Roots cache returns
persisted 10:30:00 as the oldest block timestamp and we keep searching
LogPoller using the wrong lower bound filter - this Commit Root never
pops up in the execution because it's never returned from DB
mateusz-sekara added a commit that referenced this pull request Jul 31, 2024
)

## Motivation

Detailed context is here
#726, but long story short
scanning the entire `permissionLessExecThreshold` /
`messageIntervalVisibility` from every Commit plugin instance (think of
it as a lane) puts a lot of pressure to the database. For most of the
cases, we don't need the entire window but only Roots that are not
executed yet.

Unfortunately, the solution suggested in the mentioned PR was faulty
because it relied on `block_timestamp` of unfinalized blocks. Re-orgs on
Polygon exposed that problem, details and revert here
#1155


## Solution

Divide the cache into two layers
* finalized commit roots are kept in memory because we assume these
never change
* unfinalized commit roots are fetched from LogPoller, and finalized
logs are promoted to the finalized part

There should be no additional memory footprint because we kept all
executed and finalized roots in cache memory anyway. The performance
impact of this approach should be similar to the #726 , because we keep
scanning only unfinalized chunks of the LogPoller so the number of logs
fetched from the database would be very limited.

However, there is one caveat here, LogPoller doesn't mark logs as
finalized/not finalized by default so we need to ask for the latest
finalized block in the Commit Reader directly and enrich logs with that
information (similar approach to what we do for Offramp reader).
Thankfully, asking the database for the latest log is extremely fast and
(if needed) could be optimized by caching latest finalized block in the
LogPoller/HeadTracker.

There will be load tests conducted here to confirm performance
assumptions.
Please see the code and tests for more details

chainlink-common PR
smartcontractkit/chainlink-common#650

### High level picture 


![image](https://github.com/smartcontractkit/ccip/assets/18304098/5c7e04e7-9703-4228-ac10-d264ab6e184a)


### Alternative solutions:

[I've tried implementing the solution in which the cache is simplified
and keeps only executed/snoozed
roots](#1170). Snoozed
Roots are passed to the database query and used for filtering out
executed and finalized roots. However, it works only for some number of
snoozed roots. After adding more than 250 snoozed roots to the LogPoller
query, the driver/database started to cut off some of the filters making
it unreliable (erroring or not filtering properly). Also, there is a
risk of hitting the Postgres query size limit (because every root has to
be passed in SQL) which might lead to some limbo state which the plugin
can't recover from.


## Tests

### Before - fetching all CommitReports always

<img width="1293" alt="image"
src="https://github.com/user-attachments/assets/ea32eccb-8d76-468c-9018-01d4989968f7">


### After - fetching only unfinalized CommitReports and caching
finalized ones

<img width="1283" alt="image"
src="https://github.com/user-attachments/assets/3732f438-063e-4e8f-86a8-849684e88970">


### Summary

Caching roots improved the performance, query is smaller in terms of the
objects scanned and returned. Number of roots returned is constant
almost during the entire test (around ~5 reports at once)
Sightly higher memory footprint (around 800MB higher) but lower CPU
utilization during tests (~15% drop)

<img width="2500" alt="image"
src="https://github.com/user-attachments/assets/027cb379-03ec-419c-a289-7b1df3e25cd8">
asoliman92 pushed a commit that referenced this pull request Jul 31, 2024
…1155)

## Motivation

Commit's Roots cache stores block_timestamps for the CommitReports, but
when reorgs happens block_timestamp can also change because TX could be
included in a different block.

Example scenario:

CommitRoot with 10:30:00 UTC block timestamp is inserted with
`AppendUnexecutedRoot` (so it's block_timestamp is persisted in the
cache). Then reorg happens on the destination and the same CommitRoot is
inserted into the blockchain but let's say with 10:29:00 UTC (different
block_time), which is never updated. Commit Roots cache returns
persisted 10:30:00 as the oldest block timestamp and we keep searching
LogPoller using the wrong lower bound filter - this Commit Root never
pops up in the execution because it's never returned from DB
asoliman92 pushed a commit that referenced this pull request Jul 31, 2024
)

## Motivation

Detailed context is here
#726, but long story short
scanning the entire `permissionLessExecThreshold` /
`messageIntervalVisibility` from every Commit plugin instance (think of
it as a lane) puts a lot of pressure to the database. For most of the
cases, we don't need the entire window but only Roots that are not
executed yet.

Unfortunately, the solution suggested in the mentioned PR was faulty
because it relied on `block_timestamp` of unfinalized blocks. Re-orgs on
Polygon exposed that problem, details and revert here
#1155


## Solution

Divide the cache into two layers
* finalized commit roots are kept in memory because we assume these
never change
* unfinalized commit roots are fetched from LogPoller, and finalized
logs are promoted to the finalized part

There should be no additional memory footprint because we kept all
executed and finalized roots in cache memory anyway. The performance
impact of this approach should be similar to the #726 , because we keep
scanning only unfinalized chunks of the LogPoller so the number of logs
fetched from the database would be very limited.

However, there is one caveat here, LogPoller doesn't mark logs as
finalized/not finalized by default so we need to ask for the latest
finalized block in the Commit Reader directly and enrich logs with that
information (similar approach to what we do for Offramp reader).
Thankfully, asking the database for the latest log is extremely fast and
(if needed) could be optimized by caching latest finalized block in the
LogPoller/HeadTracker.

There will be load tests conducted here to confirm performance
assumptions.
Please see the code and tests for more details

chainlink-common PR
smartcontractkit/chainlink-common#650

### High level picture 


![image](https://github.com/smartcontractkit/ccip/assets/18304098/5c7e04e7-9703-4228-ac10-d264ab6e184a)


### Alternative solutions:

[I've tried implementing the solution in which the cache is simplified
and keeps only executed/snoozed
roots](#1170). Snoozed
Roots are passed to the database query and used for filtering out
executed and finalized roots. However, it works only for some number of
snoozed roots. After adding more than 250 snoozed roots to the LogPoller
query, the driver/database started to cut off some of the filters making
it unreliable (erroring or not filtering properly). Also, there is a
risk of hitting the Postgres query size limit (because every root has to
be passed in SQL) which might lead to some limbo state which the plugin
can't recover from.


## Tests

### Before - fetching all CommitReports always

<img width="1293" alt="image"
src="https://github.com/user-attachments/assets/ea32eccb-8d76-468c-9018-01d4989968f7">


### After - fetching only unfinalized CommitReports and caching
finalized ones

<img width="1283" alt="image"
src="https://github.com/user-attachments/assets/3732f438-063e-4e8f-86a8-849684e88970">


### Summary

Caching roots improved the performance, query is smaller in terms of the
objects scanned and returned. Number of roots returned is constant
almost during the entire test (around ~5 reports at once)
Sightly higher memory footprint (around 800MB higher) but lower CPU
utilization during tests (~15% drop)

<img width="2500" alt="image"
src="https://github.com/user-attachments/assets/027cb379-03ec-419c-a289-7b1df3e25cd8">
@0xnogo 0xnogo mentioned this pull request Jul 31, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants