
CCIP-2003 Improved fetching Commit Reports from database #726

Merged
mateusz-sekara merged 9 commits into ccip-develop from better-fetching-commitroots on Apr 23, 2024

Conversation

Contributor

@mateusz-sekara mateusz-sekara commented Apr 17, 2024

Motivation

The current implementation fetches all the CommitReports within the permissionlessExecThreshold from the database and filters the result set based on the snoozedRoots. This means that when everything is executed, we keep fetching all roots into memory only to discover that we don't need that data. During high-traffic spikes this is an unnecessary waste of resources in the CL node and the database, as the dataset size grows over time until it reaches the permissionlessExec window (8 hours of logs in prod).

Example load run showing how the fetched dataset changes over time under steady message traffic:

<img width="1488" alt="image" src="https://github.com/smartcontractkit/ccip/assets/18304098/7b216dd4-1ab3-4943-9797-99a38f4f9eaa">

Solution

There are at least two solutions to make this more performant:

  1. Pass executed roots to the database query to fetch only unexpired roots. Unexpired roots can be determined from the snoozedRoots cache. This is probably the most accurate approach because it always fetches the minimal required set from the database.

You can think of the query like this:

```sql
select * from evm.logs
where address = :addr
and event_sig = :event_sig
and substring(data, X, Y) not in (:executed_roots_array)
and block_timestamp > :permissionlessThresholdWindow
```
  2. Make the permissionlessThreshold filter "sliding". Commit Reports are executed sequentially in most cases, so we can keep track of the timestamp of the oldest fully executed report. Instead of inventing a new query for LogPoller, we can achieve a lot based on the assumption that execution is sequential: whenever a report is markedAsExecuted, we bump the block_timestamp filter forward. We already have that information in the cache and indirectly rely on it when filtering out snoozedRoots.

```sql
and block_timestamp > :oldest_executed_timestamp
```

Solution 2 is easier to implement, doesn't require any changes to LogPoller (or new query perf testing), and can be handled entirely in the plugin. We already cache snoozedRoots, so we can reuse that to move the filtering to the database by narrowing the block_timestamp search window. Assuming that reports are executed sequentially, it should be as accurate as solution 1.
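For illustration, here is a minimal Go sketch of how the sliding window in solution 2 could be computed. `queryWindowStart` and its parameter names are assumptions for this sketch, not the actual plugin code:

```go
package ccip

import "time"

// queryWindowStart returns the lower bound used for the block_timestamp filter.
// By default it is now minus permissionlessExecThreshold; once older reports
// are fully executed, the window slides forward to the oldest fully executed
// report's timestamp, so already-executed logs are never fetched again.
func queryWindowStart(
	now time.Time,
	permissionlessExecThreshold time.Duration,
	oldestFullyExecuted time.Time,
	haveExecuted bool,
) time.Time {
	start := now.Add(-permissionlessExecThreshold)
	if haveExecuted && oldestFullyExecuted.After(start) {
		// Everything at or before this timestamp is already executed
		// (under the sequential-execution assumption), so narrow the window.
		start = oldestFullyExecuted
	}
	return start
}
```

The returned timestamp is what would be bound to `:oldest_executed_timestamp` in the query fragment above.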

Load testing solution 2 (the same configuration, but shorter test duration)

<img width="1088" alt="image" src="https://github.com/smartcontractkit/ccip/assets/18304098/c445495d-51e7-4138-a00b-662423fa0cdf">

Contributor

I see you updated files related to core. Please run pnpm changeset in the root directory to add a changeset.

@mateusz-sekara mateusz-sekara marked this pull request as ready for review April 17, 2024 12:29
@mateusz-sekara mateusz-sekara requested a review from a team as a code owner April 17, 2024 12:29
// Now the search filter will be 0xA timestamp -> [2010-10-11, 2010-10-15]
// If roots are executed out of order, it's not going to change anything. However, in most cases we have sequential root execution, and that is
// a huge improvement because we don't need to fetch all the roots from the database in every round.
unexecutedRootsQueue *orderedmap.OrderedMap[string, time.Time]
Contributor


I would prefer not to use this lib.
We could have our own simple implementation or use a different type.

Contributor


If we have our own simple implementation you can also have the rootsQueueMu defined there.

Contributor Author


It was already in the deps so I've decided to reuse it. Do you have any better ideas? I'm not sure about reimplementing, sounds like reinventing the wheel.

Contributor Author


An alternative implementation would be https://github.com/elliotchance/orderedmap (more GH stars)
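For context, the "own simple implementation" suggested above could look roughly like this: a slice for insertion order plus a map for lookups, with the mutex (the `rootsQueueMu` mentioned in the review) folded into the type. All names here are hypothetical sketches, not the code that was actually merged:

```go
package ccip

import (
	"sync"
	"time"
)

// orderedTimestamps is a minimal insertion-ordered map from commit root
// (hex string) to block timestamp, with the mutex embedded in the type.
type orderedTimestamps struct {
	mu    sync.Mutex
	order []string
	items map[string]time.Time
}

func newOrderedTimestamps() *orderedTimestamps {
	return &orderedTimestamps{items: make(map[string]time.Time)}
}

// Append stores the timestamp for a root, keeping first-insertion order.
func (o *orderedTimestamps) Append(root string, ts time.Time) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if _, ok := o.items[root]; !ok {
		o.order = append(o.order, root)
	}
	o.items[root] = ts
}

// Remove deletes a root, e.g. when its report is fully executed.
func (o *orderedTimestamps) Remove(root string) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if _, ok := o.items[root]; !ok {
		return
	}
	delete(o.items, root)
	for i, r := range o.order {
		if r == root {
			o.order = append(o.order[:i], o.order[i+1:]...)
			break
		}
	}
}

// Oldest returns the timestamp of the oldest root still in the queue; this is
// the value the sliding block_timestamp filter can be bumped to.
func (o *orderedTimestamps) Oldest() (time.Time, bool) {
	o.mu.Lock()
	defer o.mu.Unlock()
	if len(o.order) == 0 {
		return time.Time{}, false
	}
	return o.items[o.order[0]], true
}
```

Whether this beats pulling in a dependency is mostly a maintenance-preference call; functionally either approach provides the ordered root-to-timestamp queue the plugin needs.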

@mateusz-sekara mateusz-sekara requested a review from dimkouv April 23, 2024 06:13
@mateusz-sekara mateusz-sekara merged commit ab356a4 into ccip-develop Apr 23, 2024
85 checks passed
@mateusz-sekara mateusz-sekara deleted the better-fetching-commitroots branch April 23, 2024 08:01
mateusz-sekara added a commit that referenced this pull request Apr 23, 2024
RensR added a commit that referenced this pull request Apr 23, 2024
* Adding evm_chain_id, address and event_sig to data and topic indexes (smartcontractkit/chainlink#12786)
* Improved fetching Commit Reports from database (#726)
@mateusz-sekara mateusz-sekara changed the title Improved fetching Commit Reports from database CCIP-2003 Improved fetching Commit Reports from database May 15, 2024
mateusz-sekara added a commit that referenced this pull request Jul 31, 2024

## Motivation

Detailed context is in #726, but long story short: scanning the entire
`permissionLessExecThreshold` / `messageIntervalVisibility` window from every
Commit plugin instance (think of it as a lane) puts a lot of pressure on the
database. In most cases we don't need the entire window, only the roots that
are not executed yet.

Unfortunately, the solution suggested in the mentioned PR was faulty because
it relied on the `block_timestamp` of unfinalized blocks. Re-orgs on Polygon
exposed that problem; details and the revert are in #1155


## Solution

Divide the cache into two layers:
* finalized commit roots are kept in memory, because we assume these never change
* unfinalized commit roots are fetched from LogPoller, and newly finalized
logs are promoted to the finalized part

There should be no additional memory footprint, because we kept all executed
and finalized roots in cache memory anyway. The performance impact of this
approach should be similar to #726, because we only keep scanning the
unfinalized chunk of the LogPoller, so the number of logs fetched from the
database is very limited.
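A rough Go sketch of the two-layer shape described above; all types and method names here are illustrative assumptions, not the actual cache API in the repo:

```go
package ccip

import (
	"sync"
	"time"
)

// commitRootLog is a simplified stand-in for a commit report root plus the
// block metadata needed to decide finality.
type commitRootLog struct {
	merkleRoot  [32]byte
	blockNumber int64
	timestamp   time.Time
}

// twoLayerRootsCache keeps finalized roots in memory (layer 1) and always
// re-reads the unfinalized tail from LogPoller (layer 2).
type twoLayerRootsCache struct {
	mu        sync.Mutex
	finalized map[[32]byte]commitRootLog
}

func newTwoLayerRootsCache() *twoLayerRootsCache {
	return &twoLayerRootsCache{finalized: make(map[[32]byte]commitRootLog)}
}

// merge promotes freshly finalized logs into the in-memory layer and returns
// the combined view: cached finalized roots plus the still-unfinalized ones.
func (c *twoLayerRootsCache) merge(fromLogPoller []commitRootLog, latestFinalizedBlock int64) []commitRootLog {
	c.mu.Lock()
	defer c.mu.Unlock()

	var unfinalized []commitRootLog
	for _, l := range fromLogPoller {
		if l.blockNumber <= latestFinalizedBlock {
			// Promotion: from now on this root is served from memory only.
			c.finalized[l.merkleRoot] = l
			continue
		}
		unfinalized = append(unfinalized, l)
	}

	out := make([]commitRootLog, 0, len(c.finalized)+len(unfinalized))
	for _, l := range c.finalized {
		out = append(out, l)
	}
	return append(out, unfinalized...)
}
```

Because finalized entries never change, the in-memory layer only grows by promotion, and each round only the unfinalized tail has to be re-read from the database.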

However, there is one caveat here: LogPoller doesn't mark logs as
finalized/unfinalized by default, so we need to ask for the latest finalized
block in the Commit Reader directly and enrich the logs with that information
(a similar approach to what we do for the Offramp reader). Thankfully, asking
the database for the latest block is extremely fast and (if needed) could be
optimized by caching the latest finalized block in the
LogPoller/HeadTracker.
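A hedged sketch of that enrichment step, assuming a generic source for the latest finalized block; the interface and function names below are placeholders, not the real Commit Reader API:

```go
package ccip

import "context"

// latestFinalizedBlockSource stands in for however the Commit Reader asks for
// the latest finalized block (LogPoller / HeadTracker in the real code).
type latestFinalizedBlockSource interface {
	LatestFinalizedBlock(ctx context.Context) (int64, error)
}

// logWithFinality is an illustrative log shape carrying the enrichment flag.
type logWithFinality struct {
	blockNumber int64
	finalized   bool
}

// enrichWithFinality queries the latest finalized block once per round and
// marks every log at or below it as finalized.
func enrichWithFinality(ctx context.Context, src latestFinalizedBlockSource, logs []logWithFinality) ([]logWithFinality, error) {
	latest, err := src.LatestFinalizedBlock(ctx)
	if err != nil {
		return nil, err
	}
	for i := range logs {
		logs[i].finalized = logs[i].blockNumber <= latest
	}
	return logs, nil
}
```

The single latest-finalized-block lookup per round is the part the text above notes could be cached in LogPoller/HeadTracker if it ever became a bottleneck.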

Load tests will be conducted to confirm the performance assumptions.
Please see the code and tests for more details.

chainlink-common PR
smartcontractkit/chainlink-common#650

### High level picture 


![image](https://github.com/smartcontractkit/ccip/assets/18304098/5c7e04e7-9703-4228-ac10-d264ab6e184a)


### Alternative solutions

[I've tried implementing a solution in which the cache is simplified and keeps
only executed/snoozed roots](#1170). Snoozed roots are passed to the database
query and used to filter out executed and finalized roots. However, it only
works up to a certain number of snoozed roots: after adding more than 250
snoozed roots to the LogPoller query, the driver/database started to cut off
some of the filters, making it unreliable (erroring or not filtering
properly). There is also a risk of hitting the Postgres query size limit
(because every root has to be passed in SQL), which could lead to a limbo
state the plugin can't recover from.


## Tests

### Before - fetching all CommitReports always

<img width="1293" alt="image"
src="https://github.com/user-attachments/assets/ea32eccb-8d76-468c-9018-01d4989968f7">


### After - fetching only unfinalized CommitReports and caching
finalized ones

<img width="1283" alt="image"
src="https://github.com/user-attachments/assets/3732f438-063e-4e8f-86a8-849684e88970">


### Summary

Caching roots improved the performance; the query is smaller in terms of the
objects scanned and returned. The number of roots returned is almost constant
during the entire test (around 5 reports at a time).
Slightly higher memory footprint (around 800 MB higher) but lower CPU
utilization during tests (~15% drop).

<img width="2500" alt="image"
src="https://github.com/user-attachments/assets/027cb379-03ec-419c-a289-7b1df3e25cd8">
asoliman92 pushed a commit that referenced this pull request Jul 31, 2024
@0xnogo 0xnogo mentioned this pull request Jul 31, 2024