Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Faster listing of stake pools. #2111

Merged
merged 23 commits into from
Sep 10, 2020

Conversation

jonathanknowles
Copy link
Member

@jonathanknowles jonathanknowles commented Sep 4, 2020

Related Issues

#2082 (Listing stake pools time grows when there are wallets syncing in background.)

https://jira.iohk.io/browse/ADP-383 (Use more efficient DB queries when listing pools.)

Overview

This PR:

  • changes the implementation of ListStakePools to execute a small number of handwritten SQL queries with native joins, rather than issuing multiple queries and joining together the results in memory.

  • adds a new operation listPoolLifeCycleData. The SQLite implementation of this operation uses a single hand-optimized query to return lifecycle data for all pools.

Benefits

Faster execution time

The time taken to execute ListStakePools is greatly reduced, even when there are multiple wallets syncing in the background.

Click to view data
No. of
Wallets
Syncing
Time Required
(seconds)
master
Time Required
(seconds)
more-efficient-list-pools
0 8 3
5 30 7
10 60 8
15 72 9
20 95 10

Fewer database queries

The number of select queries required for a single call to ListStakePools is greatly reduced.

Click to view data
Branch Network No. of Pools No. of Queries
master mainnet 1,124 12,365
more-efficient-list-pools mainnet 1,124 4

Testing

✔️ Properties

This PR adds a property test for listPoolLifeCycleData, consisting of the following steps:

  1. Add an arbitrary sequence of pool registration and retirement certificates to the database, for multiple pools.
  2. Call listPoolLifeCycleData once to fetch lifecycle data for all active pools.
  3. Call readPoolLifeCycleStatus multiple times, once for each known pool, and coalesce the results.
  4. Test that the results returned in steps 2 and 3 are identical.

✔️ Live Data

Run the following command to get a list of all stake pools sorted by ID:

cardano-wallet stake-pool list --stake 1000 | jq 'sort_by(.id)' > stake-pool-list

The results should be identical for both master and this branch.

Notes

System used for Benchmarking

The system used for benchmarking (4 cores):

vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) CPU
stepping        : 7
microcode       : 0x1
cpu MHz         : 2800.176
cache size      : 33792 KB

@jonathanknowles jonathanknowles force-pushed the jonathanknowles/more-efficient-list-pools branch from a4ba88c to 7106d54 Compare September 4, 2020 07:48
iohk-bors bot added a commit that referenced this pull request Sep 4, 2020
@jonathanknowles jonathanknowles self-assigned this Sep 4, 2020
@piotr-iohk
Copy link
Contributor

Awesome 🎉

My results of listing stake-pools with n wallets syncing on this branch (Intel i7 - 6 cores, 32GB RAM, SSD):

0 wallets    - 2-3s
10 wallets   - 4-6s
20 wallets   - 4-5s
50 wallets   - 4-5s
100 wallets  - 5-6s
200 wallets  - 7-8s

@iohk-bors

This comment has been minimized.

iohk-bors bot added a commit that referenced this pull request Sep 4, 2020
@iohk-bors

This comment has been minimized.

@jonathanknowles jonathanknowles force-pushed the jonathanknowles/more-efficient-list-pools branch 3 times, most recently from fac8cce to d00c608 Compare September 7, 2020 08:09
iohk-bors bot added a commit that referenced this pull request Sep 7, 2020
@iohk-bors

This comment has been minimized.

@jonathanknowles jonathanknowles force-pushed the jonathanknowles/more-efficient-list-pools branch 2 times, most recently from cea2818 to 83de0c4 Compare September 7, 2020 12:52
@jonathanknowles jonathanknowles changed the title WIP: Use more efficient database queries in ListStakePools. WIP: Faster listing of stake pools. Sep 7, 2020
@jonathanknowles jonathanknowles added the IMPROVEMENT Mark a PR as an improvement, for auto-generated CHANGELOG label Sep 7, 2020
@cardano-foundation cardano-foundation deleted a comment from iohk-bors bot Sep 7, 2020
@cardano-foundation cardano-foundation deleted a comment from iohk-bors bot Sep 7, 2020
lib/core/src/Cardano/Pool/DB/Sqlite.hs Outdated Show resolved Hide resolved
lib/core/src/Cardano/Pool/DB/Sqlite.hs Outdated Show resolved Hide resolved
jonathanknowles added a commit that referenced this pull request Sep 8, 2020
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/more-efficient-list-pools branch from 86ee15a to 6fbc807 Compare September 8, 2020 02:59
jonathanknowles added a commit that referenced this pull request Sep 8, 2020
For rows that fail to parse: we create a log entry for each error.
For rows that parse successfully: we return these as part of the result.

In response to review feedback:

#2111 (comment)
jonathanknowles added a commit that referenced this pull request Sep 8, 2020
jonathanknowles added a commit that referenced this pull request Sep 8, 2020
For rows that fail to parse: we create a log entry for each error.
For rows that parse successfully: we return these as part of the result.

In response to review feedback:

#2111 (comment)
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/more-efficient-list-pools branch from 832ef1b to 3155742 Compare September 8, 2020 03:33
jonathanknowles added a commit that referenced this pull request Sep 8, 2020
For rows that fail to parse: we create a log entry for each error.
For rows that parse successfully: we return these as part of the result.

In response to review feedback:

#2111 (comment)
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/more-efficient-list-pools branch from 3155742 to 79f505c Compare September 8, 2020 03:34
jonathanknowles added a commit that referenced this pull request Sep 8, 2020
jonathanknowles added a commit that referenced this pull request Sep 8, 2020
For rows that fail to parse: we create a log entry for each error.
For rows that parse successfully: we return these as part of the result.

In response to review feedback:

#2111 (comment)
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/more-efficient-list-pools branch from 79f505c to 68aaf9e Compare September 8, 2020 05:29
This replaces a sequence of n database operations with a single database
operation that performs just a single query.
This makes it possible to filter out retired pools at the SQL query level.
This is no longer necessary, as retired pools are now filtered out at
the SQL query level.
This type will be used by later commits, to generate sequences of
registration and retirement certificates affecting multiple pools.
Use `MultiPoolCertificateSequence` to simplify the
`prop_listRetiredPools_multiplePools_multipleCerts` property.
This makes it possible to provide the following function, which produces
a single, flattened list of pool certificates affecting multiple pools.

```
getMultiPoolCertificateSequence
    :: MultiPoolCertificateSequence -> [PoolCertificate]
```
Every valid registration certificate must have at least one owner.

Ideally, this should be encoded in the type system, but we make do with
not generating invalid data for now.
This more accurately reflects the function's purpose.

We don't actually mind if the list of certificates is empty.
Log parse failures in the following pool DB operations:
  - `readTotalProduction`.
  - `listPoolLifeCycleData`.
  - `listRetiredPools`.

In response to review feedback:

#2111 (comment)
There's no point in fetching the entire set of all pool-related data
when all we really need is the set of valid pool IDs.
This makes it possible to change the definitions of views as required.
@jonathanknowles jonathanknowles force-pushed the jonathanknowles/more-efficient-list-pools branch from b504448 to 7aabf90 Compare September 10, 2020 02:29
@jonathanknowles
Copy link
Member Author

bors r+

iohk-bors bot added a commit that referenced this pull request Sep 10, 2020
2111: Faster listing of stake pools. r=jonathanknowles a=jonathanknowles

## Issue Number

#2082 (_Listing stake pools time grows when there are wallets syncing in background_.)

## Overview

This PR:

* changes the implementation of `ListStakePools` to execute a small number of handwritten SQL queries with native joins, rather than issuing multiple queries and joining together the results in memory.

* adds a new operation [`listPoolLifeCycleData`](https://github.com/input-output-hk/cardano-wallet/blob/d00c608e47d2d189a282a02946cc8e46b843375b/lib/core/src/Cardano/Pool/DB.hs#L180). The [SQLite implementation](https://github.com/input-output-hk/cardano-wallet/blob/d00c608e47d2d189a282a02946cc8e46b843375b/lib/core/src/Cardano/Pool/DB/Sqlite.hs#L384) of this operation uses a [single hand-optimized query](https://github.com/input-output-hk/cardano-wallet/blob/d00c608e47d2d189a282a02946cc8e46b843375b/lib/core/src/Cardano/Pool/DB/Sqlite.hs#L595) to return lifecycle data for all pools.

## Benefits

### ⭐ **Faster execution time**

  The time taken to execute `ListStakePools` is greatly reduced, even when there are multiple wallets syncing in the background.
  <details><summary>Click to view data</summary><br>

  | No. of <br>Wallets<br>Syncing | Time Required<br>(seconds)<br>`master` | Time Required<br>(seconds)<br>`more-efficient-list-pools` |
  | ---: | ---: | ---: |
  | 0 | 8 | 3 |
  | 5 | 30 | 7 |
  | 10 | 60 | 8 |
  | 15 | 72 | 9 |
  | 20 | 95 | 10 |
  </details>

### ⭐ **Fewer database queries**

  The number of `select` queries required for a single call to `ListStakePools` is greatly reduced.
  <details><summary>Click to view data</summary><br>
  
  | Branch | Network | No. of Pools | No. of Queries |
  | :--- | :--- | ---: | ---: | 
  | `master` | `mainnet` |  1,124 |  12,365 |
  | `more-efficient-list-pools`| `mainnet` | 1,124  | 4 |
  </details>

## Testing

### ✔️ Properties

This PR adds a [property test](https://github.com/input-output-hk/cardano-wallet/blob/d00c608e47d2d189a282a02946cc8e46b843375b/lib/core/test/unit/Cardano/Pool/DB/Properties.hs#L987) for `listPoolLifeCycleData`, consisting of the following steps:
1. Add an arbitrary sequence of pool registration and retirement certificates to the database, for multiple pools.
2. Call `listPoolLifeCycleData` once to fetch lifecycle data for all active pools.
3. Call `readPoolLifeCycleStatus` multiple times, once for each known pool, and coalesce the results. 
4. Test that the results returned in steps 2 and 3 are identical.

### ✔️ Live Data

Run the following command to get a list of all stake pools sorted by ID:
```sh
cardano-wallet stake-pool list --stake 1000 | jq 'sort_by(.id)' > stake-pool-list
```
The results should be identical for both `master` and this branch.

## Notes

### System used for Benchmarking

The system used for benchmarking (4 cores):

```
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) CPU
stepping        : 7
microcode       : 0x1
cpu MHz         : 2800.176
cache size      : 33792 KB
```

Co-authored-by: Jonathan Knowles <jonathan.knowles@iohk.io>
@iohk-bors
Copy link
Contributor

iohk-bors bot commented Sep 10, 2020

Build Failure

Suspected Cause

Parse failure triggered by wallet DB state machine tests.

Log Extract

  2) Cardano.Wallet.DB.Sqlite, Sqlite State machine (RndState), Sequential state machine tests
       uncaught exception: PersistException
       PersistMarshalError "Couldn't parse field `txMetaData` from table `tx_meta`. ConversionErrExpected \"[Aeson.String x]\" \"fromList [(\\\"hex\\\",Array [Number 0.0])]\""
       (after 1491 tests and 70 shrinks)
   ...
  To rerun use: --match "/Cardano.Wallet.DB.Sqlite/Sqlite State machine (RndState)/Sequential state machine tests/"

Randomized with seed 485074820

Related Issues

#2130
IntersectMBO/cardano-node#1797

@jonathanknowles
Copy link
Member Author

bors r+

@iohk-bors
Copy link
Contributor

iohk-bors bot commented Sep 10, 2020

Build succeeded:

@iohk-bors iohk-bors bot merged commit 0767271 into master Sep 10, 2020
@iohk-bors iohk-bors bot deleted the jonathanknowles/more-efficient-list-pools branch September 10, 2020 05:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
IMPROVEMENT Mark a PR as an improvement, for auto-generated CHANGELOG
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants