Add LedgerEntryChangeCache #2004

bartekn · 2019-12-04T16:13:30Z

PR Checklist

PR Structure

This PR has reasonably narrow scope (if not, break it down into smaller PRs).
This PR avoids mixing refactoring changes with feature changes (split into two PRs
otherwise).
This PR's title starts with name of package that is most changed in the PR, ex.
services/friendbot, or all or doc if the changes are broad or impact many
packages.

Thoroughness

This PR adds tests for the most critical parts of the new functionality or fixes.
I've updated any docs (developer docs, .md
files, etc... affected by this change). Take a look in the docs folder for a given service,
like this one.

Release planning

I've updated the relevant CHANGELOG (here for Horizon) if
needed with deprecations, added features, breaking changes, and DB schema changes.
I've decided if this PR requires a new major/minor version according to
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.

What

Adds exp/ingest/io.LedgerEntryChangeCache, which squashes all the ledger entry changes within a ledger. It can later be used to decrease the number of DB queries needed to apply them. See #2003.

Close #2003.

Why

Ledgers that contain many changes touching a small set of entries cause performance issues because every ledger entry change is applied to the DB separately. LedgerEntryChangeCache solves this problem because it holds only the final version of each ledger entry after all the changes.

Before this fix, an extreme case in which two accounts send payments to each other 1000 times in a single ledger required 3000 DB updates (2000 account changes due to payments, plus 500 fee meta changes per account). After the fix, it requires just 2 DB updates.

The algorithm used in LedgerEntryChangeCache is explained in the comment:

// LedgerEntryChangeCache is a cache of ledger entry changes that squashes all
// changes within a single ledger. By doing this, it decreases the number of
// queries sent to a DB to update the current state of the ledger.
// It has integrity checks built in so, e.g., removing an account that was
// previously removed returns an error. In such a case verify.StateError is
// returned.
//
// It applies changes to the cache using the following algorithm:
//
// 1. If the change is CREATED it checks if any change connected to the given
//    entry is already in the cache. If not, it adds the CREATED change.
//    Otherwise, if the existing change is:
//    a. CREATED it returns an error because we can't add an entry that
//       already exists.
//    b. UPDATED it returns an error because we can't add an entry that
//       already exists.
//    c. REMOVED it means that due to previous transitions we want to remove
//       this from a DB, which means that it already exists in a DB, so we
//       need to update the type of change to UPDATED.
// 2. If the change is UPDATED it checks if any change connected to the given
//    entry is already in the cache. If not, it adds the UPDATED change.
//    Otherwise, if the existing change is:
//    a. CREATED it means that due to previous transitions we want to create
//       this in a DB, which means that it doesn't exist in a DB, so we need
//       to update the entry but stay with the CREATED type.
//    b. UPDATED we simply update it with the new value.
//    c. REMOVED it means that at this point in the ledger the entry is removed
//       so updating it returns an error.
// 3. If the change is REMOVED it checks if any change connected to the given
//    entry is already in the cache. If not, it adds the REMOVED change.
//    Otherwise, if the existing change is:
//    a. CREATED it means that due to previous transitions we want to create
//       this in a DB, which means that it doesn't exist in a DB. If it was
//       created and removed in the same ledger it's a noop so we remove the
//       entry from the cache.
//    b. UPDATED we simply update it to be a REMOVED change because the
//       UPDATED change means the entry exists in a DB.
//    c. REMOVED it returns an error because we can't remove an entry that was
//       already removed.
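
The transition rules in the comment above can be sketched as a small, self-contained Go program. This is a simplified illustration, not the code from this PR: `ChangeType`, `Change`, `Cache`, and `ErrState` are hypothetical stand-ins for the real xdr types and for verify.StateError, and ledger keys and entries are reduced to plain strings.

```go
package main

import (
	"errors"
	"fmt"
)

// ChangeType is a simplified stand-in for the three change kinds in the comment.
type ChangeType int

const (
	Created ChangeType = iota
	Updated
	Removed
)

// Change is a hypothetical, trimmed-down ledger entry change.
type Change struct {
	Type  ChangeType
	Key   string // stand-in for the serialized ledger key
	Value string // stand-in for the ledger entry body
}

// ErrState is a stand-in for the state error mentioned in the comment.
var ErrState = errors.New("invalid state transition")

// Cache keeps at most one pending change per ledger key.
type Cache struct {
	changes map[string]Change
}

func NewCache() *Cache { return &Cache{changes: map[string]Change{}} }

// Add applies one change using the rules 1-3 from the comment.
func (c *Cache) Add(ch Change) error {
	prev, ok := c.changes[ch.Key]
	if !ok {
		// Nothing cached for this key yet: store the change as-is.
		c.changes[ch.Key] = ch
		return nil
	}
	switch ch.Type {
	case Created:
		if prev.Type != Removed {
			return fmt.Errorf("create %s: %w", ch.Key, ErrState)
		}
		// Removed earlier in this ledger, so the row still exists in the DB:
		// the net effect is an update.
		ch.Type = Updated
		c.changes[ch.Key] = ch
	case Updated:
		if prev.Type == Removed {
			return fmt.Errorf("update %s: %w", ch.Key, ErrState)
		}
		// Take the new value but keep the cached type (CREATED stays CREATED).
		ch.Type = prev.Type
		c.changes[ch.Key] = ch
	case Removed:
		switch prev.Type {
		case Created:
			// Created and removed in the same ledger: nothing to write.
			delete(c.changes, ch.Key)
		case Updated:
			c.changes[ch.Key] = ch
		case Removed:
			return fmt.Errorf("remove %s: %w", ch.Key, ErrState)
		}
	}
	return nil
}

// Len returns the number of changes that would be sent to the DB.
func (c *Cache) Len() int { return len(c.changes) }

func main() {
	c := NewCache()
	// 1000 payments between two existing accounts; each touches both entries.
	for i := 0; i < 1000; i++ {
		for _, key := range []string{"account-A", "account-B"} {
			if err := c.Add(Change{Type: Updated, Key: key, Value: fmt.Sprint(i)}); err != nil {
				panic(err)
			}
		}
	}
	fmt.Println(c.Len()) // prints 2
}
```

The usage in `main` mirrors the two-account scenario from the description: however many updates arrive for the same two keys, only two squashed changes remain to be applied.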

bartekn · 2019-12-04T16:36:30Z

services/horizon/internal/expingest/ledger_entry_change_cache.go

+	case xdr.LedgerEntryTypeData:
+		rowsAffected, err = c.HistoryQ.RemoveAccountData(entry.LedgerKey().MustData())
+	case xdr.LedgerEntryTypeOffer:
+		rowsAffected, err = c.HistoryQ.RemoveOffer(entry.LedgerKey().MustOffer().OfferId)


Actually, we could build a BatchRemoveBuilder that would remove rows in batches (DELETE ... WHERE ... IN (...)). This could be helpful when removing many offers after a trade.
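
To make the batching idea concrete, here is a minimal sketch of how such a statement could be assembled. No BatchRemoveBuilder exists in this PR; the function name `buildBatchDelete` and the `offers`/`offerid` identifiers are assumptions for illustration only, and the sketch only builds the query text and arguments without touching a database.

```go
package main

import (
	"fmt"
	"strings"
)

// buildBatchDelete returns a parameterized DELETE statement and its arguments.
// table and column must be trusted identifiers, never user input.
func buildBatchDelete(table, column string, ids []int64) (string, []interface{}) {
	if len(ids) == 0 {
		// Nothing to delete; callers should skip the query entirely.
		return "", nil
	}
	placeholders := make([]string, len(ids))
	args := make([]interface{}, len(ids))
	for i, id := range ids {
		placeholders[i] = fmt.Sprintf("$%d", i+1) // PostgreSQL-style placeholders
		args[i] = id
	}
	query := fmt.Sprintf(
		"DELETE FROM %s WHERE %s IN (%s)",
		table, column, strings.Join(placeholders, ", "),
	)
	return query, args
}

func main() {
	query, args := buildBatchDelete("offers", "offerid", []int64{10, 11, 12})
	fmt.Println(query) // prints DELETE FROM offers WHERE offerid IN ($1, $2, $3)
	fmt.Println(len(args), "arguments")
}
```

One round trip then removes all three rows instead of three separate queries. A real builder would also need to cap the batch size, since databases limit the number of bind parameters per statement.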

tamirms · 2019-12-09T16:36:26Z

approach looks good

abuiles

👍

ire-and-curses

Going in a good direction, I had a few comments/questions mostly for my understanding.

ire-and-curses · 2019-12-09T17:51:15Z

services/horizon/internal/expingest/ledger_entry_change_cache.go

+			// If existing type is removed it means that this entry does exist
+			// in a DB so we update entry change.
+			c.cache[ledgerKeyString] = xdr.LedgerEntryChange{
+				Type:    xdr.LedgerEntryChangeTypeLedgerEntryUpdated,


Should this be LedgerEntryChangeTypeLedgerEntryCreated since it is being added for the first time?

If so, also change the comment above to "in the DB so we add entry change".

The case here is:

1. The entry exists in the previous ledger.
2. It was removed in the current ledger (thus removed in the cache).
3. It is now created, so given that it exists in the previous ledger we need to update it instead of creating it.

Got it. I think the confusing part about this code is that the previous ledger has a state, and so does the cache (representing current ledger), and they can of course be different. Is there a way to make it clearer which state we're dealing with?

I added some updates to the code that should make it more clear.

ire-and-curses · 2019-12-09T17:55:43Z

services/horizon/internal/expingest/ledger_entry_change_cache.go

+// queries sent to a DB to update the current state of the ledger.
+// It has integrity checks built in so ex. removing an account that was
+// previously removed returns an error. In such case verify.StateError is
+// returned.


I wonder if we could add a small section explaining the possible state transitions here. Something like:

| Existing Type | Add | Update | Remove |
|---|---|---|---|
| EntryCreated | X | entry exists, can update | entry exists, can remove |
| EntryUpdated | X | entry exists, can update | entry exists, can remove |
| EntryRemoved | no entry, can add | X | X |

The rationale is to try and help with understanding when we're talking about previous vs current ledger (in the form of the cache). Also because the logic is currently distributed through several functions, and it depends on the XDR state transitions of stellar core, so the truth table is hard to verify.

Expanded the comment to explain exactly what the algorithm looks like.

ire-and-curses · 2019-12-09T17:57:02Z

services/horizon/internal/expingest/ledger_entry_change_cache.go

+				ledgerKeyString,
+			))
+		case xdr.LedgerEntryChangeTypeLedgerEntryRemoved:
+			// If existing type is removed it means that this entry does exist


Suggested change

-	// If existing type is removed it means that this entry does exist
+	// If existing type is removed it means that this entry does not exist

The current comment is correct. We want to add a created change to the cache, and the existing change in the cache is removed. This means that the entry exists in the DB/previous ledger (a removed change was probably received before this one).

Hmm. I'm sure you're right but this is very confusing. :/ Is there a way to make it clearer?

Check the latest code and let me know if it's still confusing.

bartekn · 2019-12-12T14:09:36Z

I checked the performance of this solution by replaying ingestion of the ledger range 26988927-26989439 on pubnet (when the DRA spam happened).

Before we explore the data, it's important to understand that the test was performed on my dev machine, so both Horizon and its DB were on the same server. This means that the connection round trip time was minimal. The actual savings this PR introduces are in the round trip times because it makes a much smaller number of DB queries.

Duration difference (chart):

Full data can be found in this spreadsheet.

Observations:

Ingestion of ledgers with a small number of ops is slightly faster: only around 0.06 s on average in absolute terms, but around 33% faster than with the previous code. You can see them on the left side of the chart.
For ledgers with many ops, the results are much better. Looking at the chart, ingestion was faster by 0.5 to 2.2 seconds. Ingestion of ledgers with diff >= 0.5 s was 45% faster on average.
For ledgers that contain mostly DRA spam (ex. 26989221), ingestion can be as much as 80% faster: pre: 1.382870728, post: 0.269690039, diff: 1.113180689.
Ingestion of 16 ledgers (of 512 total = 3%) was actually slower, by 0.06 sec. on average (0.28 sec. max). I checked these ledgers and I think that after adding the cache the number of DB queries doesn't drop significantly. The average increase in ingestion time is negligible in this case, in my opinion.
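
As a quick check of the "80% faster" figure for ledger 26989221, the relative saving is (pre - post) / pre. The helper name `speedup` below is only for illustration, and the durations are assumed to be in seconds, as in the other observations.

```go
package main

import "fmt"

// speedup returns the fraction of time saved when going from pre to post.
func speedup(pre, post float64) float64 {
	return (pre - post) / pre
}

func main() {
	pre, post := 1.382870728, 0.269690039 // ledger 26989221, from the list above
	fmt.Printf("diff: %.9f s, faster by %.1f%%\n", pre-post, 100*speedup(pre, post))
	// prints diff: 1.113180689 s, faster by 80.5%
}
```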

I believe results will be much better in staging where the DB is on a different server.

ire-and-curses · 2019-12-12T16:18:32Z

This looks very promising, and reinforces the rationale for doing this. I think we should merge. After that, it would be great to get an unstable version running on the staging server to see real-world numbers with the remote DB server.

ire-and-curses

Looks great! Two optional suggestions.

This commit changes `DatabaseProcessor` to insert/update (upsert) accounts and trust lines in batches.

In #2004 we added `LedgerEntryChangeCache`, which aims to decrease the number of DB updates: all the changes connected to a single ledger entry are squashed into just one DB query. Even though it gives a nice performance boost when a large number of ops change a small number of ledger entries, it turns out this is not enough. When many ledger entries are changed in a new ledger, DB connection round trip time takes up a significant percentage of the overall ledger processing time.

The SQL query was borrowed [1] from stellar-core.

Known limitations:

* This code shouldn't be the final code that is released. The aim of this PR is to deploy it to the stg environment to check its performance.
* This is adding upsert queries for accounts and trust lines only. These types are the most common changes in the recent ledgers. The final solution should add batch upsert and batch delete for all ledger entries.
* There's potential to improve the code: autogenerating the upsert query for any ledger entry type, better tests. Will be done in a separate PR.
* It removes the code that checks the number of rows affected by a query. Unfortunately, the performance of the current solution is too bad to keep it.

[1] https://github.com/stellar/stellar-core/blob/21469f90da1eacc6845017a520e179afb3772e65/src/ledger/LedgerTxnAccountSQL.cpp#L306-L336
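
For readers unfamiliar with batch upserts, the general shape can be sketched as follows. This is not the query borrowed from stellar-core in [1] and not the code from that PR: it uses a plain multi-row `VALUES` list with PostgreSQL's `ON CONFLICT ... DO UPDATE`, and the `accountRow` type, the `buildAccountUpsert` function, and the `accounts (account_id, balance)` schema are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// accountRow is a hypothetical, trimmed-down account record.
type accountRow struct {
	ID      string
	Balance int64
}

// buildAccountUpsert returns a single INSERT ... ON CONFLICT statement that
// inserts new accounts and updates existing ones in one round trip.
func buildAccountUpsert(rows []accountRow) (string, []interface{}) {
	if len(rows) == 0 {
		return "", nil
	}
	tuples := make([]string, len(rows))
	args := make([]interface{}, 0, 2*len(rows))
	for i, r := range rows {
		// Two placeholders per row: ($1, $2), ($3, $4), ...
		tuples[i] = fmt.Sprintf("($%d, $%d)", 2*i+1, 2*i+2)
		args = append(args, r.ID, r.Balance)
	}
	query := "INSERT INTO accounts (account_id, balance) VALUES " +
		strings.Join(tuples, ", ") +
		" ON CONFLICT (account_id) DO UPDATE SET balance = excluded.balance"
	return query, args
}

func main() {
	query, args := buildAccountUpsert([]accountRow{
		{ID: "account-A", Balance: 100},
		{ID: "account-B", Balance: 250},
	})
	fmt.Println(query)
	fmt.Println(len(args), "arguments")
}
```

Because created and updated entries go through the same statement, the squashed changes from the cache can be flushed without first distinguishing inserts from updates.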

Add LedgerEntryChangeCache

be2ee71

cla-bot bot added the cla: yes label Dec 4, 2019

tamirms reviewed Dec 9, 2019

Review fixes

997efa6

bartekn mentioned this pull request Dec 9, 2019

services/horizon/expingest: Split DatabaseProcessor into separate processors #2025

Closed

abuiles approved these changes Dec 9, 2019

ire-and-curses requested changes Dec 9, 2019

Apply suggestions from code review

10a87dd

Co-Authored-By: Eric Saunders <ire-and-curses@users.noreply.github.com>

bartekn changed the base branch from release-horizon-v0.24.0 to release-horizon-v0.25.0 December 10, 2019 13:23

bartekn added 4 commits December 10, 2019 15:55

Move LedgerEntryChangeCache to exp/ingest

2938096

Add errors

38eadb9

Use io.Change

1c74ab6

Fix tests

75649b2

bartekn marked this pull request as ready for review December 10, 2019 16:43

bartekn changed the base branch from release-horizon-v0.25.0 to release-horizon-v0.24.1 December 10, 2019 16:50

tamirms reviewed Dec 10, 2019

bartekn requested a review from ire-and-curses December 12, 2019 14:20

Update comment

977c28a

tamirms approved these changes Dec 12, 2019

ire-and-curses approved these changes Dec 12, 2019

bartekn added 2 commits December 12, 2019 17:52

Review fixes

af5bbec

Fix tests

a4d56b2

bartekn merged commit 6398a58 into stellar:release-horizon-v0.24.1 Dec 12, 2019

bartekn deleted the ledger-entry-change-cache branch December 12, 2019 17:06

bartekn mentioned this pull request Dec 12, 2019

services/horizon/expingest: Create ledger entry changes cache in front of DB #2003

Closed

bartekn mentioned this pull request Dec 18, 2019

services/horizon: Batch Upsert Accounts and Trust Lines #2073

Merged
