Add LedgerEntryChangeCache #2004

Merged

Contributor

@bartekn bartekn commented Dec 4, 2019

PR Checklist

PR Structure

  • This PR has reasonably narrow scope (if not, break it down into smaller PRs).
  • This PR avoids mixing refactoring changes with feature changes (split into two PRs
    otherwise).
  • This PR's title starts with name of package that is most changed in the PR, ex.
    services/friendbot, or all or doc if the changes are broad or impact many
    packages.

Thoroughness

  • This PR adds tests for the most critical parts of the new functionality or fixes.
  • I've updated any docs (developer docs, .md
    files, etc... affected by this change). Take a look in the docs folder for a given service,
    like this one.

Release planning

  • I've updated the relevant CHANGELOG (here for Horizon) if
    needed with deprecations, added features, breaking changes, and DB schema changes.
  • I've decided if this PR requires a new major/minor version according to
    semver, or if it's mainly a patch change. The PR is targeted at the next
    release branch if it's not a patch change.

What

Adds exp/ingest/io.LedgerEntryChangeCache, which squashes all ledger entry changes within a ledger. This can later be used to decrease the number of DB queries needed to apply them. See #2003.

Close #2003.

Why

Some ledgers contain a lot of changes connected to a small set of entries, which causes performance issues because every ledger entry change is applied to the DB. LedgerEntryChangeCache solves this problem because it holds only the final version of each ledger entry after all the changes.

Before this fix, an extreme case in which two accounts send payments to each other 1000 times in a ledger required 3000 DB updates (2000 account changes due to payments and 500 fee meta changes per account). After the fix, it requires just 2 DB updates.

The algorithm used in LedgerEntryChangeCache is explained in the comment:

// LedgerEntryChangeCache is a cache of ledger entry changes that squashes all
// changes within a single ledger. By doing this, it decreases number of DB
// queries sent to a DB to update the current state of the ledger.
// It has integrity checks built in so ex. removing an account that was
// previously removed returns an error. In such case verify.StateError is
// returned.
//
// It applies changes to the cache using the following algorithm:
//
// 1. If the change is CREATED it checks if any change connected to given entry
//    is already in the cache. If not, it adds CREATED change. Otherwise, if
//    existing change is:
//    a. CREATED it returns error because we can't add an entry that already
//       exists.
//    b. UPDATED it returns error because we can't add an entry that already
//       exists.
//    c. REMOVED it means that due to previous transitions we want to remove
//       this from a DB what means that it already exists in a DB so we need to
//       update the type of change to UPDATED.
// 2. If the change is UPDATE it checks if any change connected to given entry
//    is already in the cache. If not, it adds UPDATE change. Otherwise, if
//    existing change is:
//    a. CREATED it means that due to previous transitions we want to create
//       this in a DB what means that it doesn't exist in a DB so we need to
//       update the entry but stay with CREATED type.
//    b. UPDATED we simply update it with the new value.
//    c. REMOVED it means that at this point in the ledger the entry is removed
//       so updating it returns an error.
// 3. If the change is REMOVE it checks if any change connected to given entry
//    is already in the cache. If not, it adds REMOVE change. Otherwise, if
//    existing change is:
//    a. CREATED it means that due to previous transitions we want to create
//       this in a DB what means that it doesn't exist in a DB. If it was
//       created and removed in the same ledger it's a noop so we remove entry
//       from the cache.
//    b. UPDATED we simply update it to be a REMOVE change because the UPDATE
//       change means the entry exists in a DB.
//    c. REMOVED it returns error because we can't remove an entry that was
//       already removed.
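
The rules in the comment above can be sketched in a few lines of Go. This is a minimal illustration, not the code added in this PR: `ChangeType`, `Change`, `ChangeCache` and `ErrState` are simplified, hypothetical stand-ins for the XDR change types and for `verify.StateError`, and a string key plus an `int64` value replace the real ledger key and entry.

```go
package main

import (
	"errors"
	"fmt"
)

// ChangeType is a simplified stand-in for the XDR ledger entry change type.
type ChangeType int

const (
	Created ChangeType = iota
	Updated
	Removed
)

// Change is a hypothetical, simplified ledger entry change. Key identifies
// the entry and Value stands in for the entry's new state.
type Change struct {
	Type  ChangeType
	Key   string
	Value int64
}

// ErrState is a placeholder for the state error mentioned in the comment.
var ErrState = errors.New("invalid ledger entry state transition")

// ChangeCache keeps at most one squashed change per ledger key.
type ChangeCache struct {
	cache map[string]Change
}

func NewChangeCache() *ChangeCache {
	return &ChangeCache{cache: make(map[string]Change)}
}

// Add squashes ch into the change already cached for the same key, if any.
func (c *ChangeCache) Add(ch Change) error {
	existing, found := c.cache[ch.Key]
	if !found {
		c.cache[ch.Key] = ch
		return nil
	}
	switch ch.Type {
	case Created:
		if existing.Type != Removed {
			return fmt.Errorf("create %q: entry already exists: %w", ch.Key, ErrState)
		}
		// Removed earlier in this ledger, so the row exists in the DB: update it.
		ch.Type = Updated
		c.cache[ch.Key] = ch
	case Updated:
		if existing.Type == Removed {
			return fmt.Errorf("update %q: entry was removed: %w", ch.Key, ErrState)
		}
		// Take the new value but keep the cached type (CREATED stays CREATED).
		ch.Type = existing.Type
		c.cache[ch.Key] = ch
	case Removed:
		switch existing.Type {
		case Created:
			// Created and removed in the same ledger: nothing to write.
			delete(c.cache, ch.Key)
		case Updated:
			c.cache[ch.Key] = ch
		case Removed:
			return fmt.Errorf("remove %q: entry already removed: %w", ch.Key, ErrState)
		}
	}
	return nil
}

// Len reports how many DB writes the squashed changes would need.
func (c *ChangeCache) Len() int { return len(c.cache) }

func main() {
	c := NewChangeCache()
	// 3000 changes touching only two accounts, as in the extreme case above.
	for i := 0; i < 1500; i++ {
		for _, key := range []string{"account-a", "account-b"} {
			if err := c.Add(Change{Type: Updated, Key: key, Value: int64(i)}); err != nil {
				panic(err)
			}
		}
	}
	fmt.Println("squashed changes:", c.Len())
}
```

Running the sketch prints `squashed changes: 2`: however many updates hit the two keys, only the last value per key survives, which is where the drop from 3000 DB updates to 2 comes from.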

@cla-bot cla-bot bot added the cla: yes label Dec 4, 2019
case xdr.LedgerEntryTypeData:
	rowsAffected, err = c.HistoryQ.RemoveAccountData(entry.LedgerKey().MustData())
case xdr.LedgerEntryTypeOffer:
	rowsAffected, err = c.HistoryQ.RemoveOffer(entry.LedgerKey().MustOffer().OfferId)
Contributor Author

@bartekn bartekn Dec 4, 2019

Actually, we could build BatchRemoveBuilder that would remove rows in batches (DELETE ... WHERE ... IN (...)). This could be helpful when removing many offers after a trade.
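
A rough sketch of what such batching could look like, assuming nothing about the eventual `BatchRemoveBuilder` API: `buildBatchDeletes` and the `query` type are hypothetical helpers, and the `offers` table and `offerid` column names are placeholders. The table and column are interpolated into the SQL, so they must be trusted constants; only the IDs are passed as bound parameters.

```go
package main

import (
	"fmt"
	"strings"
)

// query pairs a parameterized SQL statement with its arguments.
type query struct {
	SQL  string
	Args []interface{}
}

// buildBatchDeletes is a hypothetical helper that splits ids into batches and
// returns one "DELETE ... WHERE column IN (...)" statement per batch.
func buildBatchDeletes(table, column string, ids []int64, batchSize int) []query {
	if batchSize <= 0 {
		batchSize = 1
	}
	var out []query
	for start := 0; start < len(ids); start += batchSize {
		end := start + batchSize
		if end > len(ids) {
			end = len(ids)
		}
		placeholders := make([]string, 0, end-start)
		args := make([]interface{}, 0, end-start)
		for i, id := range ids[start:end] {
			// PostgreSQL-style positional placeholders: $1, $2, ...
			placeholders = append(placeholders, fmt.Sprintf("$%d", i+1))
			args = append(args, id)
		}
		out = append(out, query{
			SQL: fmt.Sprintf("DELETE FROM %s WHERE %s IN (%s)",
				table, column, strings.Join(placeholders, ", ")),
			Args: args,
		})
	}
	return out
}

func main() {
	// Five removed offers with a batch size of two yield three statements.
	for _, q := range buildBatchDeletes("offers", "offerid", []int64{10, 11, 12, 13, 14}, 2) {
		fmt.Println(q.SQL, q.Args)
	}
}
```

With this shape, removing N offers costs roughly N / batchSize round trips instead of N.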

@tamirms
Contributor

tamirms commented Dec 9, 2019

approach looks good

Contributor

@abuiles abuiles left a comment

👍

Member

@ire-and-curses ire-and-curses left a comment

Going in a good direction, I had a few comments/questions mostly for my understanding.

xdr/ledger_key.go (resolved)
// If existing type is removed it means that this entry does exist
// in a DB so we update entry change.
c.cache[ledgerKeyString] = xdr.LedgerEntryChange{
	Type: xdr.LedgerEntryChangeTypeLedgerEntryUpdated,
Member

Should this be LedgerEntryChangeTypeLedgerEntryCreated since it is being added for the first time?

Member

If so, also change comment above to in the DB so we add entry change

Contributor Author

The case here is:

  1. Entry exists in the previous ledger.
  2. It was removed in the current ledger (thus removed in the cache).
  3. It is now created so given it exists in the previous ledger we need to update it instead of create it.

Member

Got it. I think the confusing part about this code is that the previous ledger has a state, and so does the cache (representing current ledger), and they can of course be different. Is there a way to make it clearer which state we're dealing with?

Contributor Author

I added some updates to the code that should make it more clear.

// queries sent to a DB to update the current state of the ledger.
// It has integrity checks built in so ex. removing an account that was
// previously removed returns an error. In such case verify.StateError is
// returned.
Member

I wonder if we could add a small section explaining the possible state transitions here. Something like:

Existing Type      Add                      Update                       Remove
EntryCreated       X                        entry exists, can update     entry exists, can remove
EntryUpdated       X                        entry exists, can update     entry exists, can remove
EntryRemoved       no entry, can add        X                            X
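
One way to make such a table easy to verify is to write it down as data. The sketch below is only an illustration: it combines the table above with the algorithm described in the PR description, and the names `ChangeType`, `outcome`, `transitions` and `squash` are hypothetical rather than part of the PR. The key is the pair {type already in the cache, incoming type}.

```go
package main

import "fmt"

// ChangeType is a simplified stand-in for the XDR ledger entry change type.
type ChangeType int

const (
	Created ChangeType = iota
	Updated
	Removed
)

func (t ChangeType) String() string {
	return [...]string{"CREATED", "UPDATED", "REMOVED"}[t]
}

// outcome describes what happens to the cached change.
type outcome struct {
	ok     bool       // false means the transition is an error
	drop   bool       // true means the cached change is deleted (net no-op)
	result ChangeType // type stored in the cache when ok && !drop
}

// transitions is keyed by {cached type, incoming type}.
var transitions = map[[2]ChangeType]outcome{
	{Created, Created}: {ok: false},
	{Created, Updated}: {ok: true, result: Created},
	{Created, Removed}: {ok: true, drop: true},
	{Updated, Created}: {ok: false},
	{Updated, Updated}: {ok: true, result: Updated},
	{Updated, Removed}: {ok: true, result: Removed},
	{Removed, Created}: {ok: true, result: Updated},
	{Removed, Updated}: {ok: false},
	{Removed, Removed}: {ok: false},
}

// squash looks up the outcome of applying incoming on top of cached.
func squash(cached, incoming ChangeType) (outcome, error) {
	o := transitions[[2]ChangeType{cached, incoming}]
	if !o.ok {
		return o, fmt.Errorf("invalid transition: cached %s, incoming %s", cached, incoming)
	}
	return o, nil
}

func main() {
	all := []ChangeType{Created, Updated, Removed}
	for _, cached := range all {
		for _, incoming := range all {
			o, err := squash(cached, incoming)
			switch {
			case err != nil:
				fmt.Printf("%s + %s -> error\n", cached, incoming)
			case o.drop:
				fmt.Printf("%s + %s -> dropped from cache\n", cached, incoming)
			default:
				fmt.Printf("%s + %s -> %s\n", cached, incoming, o.result)
			}
		}
	}
}
```

Keeping all nine combinations in one map means a test can iterate over every pair and compare it against the expected table, instead of tracing the logic through several functions.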

Member

The rationale is to try and help with understanding when we're talking about previous vs current ledger (in the form of the cache). Also because the logic is currently distributed through several functions, and it depends on the XDR state transitions of stellar core, so the truth table is hard to verify.

Contributor Author

Expanded the comment to explain exactly what the algorithm looks like.

ledgerKeyString,
))
case xdr.LedgerEntryChangeTypeLedgerEntryRemoved:
// If existing type is removed it means that this entry does exist
Member

Suggested change
- // If existing type is removed it means that this entry does exist
+ // If existing type is removed it means that this entry does not exist

Contributor Author

The current comment is correct. We want to add create change to the cache and existing change in the cache is removed. This means that the entry exists in the DB/previous ledger (probably received removed before this change).

Member

Hmm. I'm sure you're right but this is very confusing. :/ Is there a way to make it clearer?

Contributor Author

Check the latest code and let me know if it's still confusing.

Co-Authored-By: Eric Saunders <ire-and-curses@users.noreply.github.com>
@bartekn bartekn changed the base branch from release-horizon-v0.24.0 to release-horizon-v0.25.0 December 10, 2019 13:23
@bartekn bartekn marked this pull request as ready for review December 10, 2019 16:43
@bartekn bartekn changed the base branch from release-horizon-v0.25.0 to release-horizon-v0.24.1 December 10, 2019 16:50
@bartekn
Contributor Author

bartekn commented Dec 12, 2019

I checked the performance of this solution by replaying ingestion of the ledger range 26988927-26989439 on pubnet (when the DRA spam happened).

Before we explore the data, it's important to understand that the test was performed on my dev machine, so both Horizon and its DB were on the same server. This means that the connection round trip time was minimal. The actual savings this PR introduces are in the round trip times, because it makes a much smaller number of DB queries.

[Screenshot 2019-12-12 at 15 21 30]
[Screenshot 2019-12-12 at 14 30 51]
Duration difference:
[Screenshot 2019-12-12 at 14 31 01]
Full data can be found in this spreadsheet.

Observations:

  • Ingestion of ledgers with a small number of ops is slightly faster: around 0.06 s on average, which is around 33% faster than with the previous code. You can see these ledgers on the left side of the chart.
  • For ledgers with many ops, the results are much better. Looking at the chart, ingestion was faster by 0.5-2.2 seconds. Ingesting ledgers with diff >= 0.5 s was 45% faster on average.
  • For ledgers that contain mostly DRA spam (e.g. 26989221), ingestion can be as much as 80% faster: pre: 1.382870728, post: 0.269690039, diff: 1.113180689.
  • Ingestion of 16 ledgers (of 512 total = 3%) was actually slower, by 0.06 s on average (0.28 s max). I checked these ledgers and I think that, after adding the cache, the number of DB queries doesn't drop significantly for them. In my opinion, the average increase in ingestion time is negligible in this case.
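
For reference, the percentages above appear to be relative reductions in ingestion duration. A small sketch of that arithmetic for ledger 26989221, assuming "faster" means `(pre - post) / pre` (the `speedup` helper is hypothetical):

```go
package main

import "fmt"

// speedup returns the absolute saving in seconds and the relative reduction
// in duration as a percentage of the original duration.
func speedup(pre, post float64) (diff, pct float64) {
	diff = pre - post
	pct = diff / pre * 100
	return diff, pct
}

func main() {
	// Durations in seconds for ledger 26989221, taken from the list above.
	diff, pct := speedup(1.382870728, 0.269690039)
	fmt.Printf("diff: %.9f s, reduction: %.1f%%\n", diff, pct)
}
```

The reduction comes out at roughly 80%, matching the figure quoted for the DRA-spam ledger.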

I believe results will be much better in staging where the DB is on a different server.

@ire-and-curses
Member

This looks very promising, and reinforces the rationale for doing this. I think we should merge. After that, it would be great to get an unstable version running on the staging server to see real-world numbers with the remote DB server.

Member

@ire-and-curses ire-and-curses left a comment

Looks great! Two optional suggestions.

exp/ingest/io/ledger_entry_change_cache.go (outdated, resolved)
exp/ingest/io/ledger_entry_change_cache.go (resolved)
@bartekn bartekn merged commit 6398a58 into stellar:release-horizon-v0.24.1 Dec 12, 2019
@bartekn bartekn deleted the ledger-entry-change-cache branch December 12, 2019 17:06
bartekn added a commit that referenced this pull request Dec 19, 2019
This commit changes `DatabaseProcessor` to insert/update (upsert)
accounts and trust lines in batches.

In #2004 we added `LedgerEntryChangeCache` that aims to decrease the
number of DB updates: all the changes connected to a single ledger entry
are squashed into just one DB query. Even though it gives a nice
performance boost when a large number of ops change a small number of
ledger entries, it turns out this is not enough. When many ledger entries
are changed in a new ledger, DB connection round trip time takes a
significant percentage of the overall ledger processing time.

The SQL query was borrowed [1] from stellar-core.

Known limitations:

* This code shouldn't be the final code that is released. The aim of
this PR is to deploy it to the stg environment to check its performance.
* This is adding upsert queries for accounts and trust lines only. These
types are the most common changes in the recent ledgers. The final
solution should add batch upsert and batch delete for all ledger
entries.
* There's a potential to improve the code: autogenerating upsert query
for any ledger entry type, better tests. Will be done in a separate PR.
* It removes the code that checks the number of rows affected by a query.
Unfortunately, the performance of the current solution is too bad to
keep it.

[1] https://github.com/stellar/stellar-core/blob/21469f90da1eacc6845017a520e179afb3772e65/src/ledger/LedgerTxnAccountSQL.cpp#L306-L336
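
As an illustration of the batching idea in the commit message above, here is a minimal sketch of a multi-row upsert builder for PostgreSQL. It is not the query borrowed from stellar-core and not Horizon's schema: `accountRow`, `buildAccountUpsert`, and the `accounts`, `account_id` and `balance` names are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// accountRow is a hypothetical, trimmed-down account record.
type accountRow struct {
	ID      string
	Balance int64
}

// buildAccountUpsert returns a single multi-row INSERT ... ON CONFLICT
// statement and its arguments, or an empty statement when rows is empty.
func buildAccountUpsert(rows []accountRow) (string, []interface{}) {
	if len(rows) == 0 {
		return "", nil
	}
	values := make([]string, 0, len(rows))
	args := make([]interface{}, 0, 2*len(rows))
	for i, r := range rows {
		// Two positional placeholders per row: ($1, $2), ($3, $4), ...
		values = append(values, fmt.Sprintf("($%d, $%d)", 2*i+1, 2*i+2))
		args = append(args, r.ID, r.Balance)
	}
	sql := "INSERT INTO accounts (account_id, balance) VALUES " +
		strings.Join(values, ", ") +
		" ON CONFLICT (account_id) DO UPDATE SET balance = EXCLUDED.balance"
	return sql, args
}

func main() {
	sql, args := buildAccountUpsert([]accountRow{
		{ID: "account-a", Balance: 100},
		{ID: "account-b", Balance: 250},
	})
	fmt.Println(sql)
	fmt.Println(args)
}
```

PostgreSQL rejects an `ON CONFLICT DO UPDATE` statement that would touch the same row twice, so a batch like this relies on each entry appearing at most once, which is what squashing changes per ledger entry provides.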