
Hash submitters in the deduplication key [DPP-347] #9417

Merged 14 commits from kamil-da/DPP-347-hash-deduplication-key into main on Apr 20, 2021

Conversation

@kamil-da (Contributor) commented on Apr 14, 2021

Motivation

Submitting a command with a large number of submitters results in:

```
org.postgresql.util.PSQLException: ERROR: index row size 6512 exceeds maximum 2712 for index "participant_command_submissions_pkey"
  Hint: Values larger than 1/3 of a buffer page cannot be indexed.
  Consider a function index of an MD5 hash of the value, or use full text indexing.
```

due to how we construct deduplication keys: concatenating a large number of submitters exceeds the Postgres index row size limit.

Main changes

The deduplication key now contains a hash of the submitters instead of a concatenated string of submitters.
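
The approach can be sketched as follows (a hypothetical illustration, not the PR's actual code; the SHA-256 digest, the `%` separator, and the key layout are assumptions):

```scala
import java.nio.charset.StandardCharsets
import java.security.MessageDigest

// Hypothetical sketch: hash the (deduplicated, sorted) submitter list so the
// key length is bounded no matter how many parties submitted the command.
def deduplicationKey(commandId: String, submitters: List[String]): String = {
  val hashedSubmitters = MessageDigest
    .getInstance("SHA-256")
    .digest(submitters.distinct.sorted.mkString(",").getBytes(StandardCharsets.UTF_8))
    .map("%02x".format(_))
    .mkString
  s"$commandId%$hashedSubmitters"
}
```

Since a SHA-256 digest is always 32 bytes (64 hex characters), the key length no longer grows with the number of submitters.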

Pull Request Checklist

  • Read and understand the contribution guidelines
  • Include appropriate tests
  • Set a descriptive title and thorough description
  • Add a reference to the issue this PR will solve, if appropriate
  • Include changelog additions in one or more commit message bodies between the CHANGELOG_BEGIN and CHANGELOG_END tags
  • Normal production system change, include purpose of change in description

NOTE: CI is not automatically run on non-member pull requests for security
reasons. The reviewer will have to comment with /AzurePipelines run to
trigger the build.

@kamil-da marked this pull request as ready for review on April 15, 2021 15:09
@rautenrieth-da (Contributor) left a comment:

Any change in com.daml.platform.store.dao needs to be mirrored in com.daml.platform.store.appendonlydao; can you please edit JdbcLedgerDao there accordingly?

```scala
.update(
  _.commands.deduplicationTime := deduplicationTime.asProtobuf
)
_ <- ledger.submit(request)
```
@rautenrieth-da (Contributor) commented:

Shouldn't this test assert that the dummy contract was successfully created?

@kamil-da (Contributor, author) replied:

Fixed 👍

Comment on lines 17 to 20
```scala
/** Postgres has a limit on the index row size of 2712.
  * We test that a large submitters number doesn't cause deduplication failure because of that limit.
  * THIS TEST CASE DOES NOT TEST ANY OTHER FAILURES THAT MAY BE CAUSED BY A LARGE SUBMITTERS NUMBER
  */
```
@rautenrieth-da (Contributor) commented:

This description is not accurate:

  • The test uses the ledger API and not the internals of the indexer, so you don't really know what kind of failures it tests for.
  • The description mentions a limit of Postgres, but you don't know whether the test runs against a Postgres index database.
  • You are not checking any property of command deduplication, you are just submitting a request with a large number of parties. I would name the test "Submit commands with a large number of submitters".

@kamil-da (Contributor, author) replied:

@rautenrieth-da Thank you for the review!
I've modified the test suite according to your suggestions. I've also added a change to a database index that was blocking submissions due to index row size limits.

@rautenrieth-da (Contributor) left a comment:

Left some minor comments. You still need to do the change in com.daml.platform.store.appendonlydao though.

Comment on lines 42 to 43
```scala
_ <- ledger.submit(request)
_ <- Delayed.by(1.second)(())
```
@rautenrieth-da (Contributor) commented:

Suggested change:

```diff
-_ <- ledger.submit(request)
-_ <- Delayed.by(1.second)(())
+_ <- ledger.submitAndWait(request)
```
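
For context: submitAndWait on the Ledger API command service completes only after the command has been processed by the ledger, so the suggestion removes the need for an arbitrary one-second delay before the test continues.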

```scala
commands = DummyWithAnnotation(parties.head, "First submission").create.command,
)
.update(
  _.commands.deduplicationTime := deduplicationTime.asProtobuf
```
@rautenrieth-da (Contributor) commented:

Why do you need to set a custom deduplication time?

@kamil-da force-pushed the kamil-da/DPP-347-hash-deduplication-key branch from f79025e to 368b6d6 on April 19, 2021 18:09
@kamil-da (Contributor, author) commented:

@rautenrieth-da I've adapted the PR according to your review and the discussion during the meeting.

  • implemented the feature for appendonlydao package
  • removed changes to the database index - will come in a follow-up PR
  • removed ValueLimitsIT suite - will come in a follow-up PR

For this PR I've replaced API-level testing with simple unit testing because it's currently impossible to make API-level tests pass without making changes to database indexes.
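
A unit test of that property could look roughly like this (a hypothetical ScalaTest sketch against the `deduplicationKey` function sketched earlier; the 200-character bound anticipates the `MaxKeyLength` discussed below):

```scala
import org.scalatest.matchers.should.Matchers
import org.scalatest.wordspec.AnyWordSpec

// Hypothetical sketch: the key must stay short however many submitters are
// supplied, which is exactly what the database index row limit requires.
class DeduplicationKeySpec extends AnyWordSpec with Matchers {
  "deduplicationKey" should {
    "have a bounded length for a large number of submitters" in {
      val manySubmitters = (1 to 1000).map(i => s"party-$i").toList
      deduplicationKey("a-command-id", manySubmitters).length should be <= 200
    }
  }
}
```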

@rautenrieth-da (Contributor) left a comment:

Thank you for deduplicating the deduplication logic 🙂


"make a deduplication key with a limited length for a large number of submitters" in {
val submitters = (1 to 50).map(_ => aParty()).toList
val MaxKeyLength = 200
@rautenrieth-da (Contributor) commented:

This constant seems arbitrary, but the value is fine. It's smaller than the maximum length for indexed string columns in both Postgres and Oracle (which was the reason for this change). Maybe add a comment?

@kamil-da (Contributor, author) replied:

Good point - I'll comment on motivation for this limit.
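
Such a comment might read along these lines (hypothetical wording, based on the reviewer's rationale above):

```scala
// Deduplication keys are stored in an indexed string column. 200 characters
// keeps the key comfortably below the maximum size of indexed string values
// on both Postgres and Oracle, which is what motivated hashing the submitters.
val MaxKeyLength = 200
```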

@kamil-da kamil-da merged commit e335244 into main Apr 20, 2021
@kamil-da kamil-da deleted the kamil-da/DPP-347-hash-deduplication-key branch April 20, 2021 11:25
azure-pipelines bot pushed a commit that referenced this pull request Apr 21, 2021
This PR has been created by a script, which is not very smart
and does not have all the context. Please do double-check that
the version prefix is correct before merging.

@remyhaemmerle-da is in charge of this release.

Commit log:
```
cf2be78 Support rollback nodes in BlindingInfo (#9445)
adde90b KV: do not use "Effects" part of InputsAndEffects (#9433)
75fa86a Additional metrics for mutable contract state cache (#9444)
d8c34a4 Rename normalization-related params in the integrity checker config (#9455)
e335244 Hash submitters in the deduplication key [DPP-347] (#9417)
43ffa42 Test for duplicate contracts when querying on behalf of multiple parties (#9443)
7644844 Document daml ledger export script (#9446)
3193027 DPP-334 Manually partition the event table (#9394)
14661a8 Clearer variable name for subtype evidence (#9442)
40c85d8 Release v1.13.0-snapshot.20210419.6730.0.8c3a8c04 (#9439)
1cb907c KV: do not use "Inputs" part of InputsAndEffects (#9429)
```
Changelog:
```
- [Integration Kit] - deduplication key will contain hashed submitters instead of concatenated submitters
* [Daml export] Refer to the "Ledger Export" chapter under the "Early
  Access Features" for a description of the new Daml ledger export
  command. This is an early access feature.
```

CHANGELOG_BEGIN
CHANGELOG_END