Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement splitting and encoding ops, nsInfo as separate OP_MSG sections, implement prose tests #1495

Merged
merged 104 commits into from
Nov 27, 2024

Conversation

stIncMale
Copy link
Member

@stIncMale stIncMale commented Sep 9, 2024

The first four commits are well-organized to allow reviewing them one by one.

This PR depends on #1486.

The following test runners execute the unified and prose tests added in this PR:

  • com.mongodb.client.CrudProseTest
  • com.mongodb.client.AbstractClientSideOperationsTimeoutProseTest

JAVA-5529
JAVA-5610
JAVA-5695

stIncMale added 30 commits July 23, 2024 14:56
JAVA-5528
JAVA-5528
…ommandWriteConcern` to `CommandOperationHelper`

JAVA-5528
stIncMale and others added 2 commits October 21, 2024 08:14
Justification: this is the only operation that uses a lazy document for the command.  It was only lazy in the first place because of the lack of splitting in the initial implementation, but now that there is splitting there is no longer a need, and the fact
that it's an outlier would make it confusing for future readers.

JAVA-5529

Co-authored-by: Jeff Yemin <jeff.yemin@mongodb.com>
This makes `CommandMessage` generic for any command with two sequences. The new abstraction adds:

* The sequence identifiers for each sequence.
* A field name validator for both sequences.
* A `List<BsonElement>`` for any extra elements required by the splitting logic, so that `txnNumber` doesn't
  have to be treated specially.

Make `SplittablePayload` extend `OpMsgSequence`.
This brings SplittablePayload closer in design to `DualMessageSequences`,
reducing a potential source of confusion for future readers.

JAVA-5529

Co-authored-by: Jeff Yemin <jeff.yemin@mongodb.com>
@stIncMale
Copy link
Member Author

stIncMale commented Oct 22, 2024

@jyemin I incorporated the changes you proposed. See ea0633e, 9fe35e4.

However, I did not include jyemin@7d7f3eb, because as I understand it. CommandMessage.encodeMessageBodyWithMetadata in the main branch is already concerned with getSettings().getMaxDocumentSize(); BsonWriterHelper.getPayloadMessageSettings is concerned with DOCUMENT_HEADROOM; RequestMessage.addDocument creates a BsonBinaryWriter that checks that the document size is not greater than settings.getMaxDocumentSize() + DOCUMENT_HEADROOM. Given that, what is the purpose of moving similar validation checks to the operation layer specifically when it comes to DualMessageSequences?

@stIncMale stIncMale requested a review from jyemin October 22, 2024 03:18
@jyemin
Copy link
Contributor

jyemin commented Oct 22, 2024

Given that, what is the purpose of moving similar validation checks to the operation layer specifically when it comes to DualMessageSequences?

I think the difference is that the checks you enumerated apply equally to all commands: no command document can be larger than maxDocumentSize + DOCUMENT_HEADROOM, period. Whereas for bulkWrite there is specific "business logic" that applies only to that command: the check applies to each document in the array, but only if it's a replace or insert, and only if write concern is unacknowledged. That's why it seemed more appropriate to consolidate that logic with the operation. Let me know your thoughts on this.

Also, wanted to check about jyemin@a2805a6, since you didn't mention that commit.

Copy link
Contributor

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Waiting for feedback from Valentin

@stIncMale
Copy link
Member Author

stIncMale commented Nov 4, 2024

@jyemin

Each time I look at this, I not only can't see a problem with the current approach, but even fail to convince myself that the proposed change makes things better overall.

for bulkWrite there is specific "business logic" that applies only to that command: the check applies to each document in the array, but only if it's a replace or insert

The logic that is aware of which documents are stored, and which ones are not, is in ClientBulkWriteOperation. Neither CommandMessage nor BsonWriterHelper are aware of it.

and only if write concern is unacknowledged. That's why it seemed more appropriate to consolidate that logic with the operation.

The logic that is aware of acknowledged writes and of what an application has requested, is in ClientBulkWriteOperation. CommandMessage knows only whether a response needs be requested from the server or not, and it's a reasonable thing to be aware of at this level of abstraction. CommandMessage treats different MessageSequences differently regardless of the document size validation, and treating them differently when it comes to making a decision on whether size validation should be done at all is no different (the logic is: if we have no way to learn about exceeded size from the server, we have to validate the size ourselves).

If for some reason we want CommandMessage to not treat different MessageSequences differently when it comes to size validation, we would have to move the logic of encoding the command document into an operation, like we do when encoding extra elements. This way ClientBulkWriteOperation would be able to decide whether to validate the size of the command document, which can exceed the limit, for example, because of a large comment or let.

@stIncMale stIncMale requested a review from jyemin November 4, 2024 16:30
JAVA-5529

Co-authored-by: Jeff Yemin <jeff.yemin@mongodb.com>
Copy link
Contributor

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Latest commit LGTM.

Not sure if any more are coming in this PR so not approving the PR yet.

@jyemin jyemin self-requested a review November 12, 2024 14:01
@stIncMale
Copy link
Member Author

@vbabanin This PR is ready for your review.

Copy link
Contributor

@jyemin jyemin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (again!)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for moving these to constants— makes the code easier to read!

BsonBinaryWriter firstWriter = createBsonBinaryWriter(firstOutput, dualMessageSequences.getFirstFieldNameValidator(), null);
BsonBinaryWriter secondWriter = createBsonBinaryWriter(secondOutput, dualMessageSequences.getSecondFieldNameValidator(), null);
// the size of operation-agnostic command fields (a.k.a. extra elements) is counted towards `messageOverheadInBytes`
int messageOverheadInBytes = 1000;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this limitation apply only to client bulk write?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if limitation is the right word here. But yes, this is unique to client bulk writes:

  1. A mixed bulk write does not mix different kinds of operations in a batch because it is technically impossible there. As a result, we know in advance what kind of operations is in a batch, thus knowing whether it supports retries or not. That allows us to encode extraElements before encoding the PAYLOAD_TYPE_1_DOCUMENT_SEQUENCE section. Thus, when we are encoding the document sequence, we know exactly how much space we have left available before reaching MessageSettings.getMaxMessageSize.
  2. A client bulk write may mix different kinds of operations in a batch, which means that we know whether the batch supports retries only after encoding its PAYLOAD_TYPE_1_DOCUMENT_SEQUENCE sections. That is, we may need to write something after writing those sections. That, in turn, means we can't know exactly how much space we have left when we encode document sequences. But whatever we write after writing the sequences, its size is bounded, and 1000 bytes is used in the spec as the value that is definitely not smaller than that bound.

stIncMale and others added 5 commits November 21, 2024 12:39
@stIncMale stIncMale requested a review from vbabanin November 22, 2024 19:47
@stIncMale
Copy link
Member Author

Thank you, @vbabanin, all your refactoring suggestions were really good.

Copy link
Member

@vbabanin vbabanin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@stIncMale stIncMale merged commit b0e2bdf into mongodb:JAVA-4586_bulk-write Nov 27, 2024
54 of 59 checks passed
@stIncMale stIncMale deleted the JAVA-5529 branch November 27, 2024 17:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants