Implement splitting and encoding `ops`, `nsInfo` as separate `OP_MSG` sections, implement prose tests #1495

stIncMale · 2024-09-09T07:04:55Z

The first four commits are well-organized to allow reviewing them one by one.

This PR depends on #1486.

The following test runners execute the unified and prose tests added in this PR:

com.mongodb.client.CrudProseTest
com.mongodb.client.AbstractClientSideOperationsTimeoutProseTest

JAVA-5529
JAVA-5610
JAVA-5695

Justification: this is the only operation that uses a lazy document for the command. It was only lazy in the first place because of the lack of splitting in the initial implementation, but now that there is splitting there is no longer a need, and the fact that it's an outlier would make it confusing for future readers.

JAVA-5529

Co-authored-by: Jeff Yemin <jeff.yemin@mongodb.com>

This makes `CommandMessage` generic for any command with two sequences. The new abstraction adds:

* The sequence identifiers for each sequence.
* A field name validator for both sequences.
* A `List<BsonElement>` for any extra elements required by the splitting logic, so that `txnNumber` doesn't have to be treated specially.

Make `SplittablePayload` extend `OpMsgSequence`. This brings `SplittablePayload` closer in design to `DualMessageSequences`, reducing a potential source of confusion for future readers.

JAVA-5529

Co-authored-by: Jeff Yemin <jeff.yemin@mongodb.com>

stIncMale · 2024-10-22T03:17:58Z

@jyemin I incorporated the changes you proposed. See ea0633e, 9fe35e4.

However, I did not include jyemin@7d7f3eb because, as I understand it:

* CommandMessage.encodeMessageBodyWithMetadata in the main branch is already concerned with getSettings().getMaxDocumentSize();
* BsonWriterHelper.getPayloadMessageSettings is concerned with DOCUMENT_HEADROOM;
* RequestMessage.addDocument creates a BsonBinaryWriter that checks that the document size is not greater than settings.getMaxDocumentSize() + DOCUMENT_HEADROOM.

Given that, what is the purpose of moving similar validation checks to the operation layer specifically when it comes to DualMessageSequences?

jyemin · 2024-10-22T13:10:11Z

> Given that, what is the purpose of moving similar validation checks to the operation layer specifically when it comes to DualMessageSequences?

I think the difference is that the checks you enumerated apply equally to all commands: no command document can be larger than maxDocumentSize + DOCUMENT_HEADROOM, period. Whereas for bulkWrite there is specific "business logic" that applies only to that command: the check applies to each document in the array, but only if it's a replace or insert, and only if write concern is unacknowledged. That's why it seemed more appropriate to consolidate that logic with the operation. Let me know your thoughts on this.
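The bulkWrite-specific rule described above can be sketched as follows. This is only an illustration of the condition under discussion, not code from the driver: the class `UnacknowledgedSizeCheckSketch`, the enum `OpKind`, and both method names are hypothetical, and document sizes are passed in as plain integers rather than measured from encoded BSON.

```java
/** Hypothetical illustration of the rule discussed in this thread; not driver code. */
final class UnacknowledgedSizeCheckSketch {
    /** Assumed operation kinds; only INSERT and REPLACE carry a document that gets stored. */
    enum OpKind { INSERT, REPLACE, UPDATE, DELETE }

    /**
     * The check applies only to documents that will be stored (insert or replace),
     * and only when the write is unacknowledged, because in that case the server
     * cannot report an oversized document back to the client.
     */
    static boolean mustValidateDocumentSize(OpKind kind, boolean acknowledged) {
        return !acknowledged && (kind == OpKind.INSERT || kind == OpKind.REPLACE);
    }

    /** Throws if the rule applies and the (assumed) encoded size exceeds the limit. */
    static void validate(OpKind kind, boolean acknowledged, int encodedDocumentSize, int maxDocumentSize) {
        if (mustValidateDocumentSize(kind, acknowledged) && encodedDocumentSize > maxDocumentSize) {
            throw new IllegalArgumentException(
                    "document of " + encodedDocumentSize + " bytes exceeds the limit of " + maxDocumentSize);
        }
    }
}
```

Under these assumptions, an oversized document is rejected only for an unacknowledged insert or replace; the same document in an acknowledged write, or in an update or delete, passes through so that the server can decide.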

Also, wanted to check about jyemin@a2805a6, since you didn't mention that commit.

jyemin

Waiting for feedback from Valentin

stIncMale · 2024-11-04T16:27:10Z

@jyemin

Each time I look at this, I not only can't see a problem with the current approach, but even fail to convince myself that the proposed change makes things better overall.

> for bulkWrite there is specific "business logic" that applies only to that command: the check applies to each document in the array, but only if it's a replace or insert

The logic that is aware of which documents are stored, and which ones are not, is in ClientBulkWriteOperation. Neither CommandMessage nor BsonWriterHelper are aware of it.

> and only if write concern is unacknowledged. That's why it seemed more appropriate to consolidate that logic with the operation.

The logic that is aware of acknowledged writes and of what an application has requested is in ClientBulkWriteOperation. CommandMessage knows only whether a response needs to be requested from the server or not, and it's a reasonable thing to be aware of at this level of abstraction. CommandMessage treats different MessageSequences differently regardless of the document size validation, and treating them differently when it comes to making a decision on whether size validation should be done at all is no different (the logic is: if we have no way to learn about exceeded size from the server, we have to validate the size ourselves).

If for some reason we want CommandMessage to not treat different MessageSequences differently when it comes to size validation, we would have to move the logic of encoding the command document into an operation, like we do when encoding extra elements. This way ClientBulkWriteOperation would be able to decide whether to validate the size of the command document, which can exceed the limit, for example, because of a large comment or let.

jyemin

Latest commit LGTM.

Not sure if any more are coming in this PR so not approving the PR yet.

stIncMale · 2024-11-14T07:23:28Z

@vbabanin This PR is ready for your review.

jyemin

LGTM (again!)

vbabanin · 2024-11-20T01:01:08Z

driver-core/src/main/com/mongodb/internal/connection/IdHoldingBsonWriter.java

Thanks for moving these to constants; it makes the code easier to read!

vbabanin · 2024-11-20T19:35:59Z

driver-core/src/main/com/mongodb/internal/connection/BsonWriterHelper.java

+        BsonBinaryWriter firstWriter = createBsonBinaryWriter(firstOutput, dualMessageSequences.getFirstFieldNameValidator(), null);
+        BsonBinaryWriter secondWriter = createBsonBinaryWriter(secondOutput, dualMessageSequences.getSecondFieldNameValidator(), null);
+        // the size of operation-agnostic command fields (a.k.a. extra elements) is counted towards `messageOverheadInBytes`
+        int messageOverheadInBytes = 1000;


> Does this limitation apply only to client bulk write?

I am not sure if limitation is the right word here. But yes, this is unique to client bulk writes:

A mixed bulk write does not mix different kinds of operations in a batch because it is technically impossible there. As a result, we know in advance what kinds of operations are in a batch, and therefore whether it supports retries or not. That allows us to encode extraElements before encoding the PAYLOAD_TYPE_1_DOCUMENT_SEQUENCE section. Thus, when we are encoding the document sequence, we know exactly how much space we have left available before reaching MessageSettings.getMaxMessageSize.

A client bulk write may mix different kinds of operations in a batch, which means that we know whether the batch supports retries only after encoding its PAYLOAD_TYPE_1_DOCUMENT_SEQUENCE sections. That is, we may need to write something after writing those sections. That, in turn, means we can't know exactly how much space we have left when we encode document sequences. But whatever we write after writing the sequences, its size is bounded, and 1000 bytes is used in the spec as the value that is definitely not smaller than that bound.
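The effect of reserving a fixed overhead can be sketched as follows. This is a simplified illustration, not the logic in `BsonWriterHelper`: the class `BatchSplitSketch` and its methods are hypothetical, each operation is reduced to a single assumed encoded size in bytes, and `bytesAlreadyWritten` stands in for whatever precedes the document sequences in the message.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of batch splitting with a reserved overhead; not driver code. */
final class BatchSplitSketch {
    // Bound reserved for whatever is written after the document sequences
    // (the 1000-byte value discussed in this thread).
    static final int MESSAGE_OVERHEAD_IN_BYTES = 1000;

    /**
     * Counts how many operations, starting at index `from`, fit into one message.
     * encodedOpSizes[i] is the assumed encoded size of operation i in bytes.
     */
    static int countOpsThatFit(int[] encodedOpSizes, int from, int bytesAlreadyWritten, int maxMessageSize) {
        // The exact size of the trailing elements is unknown at this point,
        // so the bound is subtracted from the budget up front.
        int budget = maxMessageSize - bytesAlreadyWritten - MESSAGE_OVERHEAD_IN_BYTES;
        int used = 0;
        int count = 0;
        for (int i = from; i < encodedOpSizes.length; i++) {
            if (used + encodedOpSizes[i] > budget) {
                break;
            }
            used += encodedOpSizes[i];
            count++;
        }
        return count;
    }

    /** Splits all operations into consecutive batches and returns the size of each batch. */
    static List<Integer> splitIntoBatches(int[] encodedOpSizes, int bytesAlreadyWritten, int maxMessageSize) {
        List<Integer> batchSizes = new ArrayList<>();
        int from = 0;
        while (from < encodedOpSizes.length) {
            int count = countOpsThatFit(encodedOpSizes, from, bytesAlreadyWritten, maxMessageSize);
            if (count == 0) {
                throw new IllegalArgumentException(
                        "operation at index " + from + " does not fit into a single message");
            }
            batchSizes.add(count);
            from += count;
        }
        return batchSizes;
    }
}
```

For example, with three 400-byte operations, 100 bytes already written, and a 2000-byte message limit, the budget is 900 bytes, so `splitIntoBatches` yields batches of two and one operations. Without the reserve the budget would be 1900 bytes and all three would fit, leaving no guaranteed room for elements written afterwards.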

stIncMale · 2024-11-22T19:48:09Z

Thank you, @vbabanin, all your refactoring suggestions were really good.

vbabanin

LGTM!

stIncMale added 30 commits July 23, 2024 14:56

Create and document Java sync improved bulk write API

48b9614

JAVA-5527

Remove the type parameter from ClientWriteModel

af854ed

JAVA-5527

Remove ClientBulkWriteException.create as we can get by with the co…

1eb7466

…nstructor JAVA-5527

Merge branch 'master' into JAVA-5527

5567324

Merge branch 'master' into JAVA-5527

644d561

Do minor improvements

f566303

JAVA-5527

Make changes needed for the implementation

3d13ffd

JAVA-5527

Fix formatting in ClientUpdateManyModel

cf46b46

JAVA-5527

Make a few minor changes

e049234

JAVA-5527

Merge branch 'master' into JAVA-5527

0e16427

Add more info to the API docs, add ClientWriteModel subtypes

0d518d8

JAVA-5527

Add ClientWriteModelWithNamespace

6b77f78

JAVA-5527

Implement

2ed8e87

JAVA-5528

Sync spec tests

07bac95

JAVA-5528

Implement required test runner changes

16f100b

JAVA-5528

Improve how indexedNamespaces are computed

9f8ce2c

JAVA-5528

Remove throws declarations from the API

fdb90d2

JAVA-5527

Make wording on ClientWriteModel methods consistent with that on `C…

39386fe

…lientWriteModel` subtypes JAVA-5527

Move constructor methods to ClientNamespacedWriteModel

f67af3f

JAVA-5527

Merge branch 'JAVA-5527' into JAVA-5528

53f6883

Fix errors caused by the merge

9a7b668

JAVA-5528

Use Optional to express verbose/summary results

395af7a

JAVA-5527

Merge branch 'JAVA-5527' into JAVA-5528

8f504b0

Fix errors caused by the merge

b02a638

JAVA-5528

Make an API doc improvement

a11e5f6

JAVA-5527

Refactor shouldAttemptToRetryWriteAndAddRetryableLabel

bfbc1cc

JAVA-5528

Improve CrudProseTest.insertMustGenerateIdAtMostOnce

1c3c590

JAVA-5528

Move MixedBulkWriteOperation.validateAndGetEffectiveWriteConcern/`c…

1c9c19e

…ommandWriteConcern` to `CommandOperationHelper` JAVA-5528

Add a comment in toClientNamespacedWriteModel

061e605

JAVA-5528

Use CommandResultDocumentCodec in ClientBulkWriteOperation

8320216

JAVA-5528

stIncMale and others added 2 commits October 21, 2024 08:14

stIncMale force-pushed the JAVA-5529 branch from b8734d1 to 9fe35e4 Compare October 22, 2024 03:08

stIncMale requested a review from jyemin October 22, 2024 03:18

jyemin reviewed Oct 23, 2024

stIncMale requested a review from jyemin November 4, 2024 16:30

Get rid of OrdinaryAndStoredBsonWriters

056d411

JAVA-5529 Co-authored-by: Jeff Yemin <jeff.yemin@mongodb.com>

stIncMale commented Nov 5, 2024

driver-core/src/main/com/mongodb/internal/connection/CommandMessage.java (outdated, resolved)

jyemin reviewed Nov 5, 2024

Add CommandMessageTest.getCommandDocumentFromClientBulkWrite

6f68c0e

JAVA-5529

jyemin self-requested a review November 12, 2024 14:01

Remove the BSON document size validation requirement for the client b…

ee3073e

…ulk write operation JAVA-5695

jyemin approved these changes Nov 14, 2024

vbabanin reviewed Nov 20, 2024

driver-core/src/main/com/mongodb/internal/operation/ClientBulkWriteOperation.java (resolved)

stIncMale and others added 5 commits November 21, 2024 12:39

Update driver-core/src/main/com/mongodb/internal/connection/CommandMe…

0e78b67

…ssage.java Add `this.` to align with the code style in the `CommandMessage` constructor Co-authored-by: Viacheslav Babanin <frest0512@gmail.com>

Address simple review concerns

38c880d

Extract writeOpMsg, writeOpQuery from `CommandMessage.encodeMessa…

0156135

…geBodyWithMetadata`

Refactor CommandMessage

f07ff6f

Refactor BsonWriterHelper.appendElementsToDocument

3c6c09f

stIncMale requested a review from vbabanin November 22, 2024 19:47

vbabanin approved these changes Nov 26, 2024

stIncMale merged commit b0e2bdf into mongodb:JAVA-4586_bulk-write Nov 27, 2024
54 of 59 checks passed

stIncMale deleted the JAVA-5529 branch November 27, 2024 17:30

stIncMale mentioned this pull request Nov 27, 2024

Improved Bulk Write API #1509

Draft
