Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Client-side timestamping #165

Merged
merged 1 commit into from
Jan 27, 2022

Conversation

mwalkiewicz
Copy link
Collaborator

@mwalkiewicz mwalkiewicz commented Jan 21, 2022

This change is Reviewable

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 38 files reviewed, 7 unresolved discussions (waiting on @dopiera and @mwalkiewicz)


quickstart.md, line 135 at r1 (raw file):

## Client-side timestamping
HBase and Bigtable assign row version (timestamp) based on server-side time for mutations without version assigned by the client. The Mirroring Client issues writes to underlying databases a few milliseconds apart and performing mutations without version assigned on the client side will cause inconsistencies between databases. To mitigate some of those issues a client-side timestamping is available in the Mirroring Client. When client-side timestamping is enabled the Mirroring Client will automatically add a timestamp based on client's local time to every `Put` object passed to the Mirroring Client. Client-side timestamps assigned by `Table`s and `BufferedMutator`s created by one `Connection` are always increasing, even if system clock is moved backwards, for example by NTP or manually by the user.

That probably goes without saying, but I think we should mention that this concerns only the mutations which do not have that timestamp explicitly set by the user.

Code quote:

to every `Put` object passed to the Mirroring Client

quickstart.md, line 164 at r1 (raw file):

## Caveats
### Timestamps
Be aware that client-side timestamping modifies only `Put`s - `Delete`s, `Increment`s and `Append`s are not affected by this setting and will cause inconsistencies between databases.

Can you also add a sentence about the consequences of the time differences between the client servers?


quickstart.md, line 137 at r2 (raw file):

HBase and Bigtable assign row version (timestamp) based on server-side time for mutations without version assigned by the client. The Mirroring Client issues writes to underlying databases a few milliseconds apart and performing mutations without version assigned on the client side will cause inconsistencies between databases. To mitigate some of those issues a client-side timestamping is available in the Mirroring Client. When client-side timestamping is enabled the Mirroring Client will automatically add a timestamp based on client's local time to every `Put` object passed to the Mirroring Client. Client-side timestamps assigned by `Table`s and `BufferedMutator`s created by one `Connection` are always increasing, even if system clock is moved backwards, for example by NTP or manually by the user.
Be aware that client-side timestamping modifies only `Put`s - `Delete`s, `Increment`s and `Append`s are not affected by this setting and will cause inconsistencies between databases.
To enable client-side timestamping set `google.bigtable.mirroring.enable-default-client-side-timestamps` to `true`.

As discussed F2F, let's make it a string with 3 valid values: "disabled", "inplace" and "copy".


bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/MirroringConnection.java, line 95 at r1 (raw file):

  protected final boolean waitForSecondaryWrites;
  protected final Timestamper timestamper;

Could you please add a description the all the timestamper fields along these lines:

// Depending on configuration, ensures that mutations have an explicitly set timestamp.

bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/MirroringOptions.java, line 160 at r1 (raw file):

    this.enableDefaultClientSideTimestamps =
        configuration.getBoolean(MIRRORING_ENABLE_DEFAULT_CLIENT_SIDE_TIMESTAMPS, false);

I think we should set this to inplace. We need to pick a less bad value and I'd make the judgement call that time drifting on the servers will be a smaller issue than diverging databases.


bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/timestamper/MonotonicTimestamper.java, line 80 at r2 (raw file):

          }
        } catch (IOException e) {
          // IOException is thrown when `cell` does not implement `SettableTimestamp` and if it

assert false;?


bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/timestamper/Timestamper.java, line 23 at r2 (raw file):

import org.apache.hadoop.hbase.client.RowMutations;

public interface Timestamper {

Please add a description of what it is

@mwalkiewicz mwalkiewicz force-pushed the mw/client-side-timestamps/1 branch 4 times, most recently from 0b0e4bd to 7ce4e20 Compare January 26, 2022 15:55
Copy link
Collaborator Author

@mwalkiewicz mwalkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 0 of 45 files reviewed, 7 unresolved discussions (waiting on @dopiera and @mwalkiewicz)


quickstart.md, line 164 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Can you also add a sentence about the consequences of the time differences between the client servers?

Done.


quickstart.md, line 137 at r2 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

As discussed F2F, let's make it a string with 3 valid values: "disabled", "inplace" and "copy".

Done.


bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/MirroringConnection.java, line 95 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Could you please add a description the all the timestamper fields along these lines:

// Depending on configuration, ensures that mutations have an explicitly set timestamp.

Done.


bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/MirroringOptions.java, line 160 at r1 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

I think we should set this to inplace. We need to pick a less bad value and I'd make the judgement call that time drifting on the servers will be a smaller issue than diverging databases.

Done.


bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/timestamper/MonotonicTimestamper.java, line 80 at r2 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

assert false;?

Done. I'll throw a new RuntimeException(e) because assert would be rejected in the review.

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 9 of 38 files at r1, 2 of 5 files at r2, 27 of 32 files at r3, 5 of 7 files at r4, all commit messages.
Reviewable status: 43 of 45 files reviewed, 4 unresolved discussions (waiting on @dopiera and @mwalkiewicz)


bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/MirroringConfigurationHelper.java, line 287 at r4 (raw file):

  /**
   * Enables timestamping {@link org.apache.hadoop.hbase.client.Put}s without timestamp set based on
   * client's host local time. Client-side timestamps assigned by {@link Table}s and {@link

Nit: local time is the time you see on your clock, i.e. time zone is needed to interpret it. Timestamp is the duration since epoch in UTC, hence it's not local.


bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/timestamper/Timestamper.java, line 23 at r2 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Please add a description of what it is

Ping ;)

Copy link
Collaborator Author

@mwalkiewicz mwalkiewicz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 39 of 45 files reviewed, 3 unresolved discussions (waiting on @dopiera)


bigtable-hbase-mirroring-client-1.x-parent/bigtable-hbase-mirroring-client-1.x/src/main/java/com/google/cloud/bigtable/mirroring/hbase1_x/utils/timestamper/Timestamper.java, line 23 at r2 (raw file):

Previously, dopiera (Marek Dopiera) wrote…

Ping ;)

Done, sorry.

Copy link

@dopiera dopiera left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: 39 of 45 files reviewed, all discussions resolved (waiting on @dopiera)

@mwalkiewicz mwalkiewicz merged commit ffb6157 into md/handover_fixes Jan 27, 2022
dopiera pushed a commit that referenced this pull request May 11, 2022
dopiera pushed a commit that referenced this pull request May 11, 2022
dopiera pushed a commit that referenced this pull request May 11, 2022
dopiera added a commit that referenced this pull request May 11, 2022
… review comments (googleapis#3347)

* chore: revert review comments

* feat: add MirroringOperationException exception markers (#125)

* feat: concurrent writes in MirroringBufferedMutator (#80)

* refactor: implement multiple argument operations on MirroringAsyncTable with specific operations rather than batch() (#75)

* feat: implement MirroringAsyncTable#getName() (#132)

* feat: use Logger rather than stdout in DefaultMismatchDetector (#128)

* feat: synchronous writes (#88)

* fix: implement heapSize method for RowCell (#111)

* feat: FlowController accounts for memory usage (#137)

* refactor: remove Configuration as a base of MirroringConfiguration (#127)

* feat: MirroringAsyncBufferedMutator (#81)

* refactor: rename WRITE_MISMATCH to SECONDARY_WRITE_ERROR (#138)

* fix: BufferedMutator close() waits for all secondary flushes to finish (#110)

* feat: 2.x reads sampling (#114)

* refactor: make MirroringResultScanner synchronize on itself rather than MirroringTable (#134)

* ConcurrentBufferedMutator integration tests (#135)

* feat: add synchronous MirroringConnection to 2.x (#109)

* fix: MirroringConnection in 2.x failed to compile (#139)

* fix: fix BufferedMutator ITs (#140)

* feat: run 1.x integration tests on MirroringConnection etc. from 2.x (#108)

* feat: 2.x - rewrite Increment and Append as Put in batch (#116)

* fix: fix build (#142)

* refactor: minor fixes after review (#117)

* feat: MirroringAsyncTable#getScanner() (#58)

* test: 2.x integration tests (#112)

* feat: implement MirroringAsyncBufferedMutatorBuilder (#144)

* feat: log rows and values in DefaultMismatchDetector (#129)

* fix: ITs - add expected parameter to MismatchDetectors (#153)

* fix: force Append and Increment to return results and discard that result before returning it to user (#136)

* fix: review fixes in utils

* fix: review fixes in BufferedMutator

* fix: review fixes in Faillog

* fix: fixed reference counting

* fix: review fixes in FlowController

* fix: review fixes in metrics

* fix: review fixes in verification

* fix: Review fixes in MirroringTable

* fix: review fixes in HBase 2.x client

* fix: fixes in ITs

* feat: MirrorinAsyncTable: scan(), scanAll() (#131)

* fix: review fixes in tests

* feat: MirroringConnection: timeout in close() and abort() (#133)

* feat: better mismatch detection of scan results (#130)

* feat: quickstart (#105)

* fix: 2.x scan ITs (#158)

* fix: DefaultMismatchDetector tests (#157)

* fix: ConcurrentBufferedMutator waits for both flushes to finish before closing (#161)

* fix: additional minor fixes after review (#163)

* fix: BufferedMutator review fixes (#164)

- Simplify #flush().
- Add javadocs.
- (sequential) Fix flush() exception handling.
- (sequential) Move error handling to a separate inner class.

* fix: PR fixes

* fix: report zeroed error metrics after successful operations

* fix: prepend MismatchDetectorCounter with Test to better reflect its purpose

* feat: Client-side timestamping (#165)

* fix: reduce timeout in TestBlocking to make the tests run faster

* fix: asyncClose -> closePrimaryAndScheduleSecondaryClose

* fix: remove unused Batcher#throwBatchDataExceptionIfPresent

* fix: remove unused Comparators#compareRows

* fix: extract failedReads from MatchingSuccessfulReadsResults to reduce confusion

* feat: remove unused MirroringTracer from FailedMutationLogger

* fix: MirroringAsyncBufferedMutator - test if failed mutation is passed to secondary write error consumer

* fix: TestMirroringAsyncTableInputModification typo fix

* fix: describe user flush() in Buffered Mutator in quickstart

* fix: MirroringBufferedMutator - move flush threshold from BufferedMutations to FlushSerializer

* refactor: MirroringBufferedMutator#close() - use AccumulatedExceptions insteand of List<Exception>

* BufferedMutator - add close timeout

* AsyncBufferedMutator - add close timeout

* fix: remove stale addSecondaryMutation comment

* fix: add a comment that addSecondaryMutation handles failed writes

* fix: unify implementations of flushBufferedMutatorBeforeClosing

* fix: BufferedMutator - throw exceptions on close

* fix: BufferedMutator - add comment explaining that chain of flush operations is created

* fix: BufferedMutator - clarify  comments

* fix: Concurrent BufferedMutator - fix throwFlushExceptionIfAvailable

* fix: explain why flush is called in Sequential BufferedMutator test

* fix: TestConcurrentMirroringBufferedMutator - make waiting for calls explicit

* refactor: BufferedMutator rename scheduleFlushAll() to scheduleFlush()

* refactor: make FlushSerializer non-static

* fix: BufferedMutator - use HierarchicalReferenceCounter

* feat: Add MirroringConnection constructor taking MirroringConfiguration

* refactor: move releaseReservations to finally

* fix: use less convoluted example in lastFlushFutures description

* fix: merge small Timeestamper files into a single file

* fix: add a comment explaining which exceptions are forwarded to the user and why in SequentialMirroringBufferedMutator

* fix: use UnsupportedOperationException instead of RuntimeException when forbidden mutation type is encountered

* fix: add comment explaining why batch is complicated

* fix: add a TODO to implement point writes without batch

Co-authored-by: Mateusz Walkiewicz <mwalkiewicz@unoperate.com>
Co-authored-by: Adam Czajkowski <prawilny@unoperate.com>
Co-authored-by: Kajetan Boroszko <kajetan@unoperate.com>
mwalkiewicz added a commit that referenced this pull request May 18, 2022
… review comments (googleapis#3347)

* chore: revert review comments

* feat: add MirroringOperationException exception markers (#125)

* feat: concurrent writes in MirroringBufferedMutator (#80)

* refactor: implement multiple argument operations on MirroringAsyncTable with specific operations rather than batch() (#75)

* feat: implement MirroringAsyncTable#getName() (#132)

* feat: use Logger rather than stdout in DefaultMismatchDetector (#128)

* feat: synchronous writes (#88)

* fix: implement heapSize method for RowCell (#111)

* feat: FlowController accounts for memory usage (#137)

* refactor: remove Configuration as a base of MirroringConfiguration (#127)

* feat: MirroringAsyncBufferedMutator (#81)

* refactor: rename WRITE_MISMATCH to SECONDARY_WRITE_ERROR (#138)

* fix: BufferedMutator close() waits for all secondary flushes to finish (#110)

* feat: 2.x reads sampling (#114)

* refactor: make MirroringResultScanner synchronize on itself rather than MirroringTable (#134)

* ConcurrentBufferedMutator integration tests (#135)

* feat: add synchronous MirroringConnection to 2.x (#109)

* fix: MirroringConnection in 2.x failed to compile (#139)

* fix: fix BufferedMutator ITs (#140)

* feat: run 1.x integration tests on MirroringConnection etc. from 2.x (#108)

* feat: 2.x - rewrite Increment and Append as Put in batch (#116)

* fix: fix build (#142)

* refactor: minor fixes after review (#117)

* feat: MirroringAsyncTable#getScanner() (#58)

* test: 2.x integration tests (#112)

* feat: implement MirroringAsyncBufferedMutatorBuilder (#144)

* feat: log rows and values in DefaultMismatchDetector (#129)

* fix: ITs - add expected parameter to MismatchDetectors (#153)

* fix: force Append and Increment to return results and discard that result before returning it to user (#136)

* fix: review fixes in utils

* fix: review fixes in BufferedMutator

* fix: review fixes in Faillog

* fix: fixed reference counting

* fix: review fixes in FlowController

* fix: review fixes in metrics

* fix: review fixes in verification

* fix: Review fixes in MirroringTable

* fix: review fixes in HBase 2.x client

* fix: fixes in ITs

* feat: MirrorinAsyncTable: scan(), scanAll() (#131)

* fix: review fixes in tests

* feat: MirroringConnection: timeout in close() and abort() (#133)

* feat: better mismatch detection of scan results (#130)

* feat: quickstart (#105)

* fix: 2.x scan ITs (#158)

* fix: DefaultMismatchDetector tests (#157)

* fix: ConcurrentBufferedMutator waits for both flushes to finish before closing (#161)

* fix: additional minor fixes after review (#163)

* fix: BufferedMutator review fixes (#164)

- Simplify #flush().
- Add javadocs.
- (sequential) Fix flush() exception handling.
- (sequential) Move error handling to a separate inner class.

* fix: PR fixes

* fix: report zeroed error metrics after successful operations

* fix: prepend MismatchDetectorCounter with Test to better reflect its purpose

* feat: Client-side timestamping (#165)

* fix: reduce timeout in TestBlocking to make the tests run faster

* fix: asyncClose -> closePrimaryAndScheduleSecondaryClose

* fix: remove unused Batcher#throwBatchDataExceptionIfPresent

* fix: remove unused Comparators#compareRows

* fix: extract failedReads from MatchingSuccessfulReadsResults to reduce confusion

* feat: remove unused MirroringTracer from FailedMutationLogger

* fix: MirroringAsyncBufferedMutator - test if failed mutation is passed to secondary write error consumer

* fix: TestMirroringAsyncTableInputModification typo fix

* fix: describe user flush() in Buffered Mutator in quickstart

* fix: MirroringBufferedMutator - move flush threshold from BufferedMutations to FlushSerializer

* refactor: MirroringBufferedMutator#close() - use AccumulatedExceptions insteand of List<Exception>

* BufferedMutator - add close timeout

* AsyncBufferedMutator - add close timeout

* fix: remove stale addSecondaryMutation comment

* fix: add a comment that addSecondaryMutation handles failed writes

* fix: unify implementations of flushBufferedMutatorBeforeClosing

* fix: BufferedMutator - throw exceptions on close

* fix: BufferedMutator - add comment explaining that chain of flush operations is created

* fix: BufferedMutator - clarify  comments

* fix: Concurrent BufferedMutator - fix throwFlushExceptionIfAvailable

* fix: explain why flush is called in Sequential BufferedMutator test

* fix: TestConcurrentMirroringBufferedMutator - make waiting for calls explicit

* refactor: BufferedMutator rename scheduleFlushAll() to scheduleFlush()

* refactor: make FlushSerializer non-static

* fix: BufferedMutator - use HierarchicalReferenceCounter

* feat: Add MirroringConnection constructor taking MirroringConfiguration

* refactor: move releaseReservations to finally

* fix: use less convoluted example in lastFlushFutures description

* fix: merge small Timeestamper files into a single file

* fix: add a comment explaining which exceptions are forwarded to the user and why in SequentialMirroringBufferedMutator

* fix: use UnsupportedOperationException instead of RuntimeException when forbidden mutation type is encountered

* fix: add comment explaining why batch is complicated

* fix: add a TODO to implement point writes without batch

Co-authored-by: Mateusz Walkiewicz <mwalkiewicz@unoperate.com>
Co-authored-by: Adam Czajkowski <prawilny@unoperate.com>
Co-authored-by: Kajetan Boroszko <kajetan@unoperate.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants