This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

Detect duplicate data from TiKV #1144

Merged: 26 commits merged into pingcap:master on Aug 3, 2021


Little-Wallace
Collaborator

What problem does this PR solve?

Part of #1110

When multiple Lightning instances import data into TiKV concurrently, the data they import may overlap, producing duplicate keys.

What is changed and how it works?

We assign a different commit timestamp to each Lightning instance so that the data imported by different instances can be told apart (see #1101 for details). After all import jobs have finished, we scan all data in TiKV to collect the duplicate data.
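As a rough illustration of the idea, the sketch below assumes every imported key/value pair is tagged with the commit timestamp of the Lightning instance that wrote it, and treats a key as duplicated when it appears under more than one distinct commit timestamp. The `versionedKV` type and `findDuplicates` function are hypothetical and do not reflect the API added in this PR; the sketch also only catches duplicates across instances, not repeated keys within one instance's data.

```go
package main

import (
	"fmt"
	"sort"
)

// versionedKV is a hypothetical record of one imported key/value pair,
// tagged with the commit timestamp of the Lightning instance that wrote it.
type versionedKV struct {
	Key      string
	CommitTS uint64
	Value    string
}

// findDuplicates groups versions by key and returns every key that was written
// under more than one distinct commit timestamp, i.e. by more than one instance.
func findDuplicates(kvs []versionedKV) map[string][]versionedKV {
	byKey := make(map[string][]versionedKV)
	for _, kv := range kvs {
		byKey[kv.Key] = append(byKey[kv.Key], kv)
	}
	dups := make(map[string][]versionedKV)
	for key, versions := range byKey {
		timestamps := make(map[uint64]struct{})
		for _, v := range versions {
			timestamps[v.CommitTS] = struct{}{}
		}
		if len(timestamps) > 1 {
			// Order each key's versions by commit timestamp for stable output.
			sort.Slice(versions, func(i, j int) bool {
				return versions[i].CommitTS < versions[j].CommitTS
			})
			dups[key] = versions
		}
	}
	return dups
}

func main() {
	// Instances A (commit TS 100) and B (commit TS 200) both imported "t1_r1".
	kvs := []versionedKV{
		{Key: "t1_r1", CommitTS: 100, Value: "a"},
		{Key: "t1_r2", CommitTS: 100, Value: "b"},
		{Key: "t1_r1", CommitTS: 200, Value: "c"},
	}
	for key, versions := range findDuplicates(kvs) {
		fmt.Println(key, len(versions)) // prints: t1_r1 2
	}
}
```

In the real system the versions would come from scanning TiKV rather than from an in-memory slice; the grouping step is the part this sketch is meant to show.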

Check List

Tests

  • Unit test
  • Integration test

Code changes

  • Has exported function/method change
  • Has exported variable/fields change
  • Has interface methods change
  • Has persistent data change

Side effects

  • Possible performance regression
  • Increased code complexity
  • Breaking backward compatibility

Related changes

  • Need to cherry-pick to the release branch
  • Need to update the documentation

Release note

  • No release note.

@ti-chi-bot
Member

ti-chi-bot commented May 25, 2021

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • gozssky
  • kennytm

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@CLAassistant

CLAassistant commented May 25, 2021

CLA assistant check
All committers have signed the CLA.

go.mod (outdated, resolved)
Little-Wallace and others added 11 commits July 14, 2021 12:19
commit 1ed2b5b
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Wed Jul 14 12:11:19 2021 +0800

    fix kvproto

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

commit 3187304
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Tue Jun 8 16:27:42 2021 +0800

    update getValues to public

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

commit b8a36b4
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Tue Jun 8 16:12:25 2021 +0800

    refactor duplicate index

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

commit bab4072
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Mon May 31 17:58:40 2021 +0800

    fix fmt

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

commit 2dfebb0
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Mon May 31 17:30:40 2021 +0800

    use io.EOF to judge end

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

commit b9b6a31
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Thu May 27 17:08:25 2021 +0800

    add some node

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

commit 34b2150
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Tue May 25 20:37:12 2021 +0800

    duplicate data before checksum

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

commit 4341adc
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Tue May 25 17:44:38 2021 +0800

    support decode kv

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

commit 18edf13
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Mon May 24 23:58:33 2021 +0800

    retry region error

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

commit bdd64be
Author: Little-Wallace <bupt2013211450@gmail.com>
Date:   Mon May 24 20:02:40 2021 +0800

    add duplicate manager

    Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
tests: add test for cross engine duplicate detection
Little-Wallace force-pushed the duplicate branch 3 times, most recently from 785b622 to dd6c16a on July 25, 2021 09:10
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
pkg/lightning/backend/backend.go (outdated, resolved)
pkg/lightning/backend/backend.go (outdated, resolved)
pattern = '(?i)^(?:[^/]*/)*([^/.]+)\.(.*?)\.0\.sql$'
schema = '$1'
table = '$2'
key = '0'
Collaborator


(optional)

It may be easier to just use two distinct data-source-dir directories with no-schema = true.

Collaborator Author


I do not understand...

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Comment on lines 34 to 37

"github.com/pingcap/parser/mysql"

"github.com/cockroachdb/pebble"
Collaborator


Suggested change
"github.com/pingcap/parser/mysql"
"github.com/cockroachdb/pebble"
"github.com/pingcap/parser/mysql"
"github.com/cockroachdb/pebble"

Collaborator


@Little-Wallace still not fixed 🤔

return errors.Annotate(err, "collect local duplicate keys failed")
}
if err = duplicateManager.CollectDuplicateRowsFromTiKV(ctx, tbl); err != nil {
return errors.Annotate(err, "duplicate table failed")
Collaborator


Suggested change
return errors.Annotate(err, "duplicate table failed")
return errors.Annotate(err, "collect remote duplicate keys failed")

pkg/lightning/backend/local/local.go (resolved)
pkg/lightning/backend/noop/noop.go (outdated, resolved)

"golang.org/x/sync/errgroup"

split "github.com/pingcap/br/pkg/restore"
Collaborator


this should go to the import group with prefix github.com/pingcap/br/*

and why rename it to split 🤣

Collaborator Author


I just copied it from another place...

Comment on lines 47 to 51
"github.com/cockroachdb/pebble"
"github.com/pingcap/errors"
sst "github.com/pingcap/kvproto/pkg/import_sstpb"
kvrpc "github.com/pingcap/kvproto/pkg/kvrpcpb"
tikv "github.com/pingcap/kvproto/pkg/tikvpb"
Collaborator


these should join all other third-party imports (the group shared by zap, grpc, tidb, etc.)

pkg/lightning/backend/local/duplicate.go (outdated, resolved)
pkg/lightning/backend/local/duplicate.go (resolved)
pkg/lightning/backend/local/duplicate.go (outdated, resolved)
Collaborator

kennytm left a comment


no other issues from me.

pkg/lightning/backend/local/duplicate.go (outdated, resolved)
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@Little-Wallace
Collaborator Author

/test

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
Collaborator

kennytm left a comment


rest LGTM

Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
pkg/lightning/backend/local/duplicate.go (outdated, resolved)
ti-chi-bot added the status/LGT1 LGTM1 label on Aug 3, 2021
Signed-off-by: Little-Wallace <bupt2013211450@gmail.com>
@kennytm
Collaborator

kennytm commented Aug 3, 2021

/merge

@ti-chi-bot
Member

This pull request has been accepted and is ready to merge.

Commit hash: fb64eaa

ti-chi-bot merged commit 0c84977 into pingcap:master on Aug 3, 2021
6 participants