
syncer(dm): fix the data race issue #5881

Merged: 14 commits into pingcap:master, Jun 21, 2022

Conversation

@lyzx2001 (Contributor)

What problem does this PR solve?

Issue Number: close #4811

What is changed and how it works?

The Result() function now returns a new copy of the process result instead of directly returning the internal pointer. Returning the raw pointer can cause a data race when multiple functions execute concurrently: some call Result() and write through the returned pointer to variables that other functions are trying to read.

To solve the problem, Result() first calls Marshal() on the original process result, which produces an intermediate byte slice, and then calls Unmarshal() on that slice to build a new process result. The new process result is an independent copy of the original, so it can safely be used as the return value of Result().
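For reference, below is a minimal sketch of the resulting accessor, assembled from the snippets quoted in the review threads further down; the nil guard and the exact error handling are assumptions, not necessarily the merged code:

func (st *SubTask) Result() *pb.ProcessResult {
	st.RLock()
	defer st.RUnlock()
	if st.result == nil { // assumed guard; a nil result has nothing to copy
		return nil
	}
	// Encode under the read lock, then decode into a fresh object,
	// so the caller receives an independent copy of the result.
	tempProcessResult, _ := st.result.Marshal()
	newProcessResult := &pb.ProcessResult{}
	_ = newProcessResult.Unmarshal(tempProcessResult)
	return newProcessResult
}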

Check List

Tests

  • Unit test

Questions

Will it cause performance regression or break compatibility?

No. The return type is unchanged (still *pb.ProcessResult), so this update will not affect the functions that call Result().

Do you need to update user documentation, design documentation or monitoring documentation?

No.

Release note

Fix a possible data race that could occur when multiple functions execute concurrently, with some calling Result() and writing to variables that other functions are trying to read. #4811

@ti-chi-bot (Member) commented Jun 15, 2022

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • D3Hunter
  • lance6716

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

@ti-chi-bot added the needs-cherry-pick-release-5.3, needs-cherry-pick-release-5.4, and release-note labels on Jun 15, 2022
@sre-bot commented Jun 15, 2022

CLA assistant check
All committers have signed the CLA.

@ti-chi-bot added the size/M label on Jun 15, 2022
@lance6716 added the area/dm label on Jun 15, 2022
@ti-chi-bot added the needs-cherry-pick-release-6.1 label on Jun 15, 2022
@buchuitoudegou (Contributor) left a comment

I don't know if we should check for a potential data race this way (manually writing and reading in different goroutines). Perhaps just checking that the returned result is a snapshot of the original one is sufficient🤔

			_, _ = tempQueryStatusResponse.Marshal()
		}
	}()
	_ = st.markResultCanceled()
Contributor

Suggested change:
- _ = st.markResultCanceled()
+ st.markResultCanceled()

Contributor Author

I checked by writing and reading in different goroutines because in issue #4811 the data race was found between the two functions markResultCanceled() and Marshal(). So here I manually reproduce the data race that was encountered, then fix it:


Write at 0x00c006394380 by goroutine 199:
  github.com/pingcap/tiflow/dm/dm/worker.(*SubTask).markResultCanceled()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/subtask.go:470 +0x257
  github.com/pingcap/tiflow/dm/dm/worker.(*SubTask).Pause()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/subtask.go:498 +0x54
  github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).OperateSubTask()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:564 +0x52c
  github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).operateSubTaskStage()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:743 +0x1fe
  github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).operateSubTaskStageWithoutConfig()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:762 +0x1b7
  github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).handleSubTaskStage()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:698 +0x658
  github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).observeSubtaskStage()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:656 +0x844
  github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).EnableHandleSubtasks.func1()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:438 +0x138

Previous read at 0x00c006394380 by goroutine 289:
  github.com/pingcap/tiflow/dm/dm/pb.(*ProcessResult).Size()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:5308 +0x7a
  github.com/pingcap/tiflow/dm/dm/pb.(*SubTaskStatus).Size()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:5008 +0x34d
  github.com/pingcap/tiflow/dm/dm/pb.(*QueryStatusResponse).Size()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:4765 +0x1f7
  github.com/pingcap/tiflow/dm/dm/pb.(*QueryStatusResponse).Marshal()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:3027 +0x54
  google.golang.org/grpc/encoding/proto.codec.Marshal()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/encoding/proto/proto.go:70 +0x23b
  google.golang.org/grpc/encoding/proto.(*codec).Marshal()
      <autogenerated>:1 +0x64
  google.golang.org/grpc.encode()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/rpc_util.go:545 +0x74
  google.golang.org/grpc.(*Server).sendResponse()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:869 +0x184
  google.golang.org/grpc.(*Server).processUnaryRPC()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1117 +0xb34
  google.golang.org/grpc.(*Server).handleStream()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1405 +0x138b
  google.golang.org/grpc.(*Server).serveStreams.func1.1()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:746 +0xe6

Goroutine 199 (running) created at:
  github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).EnableHandleSubtasks()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:434 +0xbcc
  github.com/pingcap/tiflow/dm/dm/worker.(*Server).enableHandleSubtasks()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:673 +0x195
  github.com/pingcap/tiflow/dm/dm/worker.(*Server).operateSourceBound()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:659 +0x2e8
  github.com/pingcap/tiflow/dm/dm/worker.(*Server).handleSourceBound()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:583 +0x511
  github.com/pingcap/tiflow/dm/dm/worker.(*Server).observeSourceBound()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:407 +0x108b
  github.com/pingcap/tiflow/dm/dm/worker.(*Server).Start.func4()
      /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:170 +0x114

Goroutine 289 (running) created at:
  google.golang.org/grpc.(*Server).serveStreams.func1()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:744 +0xb8
  google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go:442 +0x1850
  google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go:483 +0x49c
  google.golang.org/grpc.(*Server).serveStreams()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:742 +0x1c7
  google.golang.org/grpc.(*Server).handleRawConn.func1()
      /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:703 +0x4c
==================
panic: runtime error: index out of range [-1]

goroutine 1179 [running]:
github.com/pingcap/tiflow/dm/dm/pb.(*QueryStatusResponse).MarshalToSizedBuffer(0xc0059ddb40, 0xc00647c000, 0x272, 0x272, 0x150129f, 0xc000d1e000, 0xc000df5648)
	/home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:3082 +0x952
github.com/pingcap/tiflow/dm/dm/pb.(*QueryStatusResponse).Marshal(0xc0059ddb40, 0x61b5f40, 0xc0059ddb40, 0x7fb4e4351e08, 0xc0059ddb40, 0x1)
	/home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:3029 +0xa5
google.golang.org/grpc/encoding/proto.codec.Marshal(0x61b5f40, 0xc0059ddb40, 0x1, 0xc00006d040, 0xc0059ddb40, 0xc004fd33d0, 0xc0059ddb60)
	/go/pkg/mod/google.golang.org/grpc@v1.29.1/encoding/proto/proto.go:70 +0x23c
google.golang.org/grpc.encode(0x7fb4e45c3af8, 0xa295638, 0x61b5f40, 0xc0059ddb40, 0xa295638, 0x7, 0xf, 0x0, 0x6338d08)
	/go/pkg/mod/google.golang.org/grpc@v1.29.1/rpc_util.go:545 +0x75
google.golang.org/grpc.(*Server).sendResponse(0xc000924000, 0x74bcbb8, 0xc000b88000, 0xc006478100, 0x61b5f40, 0xc0059ddb40, 0x0, 0x0, 0xc006bf034f, 0x0, ...)
	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:869 +0x185
google.golang.org/grpc.(*Server).processUnaryRPC(0xc000924000, 0x74bcbb8, 0xc000b88000, 0xc006478100, 0xc000f68a50, 0x90b0840, 0x0, 0x0, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1117 +0xb35
google.golang.org/grpc.(*Server).handleStream(0xc000924000, 0x74bcbb8, 0xc000b88000, 0xc006478100, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1405 +0x138c
google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc00022b240, 0xc000924000, 0x74bcbb8, 0xc000b88000, 0xc006478100)
	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:746 +0xe7
created by google.golang.org/grpc.(*Server).serveStreams.func1
	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:744 +0xb9
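For context, a report like the one above comes from Go's built-in race detector. With the unit test added in this PR, the race should be reproducible before the fix with something like the following (package path taken from the stack traces above):

go test -race -run TestSubtaskRace ./dm/dm/worker/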

Contributor Author

Modified. The unit test now just checks whether the returned result is a snapshot of the original one.

@@ -552,7 +552,10 @@ func (st *SubTask) markResultCanceled() bool {
 func (st *SubTask) Result() *pb.ProcessResult {
 	st.RLock()
 	defer st.RUnlock()
-	return st.result
+	tempProcessResult, _ := st.result.Marshal()
Contributor

It'd be better to check the error, just to be on the safe side.

Contributor Author

Got it. I have modified the code and added the error handling.

-	return st.result
+	tempProcessResult, _ := st.result.Marshal()
+	newProcessResult := &pb.ProcessResult{}
+	_ = newProcessResult.Unmarshal(tempProcessResult)
Contributor

ditto. Or:

Suggested change:
- _ = newProcessResult.Unmarshal(tempProcessResult)
+ newProcessResult.Unmarshal(tempProcessResult)

Is it necessary to decode and encode it again? What about deep copying the object?

*a = *b // a, b are pointers

Contributor Author

Actually, it is necessary to decode and encode it. Result() originally returned the internal pointer, so once the lock was released, multiple callers could still read and write the same object concurrently, which is exactly the data race. A plain assignment like *a = *b is only a shallow copy: the nested data would still be shared with the original, so it does not solve the problem. Decoding and encoding again returns a new, independent copy of the process result; callers that modify the returned value only touch the copy, which has no effect on the original, thus avoiding the potential data race.
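To illustrate the difference, a hypothetical snippet (the ErrCode field is assumed from dm's pb.ProcessError definition):

// Shallow copy: both results share the same Errors backing array, so a
// write through one is visible through the other and can race with it.
orig := &pb.ProcessResult{Errors: []*pb.ProcessError{{ErrCode: 10001}}}
shallow := &pb.ProcessResult{}
*shallow = *orig                  // copies only the slice header
shallow.Errors[0].ErrCode = 10002 // also changes orig.Errors[0].ErrCode

// Marshal/Unmarshal round trip: fully independent data.
buf, _ := orig.Marshal()
deep := &pb.ProcessResult{}
_ = deep.Unmarshal(buf)
deep.Errors[0].ErrCode = 10003 // orig is untouched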

Contributor Author

Modified to use deep copy.

Contributor Author

_ = newProcessResult.Unmarshal(tempProcessResult)

Here I cannot delete the _ =, because make check would fail and report that the error return value is not checked. Since Unmarshal() is guaranteed not to return an error here, we can simply ignore it.

st.result.IsCanceled = false
go func() {
	for i := 0; i < 10; i++ {
		_, _ = tempQueryStatusResponse.Marshal()
Contributor

ditto

Contributor Author

The for loop is just to increase the chance of running into the data race, because one or two iterations may not be enough to surface the issue. We'd better wait until these goroutines stop before exiting the test.

}
cfg := &config.SubTaskConfig{
	Name: "test-subtask-race",
	ValidatorCfg: config.ValidatorConfig{
Contributor

Does this issue have anything to do with the validator?

Contributor Author

Not sure, sorry. I am not quite familiar with the validator.

tempQueryStatusResponse.SubTaskStatus[0] = &tempSubTaskStatus
for i := 0; i < 10; i++ {
	st.result.IsCanceled = false
	go func() {
Contributor

Should we wait until these goroutines stop before exiting this test, or are they just detached?

Contributor Author

The for loop is just to increase the chance of running into the data race, because one or two iterations may not be enough to surface the issue. We'd better wait until these goroutines stop before exiting the test, as in the sketch below.
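A sketch of waiting for the spawned goroutines instead of leaving them detached (variable names taken from the quoted test snippets; not necessarily the merged test):

var wg sync.WaitGroup
for i := 0; i < 10; i++ {
	st.result.IsCanceled = false
	wg.Add(1)
	go func() {
		defer wg.Done()
		for j := 0; j < 10; j++ {
			_, _ = tempQueryStatusResponse.Marshal() // concurrent reader
		}
	}()
	_ = st.markResultCanceled() // concurrent writer on the same result
}
wg.Wait() // all readers have finished before the test returns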

@lance6716 changed the title from "fix the data race issue" to "syncer(dm): fix the data race issue" on Jun 15, 2022
@ti-chi-bot added the size/XL label and removed the size/M label on Jun 16, 2022
Comment on lines 555 to 564
tempProcessResult := st.result
Errors := []*pb.ProcessError{}
Detail := []byte{}
immProcessResult := pb.ProcessResult{
	IsCanceled: false,
	Errors:     Errors,
	Detail:     Detail,
}
var newProcessResult *pb.ProcessResult
newProcessResult = &immProcessResult
Contributor

Suggested change:
- tempProcessResult := st.result
- Errors := []*pb.ProcessError{}
- Detail := []byte{}
- immProcessResult := pb.ProcessResult{
- 	IsCanceled: false,
- 	Errors:     Errors,
- 	Detail:     Detail,
- }
- var newProcessResult *pb.ProcessResult
- newProcessResult = &immProcessResult
+ var newProcessResult *pb.ProcessResult
+ newProcessResult = &pb.ProcessResult{}

}
var newProcessResult *pb.ProcessResult
newProcessResult = &immProcessResult
*newProcessResult = *tempProcessResult
Contributor

The pb.ProcessResult.Errors field is a slice of pointers, which should be deep-copied as well.

Suggested change:
- *newProcessResult = *tempProcessResult
+ *newProcessResult = *st.Result

Contributor Author

Modified back to Marshal() and Unmarshal(), which is less efficient than deep copy but easier to maintain (as discussed).

Comment on lines 594 to 598
var check bool
if st.Result() == st.result {
	check = false
} else {
	check = true
Contributor

Suggested change:
- var check bool
- if st.Result() == st.result {
- 	check = false
- } else {
- 	check = true
+ require.NotEqual(t, st.Result(), st.result)

Only checking this is not sufficient (e.g. the pointers in the Errors slice).
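One way to tighten the check, assuming the test uses testify (require.NotSame asserts that two pointers do not reference the same object):

res := st.Result()
require.NotSame(t, st.result, res)
for i := range res.Errors {
	// the nested *pb.ProcessError pointers must differ as well
	require.NotSame(t, st.result.Errors[i], res.Errors[i])
}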

README.md Outdated
@@ -1,9 +1,12 @@
# TiFlow

[![LICENSE](https://img.shields.io/github/license/pingcap/tiflow.svg)](https://github.com/pingcap/tiflow/blob/master/LICENSE)
Contributor

I see a lot of changes to the README and cdc; are those related to fixing this bug?

Contributor

She probably rebase-merged master🤔

Contributor Author

Right. I rebase-merged master into my branch, which I probably shouldn't have done. Only the changes to (*SubTask).Result() in dm/dm/worker/subtask.go and TestSubtaskRace() in dm/dm/worker/subtask_test.go are related to fixing this bug. I have fixed the problem, and this PR now contains only those changes.

@ti-chi-bot added the size/M label and removed the size/XL label on Jun 17, 2022
@codecov-commenter commented Jun 20, 2022

Codecov Report

Merging #5881 (2e56fa2) into master (6a451ea) will increase coverage by 0.3948%.
The diff coverage is 59.6810%.

Flag Coverage Δ
cdc 62.4657% <44.2953%> (+0.4857%) ⬆️
dm 51.9826% <29.3103%> (+0.0524%) ⬆️
engine 63.9114% <77.1551%> (+1.7268%) ⬆️

Flags with carried forward coverage won't be shown.

@@               Coverage Diff                @@
##             master      #5881        +/-   ##
================================================
+ Coverage   57.0764%   57.4713%   +0.3948%     
================================================
  Files           682        677         -5     
  Lines         80224      79765       -459     
================================================
+ Hits          45789      45842        +53     
+ Misses        30146      29679       -467     
+ Partials       4289       4244        -45     


func TestSubtaskRace(t *testing.T) {
	// to test data race of Marshal() and markResultCanceled()
	Errors := []*pb.ProcessError{}
Contributor

generally, local variables should not be capitalized (that is Go's exported-name style)

Contributor Author

Modified.

tempSubTaskStatus := pb.SubTaskStatus{}
tempSubTaskStatus.Result = st.Result()
tempQueryStatusResponse.SubTaskStatus[0] = &tempSubTaskStatus
for i := 0; i < 10; i++ {
Contributor

Did you check whether the race detector still works if we remove this loop?

Contributor Author

I just tried removing this loop and restoring the original Result(), and ran the unit test several times. It turned out that the race detector still worked. I have removed this loop.

@ti-chi-bot added the status/LGT1 label on Jun 20, 2022
@lyzx2001 (Contributor Author)

/run-verify

@ti-chi-bot added the status/LGT2 label and removed the status/LGT1 label on Jun 21, 2022
@D3Hunter (Contributor)

/merge

@ti-chi-bot (Member)

This pull request has been accepted and is ready to merge.

Commit hash: 2e56fa2

@ti-chi-bot added the status/can-merge label on Jun 21, 2022
@lyzx2001 (Contributor Author)

/run-dm-integration-test

@ti-chi-bot merged commit 86780b1 into pingcap:master on Jun 21, 2022
ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 21, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot (Member)

In response to a cherrypick label: new pull request created: #5959.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 21, 2022
Signed-off-by: ti-chi-bot <ti-community-prow-bot@tidb.io>
@ti-chi-bot (Member)

In response to a cherrypick label: new pull request created: #5960.

ti-chi-bot pushed a commit to ti-chi-bot/tiflow that referenced this pull request Jun 21, 2022
@ti-chi-bot (Member)

In response to a cherrypick label: new pull request created: #5961.

ti-chi-bot added a commit that referenced this pull request Jun 28, 2022
ti-chi-bot added a commit that referenced this pull request Jul 4, 2022
ti-chi-bot added a commit that referenced this pull request Jul 11, 2022
Labels
  • area/dm: Issues or PRs related to DM.
  • needs-cherry-pick-release-5.3: Should cherry pick this PR to release-5.3 branch.
  • needs-cherry-pick-release-5.4: Should cherry pick this PR to release-5.4 branch.
  • needs-cherry-pick-release-6.1: Should cherry pick this PR to release-6.1 branch.
  • release-note: Denotes a PR that will be considered when it comes time to generate release notes.
  • size/M: Denotes a PR that changes 30-99 lines, ignoring generated files.
  • status/can-merge: Indicates a PR has been approved by a committer.
  • status/LGT2: Indicates that a PR has LGTM 2.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DM-worker may race or panic for invoking query-status and some dmctl command simultaneously
7 participants