Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

syncer(dm): fix the data race issue #5881

Merged
merged 14 commits into from
Jun 21, 2022
8 changes: 7 additions & 1 deletion dm/dm/worker/subtask.go
Original file line number Diff line number Diff line change
Expand Up @@ -552,7 +552,13 @@ func (st *SubTask) markResultCanceled() bool {
func (st *SubTask) Result() *pb.ProcessResult {
st.RLock()
defer st.RUnlock()
return st.result
if st.result == nil {
return nil
}
tempProcessResult, _ := st.result.Marshal()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd better check the error just for the sake of security.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I have modified the code and added the error handling.

newProcessResult := &pb.ProcessResult{}
_ = newProcessResult.Unmarshal(tempProcessResult)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto. Or:

Suggested change
_ = newProcessResult.Unmarshal(tempProcessResult)
newProcessResult.Unmarshal(tempProcessResult)

Is it necessary to decode and encode it again? What about deep copying the object?

*a = *b // a, b are pointers

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, it is necessary to decode and encode it, because Result() originally returns a pointer without the lock, which may cause data race when multiple functions are executing concurrently and calling Result(). In this case, deep copying would still return a pointer without the lock, which does not solve the problem. Here decoding and encoding again return a new copy of the process result, so that when multiple functions are calling Result() concurrently and using the returned pointer to modify things, they would be modifying the new copy, which has no effect on the original pointer, thus avoiding the potential data race.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Modified using deep copy.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_ = newProcessResult.Unmarshal(tempProcessResult)
Here I cannot delete _ =, because 'make check' would fail and report error return value is not checked. Since it is guaranteed here that Unmarshal() will not return an error, so we can just ignore it.

return newProcessResult
}

// Close stops the sub task.
Expand Down
32 changes: 32 additions & 0 deletions dm/dm/worker/subtask_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -572,3 +572,35 @@ func TestValidatorStatus(t *testing.T) {
require.True(t, terror.ErrValidatorNotFound.Equal(err))
// validator != nil: will be tested in IT
}

func TestSubtaskRace(t *testing.T) {
// to test data race of Marshal() and markResultCanceled()
tempErrors := []*pb.ProcessError{}
tempDetail := []byte{}
tempProcessResult := pb.ProcessResult{
IsCanceled: false,
Errors: tempErrors,
Detail: tempDetail,
}
cfg := &config.SubTaskConfig{
Name: "test-subtask-race",
ValidatorCfg: config.ValidatorConfig{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this issue do anything with the validator?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure. Sorry I am not quite familiar with the validator.

Mode: config.ValidationFast,
},
}
st := NewSubTaskWithStage(cfg, pb.Stage_Paused, nil, "worker")
st.result = &tempProcessResult
tempQueryStatusResponse := pb.QueryStatusResponse{}
tempQueryStatusResponse.SubTaskStatus = make([]*pb.SubTaskStatus, 1)
tempSubTaskStatus := pb.SubTaskStatus{}
tempSubTaskStatus.Result = st.Result()
tempQueryStatusResponse.SubTaskStatus[0] = &tempSubTaskStatus
st.result.IsCanceled = false
go func() {
for i := 0; i < 10; i++ {
_, _ = tempQueryStatusResponse.Marshal()
}
}()
st.markResultCanceled()
// this test is to test data race, so don't need assert here
}