Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DM-worker may race or panic for invoking query-status and some dmctl command simultaneously #4811

Closed
lance6716 opened this issue Mar 8, 2022 · 7 comments · Fixed by #5881
Assignees
Labels
affects-5.3 affects-5.4 affects-6.1 area/dm Issues or PRs related to DM. severity/moderate type/bug The issue is confirmed as a bug.

Comments

@lance6716
Copy link
Contributor

lance6716 commented Mar 8, 2022

What did you do?

https://ci2.pingcap.net/blue/organizations/jenkins/dm_ghpr_integration_test/detail/dm_ghpr_integration_test/4100/pipeline

[2022-03-08T09:00:20.317Z] WARNING: DATA RACE

[2022-03-08T09:00:20.317Z] Write at 0x00c006394380 by goroutine 199:

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/worker.(*SubTask).markResultCanceled()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/subtask.go:470 +0x257

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/worker.(*SubTask).Pause()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/subtask.go:498 +0x54

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).OperateSubTask()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:564 +0x52c

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).operateSubTaskStage()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:743 +0x1fe

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).operateSubTaskStageWithoutConfig()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:762 +0x1b7

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).handleSubTaskStage()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:698 +0x658

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).observeSubtaskStage()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:656 +0x844

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).EnableHandleSubtasks.func1()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:438 +0x138

[2022-03-08T09:00:20.317Z] 

[2022-03-08T09:00:20.317Z] Previous read at 0x00c006394380 by goroutine 289:

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/pb.(*ProcessResult).Size()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:5308 +0x7a

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/pb.(*SubTaskStatus).Size()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:5008 +0x34d

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/pb.(*QueryStatusResponse).Size()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:4765 +0x1f7

[2022-03-08T09:00:20.317Z]   github.com/pingcap/tiflow/dm/dm/pb.(*QueryStatusResponse).Marshal()

[2022-03-08T09:00:20.317Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:3027 +0x54

[2022-03-08T09:00:20.317Z]   google.golang.org/grpc/encoding/proto.codec.Marshal()

[2022-03-08T09:00:20.317Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/encoding/proto/proto.go:70 +0x23b

[2022-03-08T09:00:20.317Z]   google.golang.org/grpc/encoding/proto.(*codec).Marshal()

[2022-03-08T09:00:20.317Z]       <autogenerated>:1 +0x64

[2022-03-08T09:00:20.317Z]   google.golang.org/grpc.encode()

[2022-03-08T09:00:20.317Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/rpc_util.go:545 +0x74

[2022-03-08T09:00:20.318Z]   google.golang.org/grpc.(*Server).sendResponse()

[2022-03-08T09:00:20.318Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:869 +0x184

[2022-03-08T09:00:20.318Z]   google.golang.org/grpc.(*Server).processUnaryRPC()

[2022-03-08T09:00:20.318Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1117 +0xb34

[2022-03-08T09:00:20.318Z]   google.golang.org/grpc.(*Server).handleStream()

[2022-03-08T09:00:20.318Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1405 +0x138b

[2022-03-08T09:00:20.318Z]   google.golang.org/grpc.(*Server).serveStreams.func1.1()

[2022-03-08T09:00:20.318Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:746 +0xe6

[2022-03-08T09:00:20.318Z] 

[2022-03-08T09:00:20.318Z] Goroutine 199 (running) created at:

[2022-03-08T09:00:20.318Z]   github.com/pingcap/tiflow/dm/dm/worker.(*SourceWorker).EnableHandleSubtasks()

[2022-03-08T09:00:20.318Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/source_worker.go:434 +0xbcc

[2022-03-08T09:00:20.318Z]   github.com/pingcap/tiflow/dm/dm/worker.(*Server).enableHandleSubtasks()

[2022-03-08T09:00:20.318Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:673 +0x195

[2022-03-08T09:00:20.318Z]   github.com/pingcap/tiflow/dm/dm/worker.(*Server).operateSourceBound()

[2022-03-08T09:00:20.318Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:659 +0x2e8

[2022-03-08T09:00:20.318Z]   github.com/pingcap/tiflow/dm/dm/worker.(*Server).handleSourceBound()

[2022-03-08T09:00:20.318Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:583 +0x511

[2022-03-08T09:00:20.318Z]   github.com/pingcap/tiflow/dm/dm/worker.(*Server).observeSourceBound()

[2022-03-08T09:00:20.318Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:407 +0x108b

[2022-03-08T09:00:20.318Z]   github.com/pingcap/tiflow/dm/dm/worker.(*Server).Start.func4()

[2022-03-08T09:00:20.318Z]       /home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/worker/server.go:170 +0x114

[2022-03-08T09:00:20.318Z] 

[2022-03-08T09:00:20.318Z] Goroutine 289 (running) created at:

[2022-03-08T09:00:20.318Z]   google.golang.org/grpc.(*Server).serveStreams.func1()

[2022-03-08T09:00:20.318Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:744 +0xb8

[2022-03-08T09:00:20.318Z]   google.golang.org/grpc/internal/transport.(*http2Server).operateHeaders()

[2022-03-08T09:00:20.318Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go:442 +0x1850

[2022-03-08T09:00:20.318Z]   google.golang.org/grpc/internal/transport.(*http2Server).HandleStreams()

[2022-03-08T09:00:20.318Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/internal/transport/http2_server.go:483 +0x49c

[2022-03-08T09:00:20.318Z]   google.golang.org/grpc.(*Server).serveStreams()

[2022-03-08T09:00:20.318Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:742 +0x1c7

[2022-03-08T09:00:20.318Z]   google.golang.org/grpc.(*Server).handleRawConn.func1()

[2022-03-08T09:00:20.318Z]       /go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:703 +0x4c

[2022-03-08T09:00:20.318Z] ==================

[2022-03-08T09:00:20.318Z] panic: runtime error: index out of range [-1]

[2022-03-08T09:00:20.318Z] 

[2022-03-08T09:00:20.318Z] goroutine 1179 [running]:

[2022-03-08T09:00:20.318Z] github.com/pingcap/tiflow/dm/dm/pb.(*QueryStatusResponse).MarshalToSizedBuffer(0xc0059ddb40, 0xc00647c000, 0x272, 0x272, 0x150129f, 0xc000d1e000, 0xc000df5648)

[2022-03-08T09:00:20.318Z] 	/home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:3082 +0x952

[2022-03-08T09:00:20.318Z] github.com/pingcap/tiflow/dm/dm/pb.(*QueryStatusResponse).Marshal(0xc0059ddb40, 0x61b5f40, 0xc0059ddb40, 0x7fb4e4351e08, 0xc0059ddb40, 0x1)

[2022-03-08T09:00:20.318Z] 	/home/jenkins/agent/workspace/dm_ghpr_integration_test/go/src/github.com/pingcap/tiflow/dm/dm/pb/dmworker.pb.go:3029 +0xa5

[2022-03-08T09:00:20.318Z] google.golang.org/grpc/encoding/proto.codec.Marshal(0x61b5f40, 0xc0059ddb40, 0x1, 0xc00006d040, 0xc0059ddb40, 0xc004fd33d0, 0xc0059ddb60)

[2022-03-08T09:00:20.318Z] 	/go/pkg/mod/google.golang.org/grpc@v1.29.1/encoding/proto/proto.go:70 +0x23c

[2022-03-08T09:00:20.318Z] google.golang.org/grpc.encode(0x7fb4e45c3af8, 0xa295638, 0x61b5f40, 0xc0059ddb40, 0xa295638, 0x7, 0xf, 0x0, 0x6338d08)

[2022-03-08T09:00:20.318Z] 	/go/pkg/mod/google.golang.org/grpc@v1.29.1/rpc_util.go:545 +0x75

[2022-03-08T09:00:20.318Z] google.golang.org/grpc.(*Server).sendResponse(0xc000924000, 0x74bcbb8, 0xc000b88000, 0xc006478100, 0x61b5f40, 0xc0059ddb40, 0x0, 0x0, 0xc006bf034f, 0x0, ...)

[2022-03-08T09:00:20.318Z] 	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:869 +0x185

[2022-03-08T09:00:20.318Z] google.golang.org/grpc.(*Server).processUnaryRPC(0xc000924000, 0x74bcbb8, 0xc000b88000, 0xc006478100, 0xc000f68a50, 0x90b0840, 0x0, 0x0, 0x0)

[2022-03-08T09:00:20.318Z] 	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1117 +0xb35

[2022-03-08T09:00:20.318Z] google.golang.org/grpc.(*Server).handleStream(0xc000924000, 0x74bcbb8, 0xc000b88000, 0xc006478100, 0x0)

[2022-03-08T09:00:20.318Z] 	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:1405 +0x138c

[2022-03-08T09:00:20.318Z] google.golang.org/grpc.(*Server).serveStreams.func1.1(0xc00022b240, 0xc000924000, 0x74bcbb8, 0xc000b88000, 0xc006478100)

[2022-03-08T09:00:20.318Z] 	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:746 +0xe7

[2022-03-08T09:00:20.318Z] created by google.golang.org/grpc.(*Server).serveStreams.func1

[2022-03-08T09:00:20.318Z] 	/go/pkg/mod/google.golang.org/grpc@v1.29.1/server.go:744 +0xb9

What did you expect to see?

no error

What did you see instead?

No response

Versions of the cluster

all version

current status of DM cluster (execute query-status <task-name> in dmctl)

(paste current status of DM cluster here)
@lance6716 lance6716 added type/bug The issue is confirmed as a bug. area/dm Issues or PRs related to DM. labels Mar 8, 2022
@lance6716
Copy link
Contributor Author

Root cause

we expose the pointer to caller without the lock

func (st *SubTask) Result() *pb.ProcessResult {
st.RLock()
defer st.RUnlock()
return st.result
}

should return a copy

@niubell niubell removed their assignment Mar 10, 2022
@niubell
Copy link
Contributor

niubell commented Mar 10, 2022

/assign docsir

@ti-chi-bot
Copy link
Member

@niubell: GitHub didn't allow me to assign the following users: docsir.

Note that only pingcap members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign docsir

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@niubell niubell self-assigned this Mar 10, 2022
@niubell
Copy link
Contributor

niubell commented May 23, 2022

/unassign niubell

@niubell
Copy link
Contributor

niubell commented May 23, 2022

/assign lyzx2001

@ti-chi-bot
Copy link
Member

@niubell: GitHub didn't allow me to assign the following users: lyzx2001.

Note that only pingcap members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign lyzx2001

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@lance6716
Copy link
Contributor Author

/assign lyzx2001

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-5.3 affects-5.4 affects-6.1 area/dm Issues or PRs related to DM. severity/moderate type/bug The issue is confirmed as a bug.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants