You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
It seems that under heavy load, a Graphsync response to a request for data will sometimes get stuck in a paused state.
When I run the new lotus concurrent deals test TestDealCyclesConcurrent in the itest kit with concurrency of 32, the test consistently hangs.
It appears that all the retrieval deals complete except for one or two, and the logs for these incomplete deals indicate that the problem is in Graphsync's pausing mechanism.
The branches I used to produce this state are:
go-fil-markets dirkmc/log-validate-pull
Adds a log line that connects the deal ID to the channel ID so we can match logs at the go-fil-markets level to the go-data-transfer level
go-data-transfer dirkmc/log-outgoing-block
Adds some logging in the Graphsync transport so we can see if the sent / outgoing block hooks are called
I run the concurrent deals test in lotus:
go test ./itests -run TestDealCyclesConcurrent -v
I grep the output for each deal ID until I find the deal that hangs. Then I look for the log line that connects the deal ID to the data transfer channel ID:
2021-06-21T15:03:29.259+0200 INFO markets-rtvl-reval requestvalidation/requestvalidation.go:78 validate pull for deal 1624280585293275020 for data transfer channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052
I use that to grep the logs for both the deal ID and the channel ID so as to get all relevant logs:
grep "1624280585293275020\|1624280585292810052" /tmp/out.txt
2021-06-21T15:03:29.128+0200 INFO markets loggers/loggers.go:25 retrieval client event {"name": "ClientEventOpen", "deal ID": "1624280585293275020", "state": "DealStatusNew", "message": ""}
2021-06-21T15:03:29.259+0200 INFO dt-impl impl/events.go:19 channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: opened
?V)????u,} to channel ID 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052
2021-06-21T15:03:29.259+0200 INFO markets loggers/loggers.go:25 retrieval client event {"name": "ClientEventDealProposed", "deal ID": "1624280585293275020", "state": "DealStatusWaitForAcceptance", "message": ""}
2021-06-21T15:03:29.259+0200 INFO dt_graphsync graphsync/graphsync.go:476 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052, pull request queued, req=&{BCid:bafk2bzacedr7tnpc62gvo2wcaxswfab4yduzmkv44kodcxbnpxrhaqhdbidk6 Type:0 Paus:false Part:false Pull:true Stor:0xc01035b758 Vouch:0xc01035b788 VTyp:RetrievalDealProposal/1 XferID:1624280585292810052 RestartChannel:--0}
2021-06-21T15:03:29.259+0200 DEBUG dt_graphsync graphsync/graphsync.go:520 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: received request for data (pull)
2021-06-21T15:03:29.259+0200 INFO dt-impl impl/events.go:300 channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: received new channel request from 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h
2021-06-21T15:03:29.259+0200 INFO markets-rtvl-reval requestvalidation/requestvalidation.go:78 validate pull for deal 1624280585293275020 for data transfer channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052
2021-06-21T15:03:29.390+0200 INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventOpen", "deal ID": "1624280585293275020", "receiver": "12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h", "state": "DealStatusNew", "message": ""}
2021-06-21T15:03:29.668+0200 INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventDealAccepted", "deal ID": "1624280585293275020", "receiver": "12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h", "state": "DealStatusUnsealing", "message": ""}
2021-06-21T15:03:29.736+0200 DEBUG dt_graphsync graphsync/graphsync.go:570 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: pausing graphsync response
2021-06-21T15:03:29.736+0200 DEBUG dt_graphsync graphsync/graphsync.go:969 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: received request for data
?V)????u,} to channel ID 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052
2021-06-21T15:03:29.741+0200 INFO markets loggers/loggers.go:30 retrieval provider event {"name": "ProviderEventUnsealComplete", "deal ID": "1624280585293275020", "receiver": "12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h", "state": "DealStatusUnsealed", "message": ""}
2021-06-21T15:03:29.805+0200 INFO dt-impl impl/impl.go:391 resume channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052
2021-06-21T15:03:29.805+0200 DEBUG dt_graphsync graphsync/graphsync.go:1055 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: unpausing response
2021-06-21T15:03:29.805+0200 WARN dt-impl impl/impl.go:400 Error attempting to resume 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052 at transport level: request is not paused
2021-06-21T15:03:29.998+0200 INFO dt-impl impl/events.go:188 channel 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: received new response, accepting channel
2021-06-21T15:03:29.999+0200 INFO markets loggers/loggers.go:25 retrieval client event {"name": "ClientEventDealAccepted", "deal ID": "1624280585293275020", "state": "DealStatusAccepted", "message": ""}
2021-06-21T15:03:31.318+0200 INFO markets loggers/loggers.go:25 retrieval client event {"name": "ClientEventPaymentChannelAddingFunds", "deal ID": "1624280585293275020", "state": "DealStatusPaymentChannelAddingInitialFunds", "message": ""}
2021-06-21T15:03:32.511+0200 INFO markets loggers/loggers.go:25 retrieval client event {"name": "ClientEventPaymentChannelReady", "deal ID": "1624280585293275020", "state": "DealStatusPaymentChannelAllocatingLane", "message": ""}
2021-06-21T15:03:32.725+0200 INFO markets loggers/loggers.go:25 retrieval client event {"name": "ClientEventLaneAllocated", "deal ID": "1624280585293275020", "state": "DealStatusOngoing", "message": ""}
Normally after ProviderEventUnsealComplete there would be a ProviderEventBlockSent event.
2021-06-21T15:03:29.805+0200 DEBUG dt_graphsync graphsync/graphsync.go:1055 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052: unpausing response
2021-06-21T15:03:29.805+0200 WARN dt-impl impl/impl.go:400 Error attempting to resume 12D3KooWCGuvzoKipy5Y6DRGeecSH57R2hzS7xtrhFELef7CAY9h-12D3KooWLqudbUu45uQcJCh1eMk7GeH1ZesUGyUqqcpGLapSXxGs-1624280585292810052 at transport level: request is not paused
Therefore the Graphsync transfer remains in the paused state, the outgoing block hook is never called (see the log lines that would be output if it were called) and ProviderEventBlockSent is never fired.
The text was updated successfully, but these errors were encountered:
dirkmc
changed the title
Response paused then unpaused but never calls outgoing block hook
Response unpaused but never calls outgoing block hook
Jun 21, 2021
It seems that under heavy load, a Graphsync response to a request for data will sometimes get stuck in a paused state.
When I run the new lotus concurrent deals test
TestDealCyclesConcurrent
in the itest kit with concurrency of 32, the test consistently hangs.It appears that all the retrieval deals complete except for one or two, and the logs for these incomplete deals indicate that the problem is in Graphsync's pausing mechanism.
The branches I used to produce this state are:
Includes the new
TestDealCyclesConcurrent
testAdds a log line that connects the deal ID to the channel ID so we can match logs at the go-fil-markets level to the go-data-transfer level
Adds some logging in the Graphsync transport so we can see if the sent / outgoing block hooks are called
I run the concurrent deals test in lotus:
I grep the output for each deal ID until I find the deal that hangs. Then I look for the log line that connects the deal ID to the data transfer channel ID:
I use that to grep the logs for both the deal ID and the channel ID so as to get all relevant logs:
Normally after
ProviderEventUnsealComplete
there would be aProviderEventBlockSent
event.This log line indicates that hookActions.PauseResponse() is called when a new request for data arrives:
These log lines indicate that Graphsync.UnpauseResponse is called but that Graphsync does not believe the response is paused:
Therefore the Graphsync transfer remains in the paused state, the outgoing block hook is never called (see the log lines that would be output if it were called) and
ProviderEventBlockSent
is never fired.The text was updated successfully, but these errors were encountered: