[Deal Making Issue] Retrieval blocks forever in ResponderPaused state #5901
Comments
Thanks for the detailed bug report @mgoelzer ❤️ This log line suggests that there was a connection error between client and provider:
We are currently working on improving how connectivity problems are handled: the client should try to reconnect to the provider when the connection fails. This work has landed in release v1.2.3 of go-fil-markets for storage, and is in progress for retrieval.
Getting stuck in ResponderPaused may also be a symptom of this underlying issue: filecoin-project/go-data-transfer#184
@mgoelzer points out that this is reproducible, so it's unlikely to be caused by intermittent connection issues.
@mgoelzer One possibility is that your client is getting stuck trying to create a payment channel. Could you check for stuck messages in your local mpool:
Could you also run the following to increase the logging on your client:
Try both of these, depending on your version one of them should work:
@mgoelzer I was able to retrieve the deal successfully from my client. I'm running the
Ok, some new testing results on this. First tried building the tip of Then tried building
Doing
Next step: blow away my Unless there is a way to kill a phantom transfer while Lotus thinks it's in progress?
@mgoelzer does
@dirkmc Yes, your But now I can't cancel the transfers anymore either! Before they had incremental integer ids like 1, 2, 3, 4, etc. Now it is a huge number and I get I'm using the tip of master (
Do you think id values like
Tested with the
I tried a
@dirkmc I think we should consider closing this issue. If you were able to successfully retrieve
I did open a bug for that bigint id thing in 1.7.0: #5938
@mgoelzer minerx/staging has a migration that isn't in v1.5.3, so switching between them is probably going to mess up the state. I am going to go ahead and close this ticket. The transfer ID format changed in the last release: instead of using a number that is stored in state and increments, we're now using a number based on the current time. This is to help avoid problems when people remove all their state and try to make deals with the same provider. Details here: filecoin-project/go-data-transfer#169 In the next markets release there will be a similar change for the deal ID. This is not in any lotus branch yet, so it's safest not to wipe state at the moment.
Basic Information
Here I describe a reproducible retrieval failure in which a previously stored CID (verified, fast retrieval, 32 GiB) gets "stuck" during retrieval in a ResponderPaused state. The indefinite hang appears on the client, but I've also included logs from the miner to help debug.
Describe the problem
Here's the info needed to reproduce the problem:
Miner: f01240
CID: bafykbzacea5dewvdatvbxc2tmi26bomowduqhoi7ery4yqi3n6li32n4oe546
Here's the problem as I observe it. When I try to retrieve this CID from a full node on another machine, the retrieval hangs forever at this point:
Running lotus client list-transfers gives this output during the hang:
Version
Client: lotus version 1.5.3-rc2+mainnet+git.9afb5ff94
Miner: also 1.5.3, but built from master so the version string is wrong. The build has all the merged PRs in 1.5.3.
Setup
Miner hardware unknown.
To Reproduce
Repro steps are above. This probably should be reproducible from any full node client.
Deal status
Lotus daemon and miner logs
Initially, right after the lotus client retrieve command was issued, we saw this in the logs:
Here is the full log spanning the entire time period in question: lotus-miner.log.zip
Code modifications
No source code modifications.