Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deals stuck in "StorageDealsTransferring" with 0B transferred #5268

Closed
rjan90 opened this issue Dec 29, 2020 · 2 comments
Closed

Deals stuck in "StorageDealsTransferring" with 0B transferred #5268

rjan90 opened this issue Dec 29, 2020 · 2 comments
Assignees
Labels
area/markets Area: Markets kind/bug Kind: Bug

Comments

@rjan90
Copy link
Contributor

rjan90 commented Dec 29, 2020

Describe the bug
A couple of miners experienced that deals with the same client suddenly started to stop in "StorageDealsTransferring" with 0 bytes transferred, after successfully sealing deals from the client over many days. This issue is based on all the feedback and logs from both miners and the client in the Slack-thread https://filecoinproject.slack.com/archives/C01AZP8BKRQ/p1609164686470600

On the miner side

lotus-miner storage-deals list -v
Dec 28 14:47:07  bafyreigpl5vm64hhowjc2g5e7reppu6hngme5ff4lkaa3ykd6h2r5zveza  0        StorageDealTransferring             f3sar6tat2mhqadlqjjxewmpqya3whpkzuqaevpo5vwfvdhnm3uztd7p67d6ozpsfkiwvrvqrhp3zu6psi23ma  16GiB  0 FIL  1052529  12D3KooWSDhUpT9YMLxRTEAf3maSR34M64yYm4XBHKeYhwtbaUo7-12D3KooWMpuRzgVuRS6m285cucGgnXWwTnsNtGzjsMJeEqS7UtJw-426   

Data-Transfers shows that the transfer is ongoing, but with 0 bytes transferred:

lotus-miner data-transfers list
ID   Status   Receiving From  Root Cid     Initiated?  Transferred  Voucher                                   
426  Ongoing  ...hwtbaUo7     ...peoy6mfa  N           0B           ...7reppu6hngme5ff4lkaa3ykd6h2r5zveza"}}  

My miner has a connection with the client:

lotus-miner net connect /ip4/X.--..--.XX/tcp/36359/p2p/12D3KooWSDhUpT9YMLxRTEAf3maSR34M64yYm4XBHKeYhwtbaUo7
connect 12D3KooWSDhUpT9YMLxRTEAf3maSR34M64yYm4XBHKeYhwtbaUo7: success

If I try to restart the channel I can see in the logs that the channel has timed out:

2020-12-28T19:12:29.049+0100	INFO	markets	loggers/loggers.go:20	storage event	{"name": "ProviderEventDataTransferRestarted", "proposal CID": "bafyreigpl5vm64hhowjc2g5e7reppu6hngme5ff4lkaa3ykd6h2r5zveza", "state": "StorageDealTransferring", "message": ""}
2020-12-28T19:12:29.053+0100	WARN	dt_graphsync	graphsync/graphsync.go:150	graphsync request context cancelled, channel Id: 12D3KooWSDhUpT9YMLxRTEAf3maSR34M64yYm4XBHKeYhwtbaUo7-12D3KooWMpuRzgVuRS6m285cucGgnXWwTnsNtGzjsMJeEqS7UtJw-426
2020-12-28T19:12:29.053+0100	WARN	dt-impl	impl/events.go:196	channel 12D3KooWSDhUpT9YMLxRTEAf3maSR34M64yYm4XBHKeYhwtbaUo7-12D3KooWMpuRzgVuRS6m285cucGgnXWwTnsNtGzjsMJeEqS7UtJw-426 has timed out

On the client-side

Client Deals
Dec 28 21:47:07 bafyreigpl5vm64hhowjc2g5e7reppu6hngme5ff4lkaa3ykd6h2r5zveza 0 f023467 StorageDealStartDataTransfer N N baga6ea4seaqg3tsrp56bchrhumenu4cjnl57cpxudfiggettmlejqggtwf74coa 15.88 GiB 0 FIL 1052529 true

Client Transfers
426 Requested ...EqS7UtJw ...peoy6mfa Y 0B ...7reppu6hngme5ff4lkaa3ykd6h2r5zveza"}}

The client checked his network bandwidth and didn't have any problems with wget or scp to another machine. But there are no clear lotus transfers in nethogs.

Client also checked his socket limit:

use ulimit -n
get 1024
Then is use  netstat |wc
There are about 850 connects. So it may be socket connections number limit.
Then I revised the limit to 64435 and restart servers. 
Using nethogs command, I find the send rate of lotus reach 10M/s. It seems ok.
Then I check the transfers, I find the max transfer id reach 655, but the previous max transfer id is 428. I didn't add any new deals in these times.

Possible problems might be:

  1. Socket Connection Limit
  2. Lotus Client

To Reproduce
Steps to reproduce the behavior:

  1. Client send deals to miners over a longer period of time
  2. Miners are able to seal the deals for a longer period of time, and then suddenly at almost the same time multiple miners experience that deals stop in "StorageDealsTransferring" state

Version (run lotus version):
Daemon: 1.4.0+git.e9989d0e4+api1.0.0
Local: lotus version 1.4.0+git.e9989d0e4

Additional context
Ongoing discussion is also happening in the slack-channel. And this issue also seems to be related to #4946 & #5211

@rjan90
Copy link
Contributor Author

rjan90 commented Jan 2, 2021

After experiencing another batch of deals stuck in "StorageDealsTransferring"-state with the same client, I think we have pin-pointed to at least one reason why this happens.

I restarted both my lotus daemon/miner due to some maintenance, and when I got the next storage-deal proposal after restarting my miner, the deal got stuck in "StorageDealsTransferring". To solve the problem the client also restarted his lotus-process, and then the transfers started.

So steps to reproduce:

  1. Client sends deal to a miner.Transfers work normally, and the miner seals the deal.
  2. Miner restarts the miner. Client send a new deal after miner is back online.
  3. The deal gets stuck in "StorageDealsTransferring"

@dirkmc
Copy link
Contributor

dirkmc commented Jan 4, 2021

Thank you for the detailed bug report. I believe this will be fixed by #5210

I want to write some comprehensive tests around restarts to make sure we've covered all cases before merging this fix so it may take a few more days.

@dirkmc dirkmc added area/markets Area: Markets kind/bug Kind: Bug labels Jan 5, 2021
@dirkmc dirkmc self-assigned this Jan 5, 2021
@dirkmc dirkmc added this to the 💹Storage Deal Success milestone Jan 5, 2021
@magik6k magik6k closed this as completed Jan 20, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/markets Area: Markets kind/bug Kind: Bug
Projects
None yet
Development

No branches or pull requests

3 participants