Remove or guards some unwraps in stratum #2453

Closed

ignopeverell
Contributor

Attempts to fix #2421. Stratum will ignore several client requests until re-login but should not crash anymore.

@rlinxy
rlinxy commented Jan 23, 2019

I just compiled and tried this update. It seems many rigs got dropped; only a few were able to submit shares. The log repeats like this:
20190123 14:43:42.106 WARN grin_servers::mining::stratumserver - (Server ID: 0) New connection: 49.82.237.77:37716
20190123 14:43:42.108 WARN grin_servers::mining::stratumserver - (Server ID: 0) Failed to parse JSONRpc: JSON error - []
20190123 14:43:42.108 WARN grin_servers::mining::stratumserver - (Server ID: 0) Dropping worker: 495
20190123 14:43:42.110 DEBUG grin_servers::mining::stratumserver - (Server ID: 0) sending block 10674 with id 0 to single worker
20190123 14:43:42.118 WARN grin_servers::mining::stratumserver - (Server ID: 0) Failed to parse JSONRpc: JSON error - []
20190123 14:43:42.119 WARN grin_servers::mining::stratumserver - (Server ID: 0) Dropping worker: 492
20190123 14:43:42.119 WARN grin_servers::mining::stratumserver - (Server ID: 0) New connection: 110.86.104.123:17278
20190123 14:43:42.119 WARN grin_servers::mining::stratumserver - (Server ID: 0) New connection: 171.105.181.29:56180
20190123 14:43:42.120 WARN grin_servers::mining::stratumserver - (Server ID: 0) New connection: 202.114.49.71:56098
20190123 14:43:42.120 WARN grin_servers::mining::stratumserver - (Server ID: 0) New connection: 114.101.211.253:32965
20190123 14:43:42.121 DEBUG grin_servers::mining::stratumserver - (Server ID: 0) sending block 10674 with id 0 to single worker
20190123 14:43:42.121 DEBUG grin_servers::mining::stratumserver - (Server ID: 0) sending block 10674 with id 0 to single worker
20190123 14:43:42.122 DEBUG grin_servers::mining::stratumserver - (Server ID: 0) sending block 10674 with id 0 to single worker
20190123 14:43:42.122 DEBUG grin_servers::mining::stratumserver - (Server ID: 0) sending block 10674 with id 0 to single worker
20190123 14:43:42.124 WARN grin_servers::mining::stratumserver - (Server ID: 0) Failed to parse JSONRpc: JSON error - []
20190123 14:43:42.124 WARN grin_servers::mining::stratumserver - (Server ID: 0) Dropping worker: 493
20190123 14:43:42.124 WARN grin_servers::mining::stratumserver - (Server ID: 0) Failed to parse JSONRpc: JSON error - []
20190123 14:43:42.124 WARN grin_servers::mining::stratumserver - (Server ID: 0) Dropping worker: 494
20190123 14:43:42.132 WARN grin_servers::mining::stratumserver - (Server ID: 0) New connection: 112.3.242.62:11007
20190123 14:43:42.135 WARN grin_servers::mining::stratumserver - (Server ID: 0) New connection: 49.82.237.77:37719
20190123 14:43:42.135 DEBUG grin_servers::mining::stratumserver - (Server ID: 0) sending block 10674 with id 0 to single worker
20190123 14:43:42.137 WARN grin_servers::mining::stratumserver - (Server ID: 0) Failed to parse JSONRpc: JSON error - []
20190123 14:43:42.137 DEBUG grin_servers::mining::stratumserver - (Server ID: 0) sending block 10674 with id 0 to single worker
20190123 14:43:42.137 WARN grin_servers::mining::stratumserver - (Server ID: 0) Dropping worker: 498
20190123 14:43:42.141 WARN grin_servers::mining::stratumserver - (Server ID: 0) Failed to parse JSONRpc: JSON error - []
20190123 14:43:42.141 WARN grin_servers::mining::stratumserver - (Server ID: 0) Dropping worker: 499
20190123 14:43:42.148 WARN grin_servers::mining::stratumserver - (Server ID: 0) New connection: 183.184.114.85:55938
20190123 14:43:42.150 WARN grin_servers::mining::stratumserver - (Server ID: 0) Failed to parse JSONRpc: JSON error - []
20190123 14:43:42.150 WARN grin_servers::mining::stratumserver - (Server ID: 0) Dropping worker: 501

self.handle_login(request.params, &mut workers_l[num])
}
"submit" => {
if let None = worker_stats_id {

Wouldn't match (here and below) look cleaner?

@hashmap 👍 looks like it. I suspect cargo-clippy also agrees.
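The suggestion above can be illustrated with a minimal sketch. It is not the PR's code: `WorkerStats`, `SubmitOutcome`, and `handle_submit` are hypothetical stand-ins, and only the `position` lookup mirrors the diff under review. The point is that a single `match` handles both the found and the missing case, whereas `if let None = ...` checks one arm and leaves the other to a later unwrap or index.

```rust
/// Hypothetical stand-in for one entry of the stratum worker stats.
#[derive(Debug, Clone, PartialEq)]
pub struct WorkerStats {
    pub id: String,
    pub num_accepted: u64,
}

/// Hypothetical outcome of handling a "submit" request in this sketch.
#[derive(Debug, PartialEq)]
pub enum SubmitOutcome {
    Accepted,
    UnknownWorker,
}

/// Looks up the stats entry for `worker_id` and handles both cases in one `match`,
/// so there is no separate `if let None` check followed by an unwrap.
pub fn handle_submit(stats: &mut [WorkerStats], worker_id: &str) -> SubmitOutcome {
    match stats.iter().position(|r| r.id == worker_id) {
        Some(idx) => {
            stats[idx].num_accepted += 1;
            SubmitOutcome::Accepted
        }
        // No stats entry for this worker: report it instead of panicking.
        None => SubmitOutcome::UnknownWorker,
    }
}
```

Because both arms live in one expression, the compiler checks that the missing-entry case is handled wherever the index is used.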

.position(|r| r.id == workers_l[num].id)
.unwrap();
stratum_stats.worker_stats[worker_stats_id].last_seen = SystemTime::now();
.position(|r| r.id == workers_l[num].id);
Should we just ignore such a worker? I'm trying to understand how we can get into this situation, and I don't see it so far.

The initial guess has been that some fairly uncommon miner (software, or class of user) manages to crash the stratum server silently. Truly crashing would allow pool operators to have the service autorestart, whereas not crashing might hide the issue. So, maybe log this case loudly?
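One way to picture "log this case loudly" is sketched below. The function name `lookup_or_log`, the message text, and the generic `Write` sink are all assumptions made for illustration; the real server uses its own logging macros. The sketch keeps the lookup non-panicking while still leaving a visible trace for pool operators.

```rust
use std::io::Write;

/// Finds the index of `worker_id` among the known stats ids.
/// When the id is missing, writes an error line to `log` and returns `None`
/// so the caller can skip the request instead of unwrapping.
pub fn lookup_or_log<W: Write>(
    stat_ids: &[String],
    worker_id: &str,
    log: &mut W,
) -> Option<usize> {
    let found = stat_ids.iter().position(|id| id == worker_id);
    if found.is_none() {
        // Write errors are ignored on purpose: logging must not become a new crash path.
        let _ = writeln!(
            log,
            "ERROR stratum: no stats entry for worker {} ({} entries known); ignoring request",
            worker_id,
            stat_ids.len()
        );
    }
    found
}
```

Passing `std::io::stderr()` as the sink would print the message directly; a `Vec<u8>` works as an in-memory sink for tests.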

@sesam
Contributor

sesam commented Jan 23, 2019

Related #2446 (awkward PR title, sorry) which also refactors a bit. I haven't yet compared the two PRs.

@bladedoyle and others who'd want to try this, do:
git fetch https://github.com/mimblewimble/grin pull/2453/head:stratum_panic_fix; git checkout stratum_panic_fix # after finishing testing, remember to git checkout master

@sesam
Contributor

sesam commented Jan 23, 2019

UPDATE: having compared the two, this PR is cleaner, though it potentially excludes some clients from accessing the API endpoints keepalive and getjobtemplate. Whether that is good or bad is also what hashmap asked about.

The difference from #2446 is in where we ignore a worker (via continue) when its worker_stats_id is not found among the stratum worker stats. When would that happen? Maybe if the stratum server has forgotten (intentionally or not) about a worker, or if the worker uses a bad ID.

(is the integer vs string discussion relevant here?)

@rlinxy
rlinxy commented Jan 24, 2019

@sesam
The stratum server ran smoothly for several hours after I switched to the code you updated. But hours later, I got another panic:

20190123 17:48:37.847 ERROR grin_util::logger -
thread 'stratum_server' panicked at 'called Option::unwrap() on a None value': src/libcore/option.rs:355
stack backtrace:

I lost the log, but I have a screenshot here:
https://i.niupic.com/images/2019/01/24/5L1h.png

So it was the 'clean_workers' function that caused this panic.

I tried modifying clean_workers the same way you did:
let worker_stats_id = match stratum_stats
    .worker_stats
    .iter()
    .position(|r| r.id == workers_l[num].id)
{
    Some(id) => id,
    None => continue,
};

No more panics, but the number of TCP connections keeps growing. Evidently the 'dead' workers cannot be dropped, so the inactive connections always remain.
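A possible reading of this symptom is that `continue` skips the whole loop body, including the removal of the dead worker, whenever its stats entry is missing. The sketch below shows one way to separate the two concerns. It is an assumption-laden illustration, not the actual `clean_workers`: `Worker`, `StatsEntry`, and their fields are hypothetical, and the real worker holds a TCP stream that this sketch only assumes is closed when the worker value is dropped.

```rust
/// Hypothetical worker record; the real one would own the TCP stream.
#[derive(Debug)]
pub struct Worker {
    pub id: String,
    pub dead: bool,
}

/// Hypothetical stats entry tracked per worker.
#[derive(Debug)]
pub struct StatsEntry {
    pub id: String,
    pub is_connected: bool,
}

/// Removes every dead worker and returns how many workers remain.
/// The stats update is best-effort: a missing stats entry never prevents
/// the dead worker itself from being removed.
pub fn clean_workers(workers: &mut Vec<Worker>, stats: &mut [StatsEntry]) -> usize {
    workers.retain(|w| {
        if !w.dead {
            return true; // keep live workers untouched
        }
        // Mark the worker as disconnected only if we still track its stats.
        if let Some(entry) = stats.iter_mut().find(|s| s.id == w.id) {
            entry.is_connected = false;
        }
        false // drop the dead worker regardless of the stats lookup
    });
    workers.len()
}
```

With this shape, a dead worker whose stats entry has gone missing is still removed from the list, which is the step that `None => continue` would skip.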

@rlinxy
rlinxy commented Jan 24, 2019

Please copy the link into your browser; I found that just clicking it doesn't work on GitHub :(

@ignopeverell
Contributor Author

Going to close this. It was an attempt at a small improvement but looks like our stratum server needs much more than small fixes.

@rlinxy
rlinxy commented Jan 25, 2019

@ignopeverell I'm guessing it might be solved by combining @sesam's and @hashmap's updates, as I posted in #2457. I am still running the test, but everything is fine so far: it has been running without problems for more than 1 hour with 100+ rigs mining.

@rlinxy
rlinxy commented Jan 25, 2019

It has been running for 4 hours; the number of TCP connections grew from 300 to 1.5K and keeps increasing. So it seems the 'dead' TCP connections still cannot be closed properly.

Successfully merging this pull request may close these issues.

Stratum server crashed on miner reconnect