fix(light/response) : handle bad responses #9756
.cycle()
.skip(offset)
.take(num_peers)
{
I think `cycle` is clearer in this context, and I formatted the code to be easier to read.
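For anyone following along, here is a minimal sketch of the `cycle`/`skip`/`take` pattern being discussed. Only `offset` and `num_peers` come from the diff; the helper name `peers_from_offset` and the generic element type are assumptions made for illustration and are not code from this PR.

```rust
/// Hypothetical helper (not part of the PR): returns every peer exactly once,
/// starting at `offset` and wrapping around to the beginning of the slice.
fn peers_from_offset<T: Clone>(peers: &[T], offset: usize) -> Vec<T> {
    let num_peers = peers.len();
    peers
        .iter()
        .cycle()          // repeat the peer list endlessly
        .skip(offset)     // start at the chosen offset
        .take(num_peers)  // stop after one full round
        .cloned()
        .collect()
}
```

With peers `[10, 20, 30, 40]` and an offset of 2, the visiting order is 30, 40, 10, 20. The same chain works when `offset` is larger than the number of peers, because `cycle` wraps around before `skip` runs out of items.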
}
}
}
This is for verbosity; I think it helps to know which errors can occur.
match err {
    // This can be a malformed request or bad response but we can't determine which at the moment!
    // Thus, register the response as bad and wait for more responses in order to take a decision
    ValidityError::BadProof | ValidityError::Empty => {
We can get a `BadProof` for an execution on an account with an insufficient balance, so it is better to be sure before we punish peers!
I'm not sure I understand this. Why would it happen now that we have #9591 (unless someone is running a node older than 2.0.7-stable/2.1.2-beta)? And how can we distinguish this from an invalid proof?
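To make the question concrete, here is a rough sketch of the classification the match arm seems to imply. The variants `BadProof` and `Empty` appear in the diff; the `Other` variant, the `Verdict` enum, and the `classify` function are hypothetical names invented for this illustration, and the assumption that all remaining errors lead to punishing the peer is mine, not something the PR states.

```rust
/// Simplified stand-in for the PR's error type. Only `BadProof` and `Empty`
/// are taken from the diff; `Other` is a hypothetical catch-all.
#[derive(Debug, Clone, Copy, PartialEq)]
enum ValidityError {
    BadProof,
    Empty,
    Other,
}

/// Hypothetical outcome of inspecting a response error.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// Ambiguous: could be a malformed request or a bad response,
    /// so only record it and wait for more responses.
    RegisterBadResponse,
    /// Assumed behaviour for unambiguous errors.
    PunishPeer,
}

fn classify(err: ValidityError) -> Verdict {
    match err {
        ValidityError::BadProof | ValidityError::Empty => Verdict::RegisterBadResponse,
        ValidityError::Other => Verdict::PunishPeer,
    }
}
```

Under this reading, a `BadProof` alone never punishes a peer; it only counts towards a later decision about the request itself.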
ethcore/light/src/on_demand/tests.rs
Outdated
@@ -18,6 +18,7 @@

 use cache::Cache;
 use ethcore::header::Header;
+use ethcore::encoded::Header as EncodedHeader;
Not used, please remove!
ethcore/light/src/on_demand/tests.rs
Outdated
@@ -28,7 +29,7 @@ use ::request::{self as basic_request, Response};

 use std::sync::Arc;

-use super::{request, OnDemand, Peer, HeaderRef};
+use super::{request, OnDemand, Peer, HeaderRef, ResponseError, ValidityError};
Not used, please remove!
Please label your PR :)
61b9c76 to a3ddcb0
I have mixed feelings about this one:
- I thought #9133 ("Light client does not respond to eth_getTransactionReceipt request") is about not replying to a valid request? For invalid ones we may now return "Timeout for On-demand query". Or is this about a proper error message?
- I'm not sure about the formula `|bad_responses| > peers / 2`, since the number of peers is dynamic, and if we disconnect from peers this will change whether a request is considered bad or not. I don't have a better suggestion, though.
- This policy could potentially be implemented via failsafe (#9536, "Refactor on-demand queries to use failsafe-rs").
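To illustrate the concern about the formula, here is a small sketch of the majority rule. The field name `bad_responses` and the `total_peers` parameter follow the diff; the `Pending` struct shape, the `usize` peer ids, and the method names `register_bad` and `is_faulty` are assumptions for this example only.

```rust
use std::collections::HashSet;

/// Hypothetical, stripped-down pending request that only tracks
/// which peers answered with a bad response.
#[derive(Default)]
struct Pending {
    bad_responses: HashSet<usize>,
}

impl Pending {
    /// Record that `peer` returned a bad (or empty) response.
    fn register_bad(&mut self, peer: usize) {
        self.bad_responses.insert(peer);
    }

    /// The rule under discussion: the request is treated as faulty once
    /// more than half of the currently connected peers gave a bad response.
    fn is_faulty(&self, total_peers: usize) -> bool {
        self.bad_responses.len() > total_peers / 2
    }
}
```

With four bad responses, `is_faulty(8)` is false (4 is not greater than 4), but `is_faulty(7)` is true (4 is greater than 3). So a single peer disconnecting can flip the verdict without any new response arriving, which is the dynamic-peer-count issue raised above.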
ethcore/light/src/on_demand/mod.rs
Outdated
{
-> Result<(), basic_request::ResponseError<self::request::Error>> {

println!("{:?}", response);
Use `trace!` or `debug!` here.
ethcore/light/src/on_demand/tests.rs
Outdated
let mut harness = Harness::create();

// peer[0] - requester ("light")
// peer[1..9] - responders ("providers"
nit: unmatched (
fn time_out(self) {
    trace!(target: "on_demand", "Dropping a pending query (no new peer time out) at query #{}", self.query_id_history.len());
    let err = self::error::ErrorKind::TimeoutOnNewPeers(self.requests.num_answered(), self.query_id_history.len());
    if self.sender.send(Err(err.into())).is_err() {
        debug!(target: "on_demand", "Dropped oneshot channel receiver on time out");
    }
}

// The given request is determined as faulty and drop it accordingly
fn set_as_faulty_request(self, total_peers: usize, req_id: ReqId) {
Consider renaming this to `faulty_request` or `bad_request` to follow the same naming convention as `time_out` and `no_response` (or rename those).
ethcore/light/src/on_demand/mod.rs
Outdated
// The given request is determined as faulty and drop it accordingly
fn set_as_faulty_request(self, total_peers: usize, req_id: ReqId) {
    let bad_peers = self.bad_responses.len();
    warn!(target: "on_demand", "The request: {} was determined as faulty, {}/{} peer(s) gave bad response", req_id, bad_peers, total_peers);
Do we need a `warn!` here? A user can see a similar message in the response.
This PR adds the following:
* Registers empty responses as `bad`
* Registers bad responses and only takes a decision when the majority of the peers have come to `agreement`.
a3ddcb0 to 56779ff
Great, comments
Closing for now, I will migrate to
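This PR adds the following:
- Registers empty responses as `bad`.
- Registers bad responses and only takes a decision when the majority of the peers have come to agreement.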
This changes the assumptions slightly, i.e., before we assumed that all requests are valid and only dropped them on a timeout or after too many request attempts to each peer.
This might overlap a bit with the timeouts, and I plan to clean up that code a bit if/after this gets merged! This together with the timeouts should be sufficient.
Issues:
- Is `bad_responses > peers/2` sufficient?

/cc @Tbaut @amaurymartiny
It would be good if you could test it; it seemed to work well when I tested it with
https://gist.github.com/niklasad1/43690f39a07e59f92fd4a8ea9709385b