Don't trust reported last seen times #2178
Conversation
Force-pushed fab6329 to c33660e.
Force-pushed 19e2605 to 73b0df4.
A few significant changes:
- If any times are in the future, let's adjust all times, so we handle honest peers with clock skew correctly
- Let's test peers that are exactly at the limit
- Let's test using random limits, not just the current time when the test is running
I also have some general feedback about formatting and refactoring.
Just a few comment tweaks, nothing major
Force-pushed f37fb4e to c1e3fe5.
Force-pushed c1e3fe5 to 50ed84b.
Some tests are failing because I added leap seconds to the `Arbitrary` methods in a recent PR: #2195
I've made some suggestions to fix that. (Zebra shouldn't crash if it's running during a leap second. But it's ok for the tests to maybe fail, because leap seconds are rare.)
This looks good; the core code is really solid.
We need to do a few other fixes due to existing MetaAddr bugs. And I'd like to simplify the error handling in this PR.
My Leap Seconds PR
This PR failed due to #2195, which adds arbitrary leap seconds. (Sorry about that, I didn't expect it to make any difference.) So I made a fix in jvff#1
Existing MetaAddr Bugs
I also added a bunch of other properties we should test in jvff#1, but we need to merge #2203 to fix some MetaAddr serialization bugs first. Feel free to rebase and re-enable those tests.
Simplifying Error Handling in this PR
I'd like to re-design how we do error handling. We're handling errors in a few different places, so it would be easy to miss some, particularly as validation gets more complicated. I'd like to use idiomatic Rust error-handling, where each function returns a `Result`.
So the functions in this PR would look like:
```rust
struct MetaAddrValidationError;

fn MetaAddr::offset_last_seen_by(&self, offset: Duration) -> Result<MetaAddr, MetaAddrValidationError>

fn limit_last_seen_times(addrs: &Vec<MetaAddr>, last_seen_limit: DateTime<Utc>) -> Result<Vec<MetaAddr>, MetaAddrValidationError>

fn validate_addrs(addrs: impl IntoIterator<Item = MetaAddr>, last_seen_limit: DateTime<Utc>) -> Result<Vec<MetaAddr>, MetaAddrValidationError>
```
That way, we can propagate errors up to `update_fanout` using the try operator `?`, and handle them in one place. (By not sending the addresses to the address book.)
I think we'll end up with simpler code that way. We can use checked functions for all our calculations, and use `?` to return any errors.
(We might need to impl `From<None>` or `From<NoneError>` for `MetaAddrValidationError`, let me know how you go.)
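A minimal sketch of the `Result`-based pattern proposed above (not the PR's actual code): the names mirror the proposal, but plain `u32` second timestamps stand in for the real date types, and the whole batch is rejected as soon as any checked subtraction fails.

```rust
/// Hypothetical error type from the proposal above.
#[derive(Debug, PartialEq)]
struct MetaAddrValidationError;

/// Simplified stand-in for a gossiped address entry.
#[derive(Debug, Clone, Copy, PartialEq)]
struct MetaAddr {
    last_seen: u32,
}

impl MetaAddr {
    /// Subtract `offset` seconds from `last_seen`, failing on underflow.
    fn offset_last_seen_by(&self, offset: u32) -> Result<MetaAddr, MetaAddrValidationError> {
        let last_seen = self
            .last_seen
            .checked_sub(offset)
            .ok_or(MetaAddrValidationError)?;
        Ok(MetaAddr { last_seen })
    }
}

/// Clamp all reported times to `last_seen_limit`, rejecting the
/// whole batch if any subtraction underflows.
fn limit_last_seen_times(
    addrs: &[MetaAddr],
    last_seen_limit: u32,
) -> Result<Vec<MetaAddr>, MetaAddrValidationError> {
    let newest = addrs.iter().map(|a| a.last_seen).max().unwrap_or(u32::MIN);
    let offset = newest.saturating_sub(last_seen_limit);
    // Collecting an iterator of `Result`s into `Result<Vec<_>, _>`
    // propagates the first error, so failures surface in one place.
    addrs.iter().map(|a| a.offset_last_seen_by(offset)).collect()
}
```

A caller higher up (like the proposed `update_fanout` change) could then use `?` on the returned `Result` and drop the whole batch on error.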
Let me know what you think - we can do some pair programming on these changes if you'd like.
Returning `impl IntoIterator` means that the caller will always be forced to call `.into_iter()`, and returning `impl Iterator` still allows them to call `.into_iter()` because it becomes the identity function.
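A tiny illustration of the point above, with a hypothetical `validated` function (not from this PR): callers of an `impl Iterator` return type can iterate directly, and `.into_iter()` still compiles because every `Iterator` implements `IntoIterator` as the identity function.

```rust
// Hypothetical example: return `impl Iterator` rather than
// `impl IntoIterator`, so callers aren't forced through `.into_iter()`.
fn validated(times: Vec<u32>) -> impl Iterator<Item = u32> {
    times.into_iter().filter(|&t| t <= 100)
}
```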
Will be used when limiting the reported last seen times for received gossiped addresses.
Due to clock skew, the peers could end up at the front of the reconnection queue or far at the back. The solution to this is to offset the reported times by the difference between the most recent reported sight (in the remote clock) and the current time (in the local clock).
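The offset described above could be computed roughly as follows (a hedged sketch with a hypothetical helper name, using `u32` second timestamps in place of the real date types):

```rust
/// Offset, in seconds, between the most recent reported `last_seen`
/// time (in the remote peer's clock) and the current time (in the
/// local clock). Zero when no reported time is in the future.
fn clock_skew_offset(reported_times: &[u32], now: u32) -> u32 {
    let newest = reported_times.iter().copied().max().unwrap_or(u32::MIN);
    // `saturating_sub` returns 0 when `newest <= now`,
    // so times in the past are left alone.
    newest.saturating_sub(now)
}
```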
- Make the security impact clearer and in a separate section.
- Instead of listing an assumption as almost a side-note, describe it clearly inside a `Panics` section.
Co-authored-by: teor <teor@riseup.net>
Times in the past don't have any security implications, so there's no point in trying to apply the offset to them as well.
If any of the times gossiped by a peer are in the future, apply the necessary offset to all the times gossiped by that peer. This ensures that all gossiped peers from a malicious peer are moved further back in the queue. Co-authored-by: teor <teor@riseup.net>
Make it clear why all peers have the time offset applied to them. Co-authored-by: teor <teor@riseup.net>
If an overflow occurs, the reported `last_seen` times are either very wrong or malicious, so reject all addresses gossiped by that peer.
Focus on what can go wrong, and not on the specific causes. Co-authored-by: teor <teor@riseup.net>
The `limit_last_seen_times` function can now safely handle an empty list.
Use some mock gossiped peers that all have `last_seen` times in the future and check that they all have a specific offset applied to them.
Use some mock gossiped peers that all have `last_seen` times in the past and check that they don't have any changes to the `last_seen` times applied by the `validate_addrs` function.
Use some mock gossiped peers where some have `last_seen` times in the past and some have times in the future. Check that all the peers have an offset applied to them by the `validate_addrs` function. This tests if the offset is applied to all peers that a malicious peer gossiped to us.
If the most recent `last_seen` time reported by a peer is exactly the limit, the offset doesn't need to be applied because no times are in the future.
If the calculation to apply the compensation offset overflows or underflows, the reported times are too far apart, and could be sent on purpose by a malicious peer, so all addresses from that peer should be rejected.
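The reject-on-underflow behaviour described above can be sketched with `checked_sub` (a hypothetical helper, not the PR's code; `u32` seconds stand in for the real date types):

```rust
/// Apply `offset` seconds to every reported time.
/// If any subtraction underflows, the reported times are too far
/// apart, so reject the entire batch by returning `None`.
fn apply_offset_or_reject(times: &[u32], offset: u32) -> Option<Vec<u32>> {
    // Collecting an iterator of `Option`s into `Option<Vec<_>>`
    // yields `None` as soon as any element is `None`.
    times.iter().map(|t| t.checked_sub(offset)).collect()
}
```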
Given a generated list of gossiped peers, ensure that after running the `validate_addrs` function none of the resulting peers have a `last_seen` time that's after the specified limit.
Force-pushed b19043c to 15a8ff0.
This looks great.
I'll just commit some typo fixes, and then squash the PR.
```rust
/// Set the last time we interacted with this peer.
pub(crate) fn set_last_seen(&mut self, last_seen: DateTime32) {
    self.last_seen = last_seen;
}
```
I love simple code
```rust
    // No overflow is possible, so apply offset to all addresses
    for addr in addrs {
        let old_last_seen = addr.get_last_seen().timestamp();
        let new_last_seen = old_last_seen - offset;

        addr.set_last_seen(new_last_seen.into());
    }
} else {
    // An overflow will occur, so reject all gossiped peers
```
Just a minor typo fix:
```diff
-// No overflow is possible, so apply offset to all addresses
+// No underflow is possible, so apply offset to all addresses
 for addr in addrs {
     let old_last_seen = addr.get_last_seen().timestamp();
     let new_last_seen = old_last_seen - offset;
     addr.set_last_seen(new_last_seen.into());
 }
 } else {
-// An overflow will occur, so reject all gossiped peers
+// An underflow will occur, so reject all gossiped peers
```
```rust
/// Ensure all reported `last_seen` times are less than or equal to `last_seen_limit`.
///
/// This will consider all addresses as invalid if trying to offset their
/// `last_seen` times to be before the limit causes an overflow.
```
```diff
-/// `last_seen` times to be before the limit causes an overflow.
+/// `last_seen` times to be before the limit causes an underflow.
```
```rust
// If any time is in the future, adjust all times, to compensate for clock skew on honest peers
if newest_reported_seen_timestamp > last_seen_limit.timestamp() {
    let offset = newest_reported_seen_timestamp - last_seen_limit.timestamp();

    // Apply offset to oldest timestamp to check for underflow
    let oldest_resulting_timestamp = oldest_reported_seen_timestamp as i64 - offset as i64;
    if oldest_resulting_timestamp >= 0 {
        // No overflow is possible, so apply offset to all addresses
        for addr in addrs {
            let old_last_seen = addr.get_last_seen().timestamp();
            let new_last_seen = old_last_seen - offset;

            addr.set_last_seen(new_last_seen.into());
        }
    } else {
```
I think this code is about as simple as it gets.
If we used `chrono::DateTime`, we would need to check for other errors. (Or justify that they can't possibly happen.)
Good call here!
```rust
fn no_last_seen_times_are_in_the_future(
    gossiped_peers in vec(MetaAddr::gossiped_strategy(), 1..10),
    last_seen_limit in any::<DateTime32>(),
) {
    zebra_test::init();

    let validated_peers = validate_addrs(gossiped_peers, last_seen_limit);

    for peer in validated_peers {
        prop_assert![peer.get_last_seen() <= last_seen_limit];
    }
```
This is about as good as a proptest ever gets.
I'm glad we made the `DateTime32` change.
```rust
let (oldest_reported_seen_timestamp, newest_reported_seen_timestamp) =
    addrs
        .iter()
        .fold((u32::MAX, u32::MIN), |(oldest, newest), addr| {
            let last_seen = addr.get_last_seen().timestamp();
            (oldest.min(last_seen), newest.max(last_seen))
        });
```
Here's an example of implementing min and max using iterators:
```rust
let timestamps: Vec<_> = addrs
    .iter()
    .map(|addr| addr.get_last_seen().timestamp())
    .collect();
let oldest_reported_seen_timestamp = timestamps.iter().min().cloned().unwrap_or(u32::MAX);
let newest_reported_seen_timestamp = timestamps.iter().max().cloned().unwrap_or(u32::MIN);
```
In this case the code is about the same length, so let's stick with your version.
By the way, the `fold`/`unwrap_or` constants are never actually used. They participate in the calculations, but they never get applied to any data, because the array is empty. But I think they're better than using the `Option`s in later code.
Ha, caught by the auto-merge. I'll just commit those comment changes to
Motivation

Zebra would previously trust the `last_seen` time reported on a gossiped peer. However, if that time is in the future, it could take precedence over all other peers in the reconnection queue. This could happen intentionally, via a malicious peer, or unintentionally, due to clock skew. The reporting issue (#1871) has more information.

Solution

The proposed solution is to apply an offset to the `last_seen` times reported by a peer if any of them are in the future. The offset is calculated by subtracting the current time (in the local clock) from the most recent reported time (in the remote peer's clock). The code in this pull request has:
Review
@teor2345 wrote the proposed solution in the original issue.
Related Issues
Closes #1871
Follow Up Work