-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Emit warnings when locks are held for too long #141
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neat. I didn't do a deep dive on the macro, but generally LGTM 👍🏻
trin-core/src/lib.rs
Outdated
} | ||
|
||
#[macro_export] | ||
macro_rules! rw_read { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to leave this lock macro in place everywhere (which I'm cool with), then we should probably only run this tooling overhead on non-release builds. Something like branching on debug_assertions
(I found that trick in this post).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I experimented with using debug_assertions
: 0cb2445
I think I'd prefer to drop this commit though. It makes the code a fair amount less readable, doesn't seem like it drops very much overhead, and prevents us from being able to diagnose deadlocks if someone running a release build tries to send us their logs.
What do you think?
trin-core/src/lib.rs
Outdated
} | ||
|
||
#[macro_export] | ||
macro_rules! rw_read { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How would you feel about implementing this as an extension trait, using std::panic::Location::caller
and #[track_caller]
?
Would be more natural than a macro, with much less code to compile.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh perfect, thank you. Yes, I used a macro exactly because I was ignorant of the existence of track_caller
and friends. I'll give using them a try!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I gave it a try but ended up failing. I don't know enough Rust to feel productive at figuring out what's going on so I think I'll leave this as a macro for now.
When I apply #[track_caller]
to the method everything builds cleanly but the caller is not tracked. I'm using cargo 1.55.0, and support for #[track_caller]
in extensions is from 1.46.0, so this should work. My best guess is that the trait is somehow interacting badly with the #[async_trait]
macro I needed to use, I expect that when this trait rewrites the method signature it doesn't faithfully copy over attributes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Deleted the previous comment, I was misreading the output, still haven't been able to get #[track_caller]
to work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem seems to be that #[track_caller]
does not work for async functions. rust-lang/rust#78840
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would something like this work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed, it does! Pushed some commits which incorporate it, thanks for the help
trin-core/src/lib.rs
Outdated
let now = std::time::Instant::now(); | ||
|
||
loop { | ||
tokio::select! { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there's already a dependency on futures
, might be more understandable to write this using futures::future::select
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll give it a try
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there's already a dependency on
futures
, might be more understandable to write this usingfutures::future::select
?
What are the pros of using future::select
here if we have already tokio as dependency? Looking into this PR, it seems to me that tokio::select
may be the better alternative?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the function based approach easier to read, though I don't really have that strong of an opinion on the subject.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wasn't able to find much of a difference between the two select!
s but refactored this code a bit by unrolling the loop which should make this code a lot more understandable for first time readers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm! nothing to add from my end
8ecb3b9
to
e57ada4
Compare
- Add read_with_warn() and write_with_warn() methods to RwLock - They emit warnings if lock acqusition takes over 100ms - They emit warnings if locks are held for over 100ms - Use read_with_warn() and write_with_warn() in most places RwLock's are accessed Using these methods I was able to immediately identify the cause of a deadlock during boot.
e57ada4
to
00f7588
Compare
Quick note: I'm not sure who to give this to so I gave it to you @njgheorghita, but feel free to pass it off to somebody else if you think that's appropriate!
This PR is not yet finished, there's some cleanup to do (such as removing code duplication, and logging when
rw_write!()
calls hold onto the lock for too long, but opening a PR now to get feedback on the general approach.Add rw_read and rw_write macros which wrap access to locks
Use rw_read and rw_write macros in most places RwLock's are accessed
Using these macros I was able to immediately identify the cause of a
deadlock during boot.
Example output:
This makes it trivial to figure out where the deadlock is!