Observability To Colocated Auction Runloop #1930
I created methods on the Metrics type in order to increment/observe things. I'm not 100% happy with this, but wanted to see what others think in terms of legibility.
Why are you not happy with this? Seems fine and readable to me...
"reveal auction id mismatch"
);
.map_err(RevealError::Failure)?;
if !response
Interesting. Are we putting the auction ID data also on-chain?
Yes, we started doing this.
// Take extra care to not accidentally keep the borrow alive within
// the `while` body, which would block senders.
Oh wow, I just learned about this behavior. This seems like quite a footgun in this abstraction (ideally we could just get the most recent block without potentially blocking writers).
So, my understanding is that this is related to where the compiler places the `Drop`. For example, see this issue.
That being said, I don't think it was a problem in our usage here. I just adapted the existing unbounded `loop` to be a `while` loop and reworded the comment that was already there (I assumed it was done this way for a reason, but I didn't verify that):
services/crates/autopilot/src/run_loop.rs
Lines 435 to 441 in 4dbeeb8
loop {
    // This could be a while loop. It isn't, because some care must be taken to not
    // accidentally keep the borrow alive, which would block senders. Technically
    // this is fine with while conditions but this is clearer.
    if self.current_block.borrow().number > deadline {
        break;
    }
I think we should review whether this is actually needed (I also think it's not in this case) just to avoid further confusion in the future.
I'm fine with leaving this as is since this PR is urgent.
I did this quick test, which seems to suggest that the intermediate value from the `while` condition expression gets dropped before the loop body runs.
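A test along these lines could look like the following minimal sketch. It is not the test referenced above: it uses `std::cell::RefCell` as a stand-in for the block stream (an assumption, chosen so the example needs no external crates), and `advance_until` is a hypothetical helper name. If the guard created in the `while` condition were still alive inside the body, `try_borrow_mut` would fail.

```rust
use std::cell::RefCell;

/// Bumps the shared value until it exceeds `deadline` and returns the number
/// of loop iterations. `current` is a hypothetical stand-in for the current
/// block number; the real code borrows from a channel receiver instead.
fn advance_until(current: &RefCell<u64>, deadline: u64) -> u32 {
    let mut iterations = 0;
    // The `Ref` guard returned by `borrow()` is a temporary of the condition
    // expression, so it is dropped before the loop body starts.
    while *current.borrow() <= deadline {
        // This mutable borrow only succeeds because the guard from the
        // condition has already been released.
        let mut block = current
            .try_borrow_mut()
            .expect("condition borrow should already be released");
        *block += 1;
        iterations += 1;
    }
    iterations
}

fn main() {
    let current = RefCell::new(5);
    let iterations = advance_until(&current, 7);
    println!("iterations: {iterations}, block: {}", current.borrow());
}
```

This only covers a plain `while` condition. A guard bound to a local variable inside the body would stay alive until the end of that iteration.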
I feel like the |
In the new architecture we would define a separate observe module and do all logging/metrics there? I kind of like that as well, but it is probably a large refactor for the autopilot...
My comment is unrelated to this, and more in the context of this individual module:
And not about the metrics architecture in the |
Nice observability improvement! It just seems to me that we would currently not get any meaningful timing information out of the driver runs due to `join_all()`.
Approving assuming this gets addressed.
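One possible reading of this concern (an assumption, since the relevant diff is not shown here) is that a timer placed around the whole `join_all()` only reports the duration of the slowest driver. A minimal sketch of the alternative, where each future measures its own duration, could look like this. To stay free of external crates it uses a hand-rolled busy-polling `join_all_blocking` and a `Sleep` future as a fake driver call. Both, along with `timed` and `run`, are hypothetical names for illustration and not part of the PR or of any library.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::time::{Duration, Instant};

type BoxFut<T> = Pin<Box<dyn Future<Output = T>>>;

/// Hypothetical stand-in for a driver call: ready once `deadline` has passed.
struct Sleep {
    deadline: Instant,
}

impl Future for Sleep {
    type Output = ();
    fn poll(self: Pin<&mut Self>, _cx: &mut Context<'_>) -> Poll<()> {
        if Instant::now() >= self.deadline {
            Poll::Ready(())
        } else {
            Poll::Pending
        }
    }
}

fn sleep(ms: u64) -> Sleep {
    Sleep { deadline: Instant::now() + Duration::from_millis(ms) }
}

/// Wraps a future so that it records its own duration, measured from its
/// first poll to its completion.
async fn timed<F: Future>(name: &'static str, fut: F) -> (&'static str, Duration) {
    let start = Instant::now();
    fut.await;
    (name, start.elapsed())
}

/// Busy-polling replacement for `join_all` (std only, for illustration).
fn join_all_blocking<T>(mut futs: Vec<BoxFut<T>>) -> Vec<T> {
    struct Noop;
    impl Wake for Noop {
        fn wake(self: Arc<Self>) {}
    }
    let waker = Waker::from(Arc::new(Noop));
    let mut cx = Context::from_waker(&waker);
    let mut results: Vec<Option<T>> = futs.iter().map(|_| None).collect();
    while results.iter().any(Option::is_none) {
        for (fut, slot) in futs.iter_mut().zip(results.iter_mut()) {
            // Only poll futures that have not completed yet.
            if slot.is_none() {
                if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
                    *slot = Some(value);
                }
            }
        }
        std::thread::yield_now();
    }
    results.into_iter().map(Option::unwrap).collect()
}

/// Returns the duration of the whole join plus one duration per driver.
fn run() -> (Duration, Vec<(&'static str, Duration)>) {
    let start = Instant::now();
    let futs: Vec<BoxFut<(&'static str, Duration)>> = vec![
        Box::pin(timed("fast", sleep(10))),
        Box::pin(timed("slow", sleep(50))),
    ];
    let per_driver = join_all_blocking(futs);
    (start.elapsed(), per_driver)
}

fn main() {
    let (total, per_driver) = run();
    println!("whole join: {total:?}");
    for (name, elapsed) in per_driver {
        println!("{name}: {elapsed:?}");
    }
}
```

The timer around the whole join tracks the slow future, while the per-future wrapper still reports a shorter duration for the fast one.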
IMO we could also move the |
Co-authored-by: Felix Leupold <felixleupold90@gmail.com>
7945897 to d84b7dc
Description
This PR refactors the autopilot runloop to add some additional observability to the solver competition. This will allow us to have a better idea of how individual solvers are behaving.
As a note on the code and how to record metrics - I created methods on the `Metrics` type in order to increment/observe things. I'm not 100% happy with this, but wanted to see what others think in terms of legibility.
Changes
`solve` timings.
How to test
The code should be covered by existing E2E tests in the services.