Observability To Colocated Auction Runloop #1930

nlordell · 2023-10-08T19:29:57Z

Description

This PR refactors the autopilot runloop to add observability to the solver competition, giving us a better idea of how individual solvers are behaving.

A note on the code and how metrics are recorded: I created methods on the Metrics type in order to increment/observe things. I'm not 100% happy with this, but wanted to see what others think in terms of legibility.

Changes

Added a gauge for the last auction ID seen in the runloop.
Added specific error types for all driver interactions.
Added metrics counters for the results of driver interactions (solving, revealing and settling), and specifically a histogram for solve timings.
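
As a rough illustration of how a specific error type can feed a per-result counter, here is a minimal sketch. Only `RevealError::Failure` appears in the diff; the `AuctionMismatch` variant, the `label` method and the `ResultCounter` type are hypothetical, and a `HashMap` stands in for a real labelled metrics counter.

```rust
use std::collections::HashMap;

/// Hypothetical error type for the reveal step. Only `Failure` is taken from
/// the diff; `AuctionMismatch` is an assumed variant for illustration.
#[derive(Debug)]
enum RevealError {
    AuctionMismatch,
    Failure(String),
}

impl RevealError {
    /// Maps each variant to a stable label for the result counter.
    fn label(&self) -> &'static str {
        match self {
            RevealError::AuctionMismatch => "mismatch",
            RevealError::Failure(_) => "error",
        }
    }
}

/// Hypothetical stand-in for a counter labelled by (driver, result).
#[derive(Default)]
struct ResultCounter(HashMap<(String, &'static str), u64>);

impl ResultCounter {
    /// Increments the counter matching the outcome of one driver interaction.
    fn observe(&mut self, driver: &str, result: &Result<(), RevealError>) {
        let label = match result {
            Ok(()) => "ok",
            Err(err) => err.label(),
        };
        *self.0.entry((driver.to_owned(), label)).or_default() += 1;
    }

    /// Returns the current count for a (driver, result) pair.
    fn get(&self, driver: &str, label: &'static str) -> u64 {
        self.0.get(&(driver.to_owned(), label)).copied().unwrap_or(0)
    }
}

fn main() {
    let mut reveals = ResultCounter::default();
    reveals.observe("driver-a", &Ok(()));
    reveals.observe("driver-a", &Err(RevealError::AuctionMismatch));
    reveals.observe("driver-a", &Err(RevealError::Failure("timeout".to_owned())));
    println!("ok = {}", reveals.get("driver-a", "ok"));
    println!("mismatch = {}", reveals.get("driver-a", "mismatch"));
    println!("error = {}", reveals.get("driver-a", "error"));
}
```

Keeping the label mapping on the error type means the call site only has to pass the `Result` along, rather than matching on it at every place a metric is recorded.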

How to test

The code should be covered by existing E2E tests in the services.

fleupold

I created methods on the Metrics type in order to increment/observe things. I'm not 100% happy with this, but wanted to see what others think in terms of legibility.

Why are you not happy with this? Seems fine and readable to me...

fleupold · 2023-10-09T07:54:31Z

crates/autopilot/src/run_loop.rs

-            "reveal auction id missmatch"
-        );
+            .map_err(RevealError::Failure)?;
+        if !response


Interesting. Are we putting the auction ID data also on-chain?

Yes, we started doing this.

fleupold · 2023-10-09T07:56:13Z

crates/autopilot/src/run_loop.rs

+            // Take extra care to not accidentally keep the borrow alive within
+            // the `while` body, which would block senders.


Oh wow, I just learned about this behavior. This seems like quite a footgun in this abstraction (ideally we could just get the most recent block without potentially blocking writers).

So, my understanding is that this is related to where the compiler places the Drop. For example, see this issue.

That being said, I don't think it was a problem in our usage here. I just adapted the existing unbounded loop into a while loop and reworded the comment that was already there (I assumed it was done this way for a reason, but did not verify it):

services/crates/autopilot/src/run_loop.rs

Lines 435 to 441 in 4dbeeb8

loop {
    // This could be a while loop. It isn't, because some care must be taken to not
    // accidentally keep the borrow alive, which would block senders. Technically
    // this is fine with while conditions but this is clearer.
    if self.current_block.borrow().number > deadline {
        break;
    }

I think we should review whether this is actually needed (I also think it's not in this case) just to avoid further confusion in the future.
I'm fine with leaving this as is since this PR is urgent.

I ran a quick test that seems to suggest that the intermediate value from the while condition expression gets dropped before the loop body runs.

https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=271343a442033c63dfbc869ad62e064a
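
For readers who cannot open the playground link, the same behavior can be shown with a minimal standalone sketch. It is not the linked gist: it assumes a `RefCell` as a stand-in for the watch channel, and `Block` and `advance_past` are hypothetical names. If the guard returned by `borrow()` in the `while` condition were still alive inside the body, the `borrow_mut()` call would panic.

```rust
use std::cell::RefCell;

/// Hypothetical stand-in for the block value held by the channel.
struct Block {
    number: u64,
}

/// Advances `current` until its number exceeds `deadline` and returns how
/// many times the loop body ran.
fn advance_past(current: &RefCell<Block>, deadline: u64) -> u32 {
    let mut iterations = 0;
    // The `Ref` guard returned by `borrow()` is a temporary of the condition
    // expression, so it is dropped before the body starts executing.
    while current.borrow().number <= deadline {
        // Would panic with a `BorrowMutError` if the shared borrow from the
        // condition were still alive at this point.
        current.borrow_mut().number += 1;
        iterations += 1;
    }
    iterations
}

fn main() {
    let current = RefCell::new(Block { number: 0 });
    let iterations = advance_past(&current, 2);
    println!("iterations = {iterations}, block = {}", current.borrow().number);
}
```

Note that this only holds for a plain `while` condition. With `while let` or `match`, temporaries in the scrutinee live until the end of the whole expression, which is likely the situation the original comment was guarding against.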

nlordell · 2023-10-09T08:56:26Z

Why are you not happy with this? Seems fine and readable to me...

I feel like the Metrics::get().observe(...) is peppered in the code and slightly inconsistent, and adds cognitive load when reading (instead of having a single place where we record all metrics for example).

fleupold · 2023-10-09T09:26:36Z

I feel like the Metrics::get().observe(...) is peppered in the code and slightly inconsistent

In the new architecture we would define a separate observe module and do all logging/metrics there? I kind of like that as well, but it is probably a large refactor for the autopilot...

nlordell · 2023-10-09T09:44:31Z

In the new architecture we would define a separate observe module and do all logging/metrics there?

My comment is unrelated to this, and more in the context of this individual module:

We update metrics in single_auction and in competition methods rather arbitrarily, instead of in a single function of the RunLoop type. The metrics add additional control flow at those points, and it would be nice for this to be more consolidated.

It is not about the metrics architecture in the driver. I don't think either @MartinquaXD or I are particularly fond of logging from a single module (there are some downsides), so I would advise against it (and would generally change how we do things in the driver to be more consistent with the rest of the project; I personally see it as an unsuccessful experiment).

MartinquaXD

Nice observability improvement! It just seems to me that we would currently not get any meaningful timing information out of the driver runs due to join_all().
Approving assuming this gets addressed.
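
One reading of this concern, stated as an assumption: if the timer wraps the whole `join_all()` call, every driver is recorded with the duration of the slowest one. The sketch below illustrates the difference between per-task and whole-batch timing. It deliberately swaps the async `join_all()` for scoped threads from the standard library so that it runs without extra crates; `timed` and `solve_all` are hypothetical helpers, and `thread::sleep` stands in for a driver request.

```rust
use std::thread;
use std::time::{Duration, Instant};

/// Runs `work` and returns how long this single call took.
fn timed(work: impl FnOnce()) -> Duration {
    let start = Instant::now();
    work();
    start.elapsed()
}

/// Runs one simulated driver per delay concurrently. Returns the duration
/// measured inside each task and the duration of the whole batch.
fn solve_all(delays: &[Duration]) -> (Vec<Duration>, Duration) {
    let batch_start = Instant::now();
    let per_driver = thread::scope(|scope| {
        // Spawn all drivers first so they run concurrently.
        let handles: Vec<_> = delays
            .iter()
            .map(|&delay| scope.spawn(move || timed(|| thread::sleep(delay))))
            .collect();
        // Then wait for all of them, like `join_all()` would.
        handles
            .into_iter()
            .map(|handle| handle.join().expect("driver thread panicked"))
            .collect::<Vec<_>>()
    });
    (per_driver, batch_start.elapsed())
}

fn main() {
    let delays = [Duration::from_millis(5), Duration::from_millis(200)];
    let (per_driver, batch) = solve_all(&delays);
    // The batch duration tracks the slowest driver, while the per-driver
    // durations stay distinct.
    println!("per driver: {per_driver:?}, batch: {batch:?}");
}
```

In the async version the same idea applies: start and stop the timer inside each future that is passed to `join_all()`, not around the combined future.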

MartinquaXD · 2023-10-09T09:51:27Z

crates/autopilot/src/run_loop.rs

+            // Take extra care to not accidentally keep the borrow alive within
+            // the `while` body, which would block senders.


I think we should review whether this is actually needed (I also think it's not in this case) just to avoid further confusion in the future.
I'm fine with leaving this as is since this PR is urgent.

MartinquaXD · 2023-10-09T09:58:45Z

I feel like the Metrics::get().observe(...) is peppered in the code and slightly inconsistent

IMO we could also move the Metrics::get() part into each .observe() function implementation. Then it's just Metrics::observe(), which seems slightly nicer. No strong opinion, though.
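
A minimal sketch of this suggestion, assuming a lazily initialized global Metrics instance. Plain atomics stand in for the real metric types, and the field and method names (`auction`, `solve`) are hypothetical, not the ones used in the PR.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::OnceLock;

/// Hypothetical metrics container; atomics stand in for gauges and counters.
#[derive(Default)]
struct Metrics {
    auction: AtomicU64,
    solve_ok: AtomicU64,
    solve_err: AtomicU64,
}

impl Metrics {
    /// Returns the global instance, creating it on first use.
    fn get() -> &'static Metrics {
        static INSTANCE: OnceLock<Metrics> = OnceLock::new();
        INSTANCE.get_or_init(Metrics::default)
    }

    /// Records the last auction ID seen. Callers write `Metrics::auction(id)`
    /// instead of `Metrics::get().auction(id)`.
    fn auction(auction_id: u64) {
        Self::get().auction.store(auction_id, Ordering::Relaxed);
    }

    /// Counts the outcome of one solve request.
    fn solve(result: &Result<(), String>) {
        let metrics = Self::get();
        let counter = match result {
            Ok(()) => &metrics.solve_ok,
            Err(_) => &metrics.solve_err,
        };
        counter.fetch_add(1, Ordering::Relaxed);
    }
}

fn main() {
    Metrics::auction(7);
    Metrics::solve(&Ok(()));
    Metrics::solve(&Err("timeout".to_owned()));
    let metrics = Metrics::get();
    println!(
        "auction = {}, ok = {}, err = {}",
        metrics.auction.load(Ordering::Relaxed),
        metrics.solve_ok.load(Ordering::Relaxed),
        metrics.solve_err.load(Ordering::Relaxed),
    );
}
```

This shortens the call sites but does not address the separate point about metrics being recorded in several places within the module.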

nlordell requested a review from a team as a code owner October 8, 2023 19:29

fleupold approved these changes Oct 9, 2023

MartinquaXD approved these changes Oct 9, 2023

Nicholas Rodrigues Lordello and others added 6 commits October 9, 2023 15:12

Observability To Colocated Auction Runloop

da9ee4a

Additional metrics

da18431

Different log message for solver without solutions

ef90b81

Make sure to update auction even if its empty

b2331a1

Update crates/autopilot/src/run_loop.rs

7ca58b3

Co-authored-by: Felix Leupold <felixleupold90@gmail.com>

Fix timing metrics, OOPS

d84b7dc

nlordell force-pushed the refactor-autopilot-runloop branch from 7945897 to d84b7dc Compare October 9, 2023 13:18

nlordell merged commit fba44ba into main Oct 9, 2023
7 checks passed

nlordell deleted the refactor-autopilot-runloop branch October 9, 2023 13:26

github-actions bot locked and limited conversation to collaborators Oct 9, 2023
