
Additional logging #160

Closed · wants to merge 15 commits

Conversation

@polydez commented Jan 18, 2024 (Contributor)

Improved logging. We don't use request IDs for now; we will add them in the future.
Less important log events are placed at the debug level; the others are at info.

@polydez marked this pull request as ready for review January 18, 2024 15:59
@bobbinth left a comment (Contributor)

Thank you! This is not a review just yet - but a few thoughts:

First, I noticed a lot of changes are basically adding #[instrument(...)] attributes to functions. What kind of output does this generate? Specifically, I'm wondering how much contextual information we capture with these.

One thing that would be great to get from these is an understanding of how different objects move through the system. Let's take the block producer, for example. At the highest level, I'd love to see the following events logged:

  • Transaction queue:
    • Adding transaction with ID $x$ to transaction queue (start of this function).
    • Added transaction with ID $x$ to transaction queue (end of the same function).
    • Or if something went wrong, failed to add transaction with ID $x$ to transaction queue.
  • Batch builder:
    • Building a transaction batch of $n$ transactions, ideally with transaction IDs also logged (start of this function). Maybe we also define a batch ID (it could be a hash of the transaction IDs) and log it as well.
    • Built a transaction batch with ID $y$ and $n$ transactions (end of the same function).
    • Or if something went wrong, failed to build a batch with ID $y$.
  • Block builder:
    • Building a block of $m$ batches, ideally with batch IDs also logged (start of this function).
    • Built a block of $m$ batches, with the block hash and block header also logged (end of the same function).
    • Or if something went wrong, failed to build a block of $m$ batches (somehow correlated with the previous events).

The above is just an example. We can log much more and the data captured at each point can probably be improved. But I'm curious if adding #[instrument(...)] to functions will allow us to capture this info, or if we need to add more manual logging.

Second, I'm wondering if we should split this task into a few PRs, each focused on tracing specific parts of the system - e.g., one PR for the block producer, one PR for the store, one PR for the RPC.

@polydez commented Jan 19, 2024 (Contributor, Author)

@bobbinth

> First, I noticed a lot of changes are basically adding #[instrument(...)] attributes to functions. What kind of output does this generate? Specifically, I'm wondering how much contextual information we capture with these.

This generates a span for the function and logs entering the function, its arguments, exit, the return value (which needs to be requested explicitly using the ret keyword), and execution timing. It also provides grouping of log events and nesting of spans for instrumented calls.
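For illustration, a minimal sketch of what the attribute looks like in use (the function and the subscriber setup are made up for this example, not code from this PR):

```rust
use tracing::instrument;
use tracing_subscriber::fmt::format::FmtSpan;

// `#[instrument]` opens a span named after the function and records its
// arguments as span fields; `ret` additionally emits an event with the
// return value when the function exits.
#[instrument(ret)]
fn apply_delta(balance: u64, delta: u64) -> u64 {
    balance + delta
}

fn main() {
    // Emitting span-close events also prints the time spent inside the span.
    tracing_subscriber::fmt().with_span_events(FmtSpan::CLOSE).init();
    apply_delta(100, 25);
}
```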

> One thing that would be great to get from these is an understanding of how different objects move through the system. Let's take the block producer, for example. At the highest level, I'd love to see the following events logged:

Thank you! I think I've done all of this by instrument'ing functions and adding manual events, but I will double-check that these crucial workflows are covered.

> The above is just an example. We can log much more and the data captured at each point can probably be improved. But I'm curious if adding #[instrument(...)] to functions will allow us to capture this info, or if we need to add more manual logging.

It depends on how the code is organized. I've found that most of our code can just be instrument'ed, but in some rare cases manual events need to be added.
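For example, a rough sketch of combining the two, with stand-in types rather than the actual node code:

```rust
use std::sync::Arc;
use tokio::sync::RwLock;
use tracing::{info, instrument};

type SharedProvenTx = Arc<str>; // stand-in for the real shared transaction type

#[derive(Default)]
struct TxQueue {
    queue: RwLock<Vec<SharedProvenTx>>,
}

impl TxQueue {
    // The attribute covers the "adding"/"added"/"failed" events: it opens a
    // span on entry and, with `err`, records any error on early return. The
    // manual event adds context the macro cannot derive on its own, such as
    // the queue depth after insertion.
    #[instrument(skip(self), err)]
    async fn add_transaction(&self, tx: SharedProvenTx) -> Result<(), String> {
        let mut queue = self.queue.write().await;
        queue.push(tx);
        info!(queue_len = queue.len(), "Added transaction to transaction queue");
        Ok(())
    }
}
```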

> Second, I'm wondering if we should split this task into a few PRs, each focused on tracing specific parts of the system - e.g., one PR for the block producer, one PR for the store, one PR for the RPC.

I'm not sure we need to split it now, but we could create additional subtask(s) for logging improvements if more development is needed (at the very least, we need to track requests). For now, I should address the comments and double-check the workflows you mentioned.

@bobbinth (Contributor)

> I think I've done all of this by instrument'ing functions and adding manual events, but I will double-check that these crucial workflows are covered.

Great! Could you upload an example of what it looks like?

One challenging part could be generating a transaction - but I know @igamigo is working on a short write-up on how this can be done using miden-client.

@igamigo commented Jan 19, 2024 (Collaborator)

> > I think I've done all of this by instrument'ing functions and adding manual events, but I will double-check that these crucial workflows are covered.
>
> Great! Could you upload an example of what it looks like?
>
> One challenging part could be generating a transaction - but I know @igamigo is working on a short write-up on how this can be done using miden-client.

I do have this quite informal gist for now: https://gist.github.com/igamigo/5c4eef6be2a8b0d211767f85180de818. That should work fine for testing, although it will be a little rough around the edges. After merging the branches I mention in the gist, I was planning to write it up more formally.

@polydez commented Jan 19, 2024 (Contributor, Author)

@bobbinth, @hackaugusto, I think I've addressed all of the comments. Could you please take a look?

```rust
    &self,
    formatter: &mut Formatter<'_>,
) -> std::fmt::Result {
    formatter.write_str(&self.encode_hex::<String>())
```
Contributor:

neat :)

@hackaugusto (Contributor)

> @bobbinth, @hackaugusto, I think I've addressed all of the comments. Could you please take a look?

Overall this looks good to me. It would be nice to have the sample output that bobbin requested, to get an idea of the verbosity. I think we will need to tune some of these logging levels down from INFO to trace (especially the ones that print complete Merkle trees via the Debug trait), but that can also be done in follow-ups.

@polydez commented Jan 19, 2024 (Contributor, Author)

> Overall this looks good to me. It would be nice to have the sample output that bobbin requested, to get an idea of the verbosity. I think we will need to tune some of these logging levels down from INFO to trace (especially the ones that print complete Merkle trees via the Debug trait), but that can also be done in follow-ups.

Thank you, @hackaugusto! Currently RpoDigest is logged a little ugly. I have no idea how to do better in instrument, other than logging field-by-field manually. As an alternative, I could write thin wrappers for such fields, but that would be a lot of boilerplate which I'd like to avoid.
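One way to keep that boilerplate down might be a single generic wrapper instead of one per field; a sketch, not tied to the actual RpoDigest API:

```rust
use std::fmt;

// A generic newtype whose `Debug` defers to `Display`, so a hex-printing
// `Display` impl (like RpoDigest's) shows up nicely in `#[instrument]`'s
// debug-formatted fields.
struct AsDisplay<T: fmt::Display>(T);

impl<T: fmt::Display> fmt::Debug for AsDisplay<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}
```

For individual span fields, tracing's `%field` sigil already records via Display, so a wrapper like this is mostly useful for digests nested inside larger structs.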

@bobbinth left a comment (Contributor)

Thank you! So far I've reviewed only the submit_proven_transaction flow, which I think looks roughly as follows:

  • submit_proven_transaction
    • parse ProvenTransaction
    • TxQueue::add_transaction()
      • StateView::verify_tx()
        • StateView::ensure_in_flight_constraints()
        • Store::get_tx_inputs() - from here the request goes to the store.
        • StateView::ensure_tx_inputs_constraints()
        • StateView::ensure_in_flight_constraints()

I left some comments inline, but also a couple of general comments:

  1. It would be great to see sample output for these logs, as it is difficult for me to visualize what happens. For example, when there is an error somewhere, would the error currently get logged multiple times or just once?
  2. I still think it would be a good idea to split this PR into multiple PRs. This way, we can go "narrow but deep" rather than "shallow but wide" with each PR. Having looked at it more, I'd probably do a PR per "flow". The flows could be:
    a. Submit proven TX (this is what I reviewed here - but including RPC and store components).
    b. Build transaction batch.
    c. Build block.
    d. Various data retrieval requests to the RPC.
  3. Tracking requests across components using request IDs should probably be a separate task and we should create a new issue for this.

Comment on lines 43 to 44
```rust
let request_id = gen_request_id();
let _span = info_span!("submit_proven_transaction", request_id, ?request, COMPONENT);
```
@bobbinth:

A couple of comments here:

First, I'd probably get rid of request_id here as I'm not sure it adds much value right now. My understanding is that we'd want to use request IDs for correlating requests across different components, and this is a bigger issue that we'll address in subsequent PRs.

Second, this would log the entire request at info level, right? If so, I think this could be quite verbose as these requests could be 60KB - 100KB of data (STARK proofs are pretty big). Plus, I think the request is mostly binary data. If we do want to log the entire request, we should probably do this at the trace level (but would still keep the span at the info level).

Also, general question (mostly for my understanding): this span will be dropped at the end of the function, right? So, if an error happens anywhere below, will it be captured in this span?

@polydez:

@bobbinth thank you for the clarifications. I will remove request_id. I agree, it would be better to hide only the STARK proofs behind trace. Spans are not able to capture results by themselves; in this case we would need to handle that by hand, thank you for noticing. I will move the span to instrument since we don't need request_id here anymore, and instrument does support logging of results and errors.
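Roughly this shape, with a simplified signature rather than the actual RPC types:

```rust
use tracing::{instrument, trace};

// The span stays at the default info level; the bulky proof bytes are only
// emitted at the trace level; `err` records the failure on early return.
#[instrument(skip_all, err)]
fn submit_proven_transaction(raw_tx: &[u8]) -> Result<(), String> {
    trace!(len = raw_tx.len(), data = ?raw_tx, "received proven transaction bytes");
    if raw_tx.is_empty() {
        return Err("invalid transaction".to_owned());
    }
    Ok(())
}
```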

```rust
let tx = ProvenTransaction::read_from_bytes(&request.transaction)
    .map_err(|_| Status::invalid_argument("Invalid transaction"))?;

debug!(request_id, ?tx, COMPONENT);
```
@bobbinth:

Similar to the last comment, I think this may be too verbose as ProvenTransactions may be dozens of KBs of binary data. Ideally, we'd log the following:

  1. At info level something like: "parsed proven transaction" with transaction ID.
  2. At debug level all details of the transaction except for the proof.

For the second point, I'd probably create a wrapper struct with a Display implementation - though maybe doing it field-by-field could work too.
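A sketch of such a wrapper (the field names are placeholders, not the actual ProvenTransaction API):

```rust
use std::fmt;

// `Display` prints the identifying fields and deliberately leaves the
// proof out, reporting only its size.
struct ProvenTxSummary<'a> {
    tx_id: &'a str,
    account_id: u64,
    proof_len: usize,
}

impl fmt::Display for ProvenTxSummary<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "tx {} (account: {}, proof: {} bytes omitted)",
            self.tx_id, self.account_id, self.proof_len
        )
    }
}
```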

@polydez:

@bobbinth got it!

Comment on lines 163 to 167
```rust
#[allow(clippy::blocks_in_conditions)] // Workaround of `instrument` issue
#[instrument(skip(self), ret, err(Debug), fields(COMPONENT))]
async fn add_transaction(
    &self,
    tx: SharedProvenTx,
```
@bobbinth:

A few comments here:

  1. Logging the full SharedProvenTx here would probably be too verbose. We can probably just log the transaction ID field instead of the full struct.
  2. Do we need to log the return value since it is just ()?
  3. Similarly, do we need to log the error value? I'm assuming it would be captured by the parent span, but maybe not?
  4. Do we also need log the COMPONENT here? Or would specifying it just for the parent span be enough?

Also, I think we should add an event at the end of this function (i.e., line 175 below) - something like "Added transaction to proven transaction queue" with the transaction ID and maybe the current number of transactions in the queue.

@polydez:

@bobbinth

  1. Agree, will log it manually.
  2. You're right, I will remove it.
  3. My preference is to log errors at each point, because the code might be changed later and some errors might get processed/swallowed silently in parent calls.
  4. Currently we log to console/file, which is not grouped by spans. In this case it's better to have the COMPONENT label on each line.

> Also, I think we should add an event at the end of this function (i.e., line 175 below) - something like "Added transaction to proven transaction queue" with the transaction ID and maybe the current number of transactions in the queue.

Got it!

Comment on lines +48 to 52
```rust
#[allow(clippy::blocks_in_conditions)] // Workaround of `instrument` issue
#[instrument(skip(self), ret, err(Debug), fields(COMPONENT))]
async fn verify_tx(
    &self,
    candidate_tx: SharedProvenTx,
```
@bobbinth:

Similar comments/questions to the ones above regarding the need to log ret, err, COMPONENT, etc.

```
@@ -131,6 +134,7 @@ where
/// 1. the candidate transaction doesn't modify the same account as an existing in-flight transaction
/// 2. no consumed note's nullifier in candidate tx's consumed notes is already contained
///    in `already_consumed_nullifiers`
#[instrument(ret, err(Debug), fields(COMPONENT))]
```
@bobbinth commented Jan 20, 2024:

I don't think we need to log arguments here and probably can skip logging err and COMPONENT (assuming they are already captured by the parent spans).

Also, maybe this should be instrumented at debug level?
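For example (a sketch, with the signature elided):

```rust
use tracing::instrument;

// Lower the span to the debug level and skip argument capture, relying on
// the parent span to record the error and the component.
#[instrument(level = "debug", skip_all)]
fn ensure_in_flight_constraints(/* arguments unchanged */) {
    // body unchanged
}
```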

Comment on lines +169 to 172
```rust
#[instrument(ret, err(Debug), fields(COMPONENT))]
fn ensure_tx_inputs_constraints(
    candidate_tx: SharedProvenTx,
    tx_inputs: TxInputs,
```
@bobbinth:

Similar comments to the ones above.

Comment on lines +100 to 104
```rust
#[allow(clippy::blocks_in_conditions)] // Workaround of `instrument` issue
#[instrument(skip(self), ret, err, fields(COMPONENT))]
async fn get_tx_inputs(
    &self,
    proven_tx: SharedProvenTx,
```
@bobbinth:

Similar comments to the ones before. But this one we can probably keep at info level.

Comment on lines +49 to 52
```rust
#[derive(Debug)]
pub struct TxInputs {
    /// The account hash in the store corresponding to tx's account ID
    pub account_hash: Option<Digest>,
```
@bobbinth:

I would probably manually implement Display on this to make sure we get good output in the logs.
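For example, a sketch for the struct above, assuming `Digest` prints as hex via its `Display` impl:

```rust
use std::fmt;

// Manual `Display` for `TxInputs`; `None` is rendered explicitly so the
// log line stays readable.
impl fmt::Display for TxInputs {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.account_hash {
            Some(hash) => write!(f, "{{ account_hash: {hash} }}"),
            None => write!(f, "{{ account_hash: none }}"),
        }
    }
}
```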

@polydez:

@bobbinth I've implemented Debug for protobuf's Digest and it looks nice now. The only issue is with RpoDigest; we will have to handle those manually.

@bobbinth:

I believe RpoDigest has Display implemented on it which prints out the value in hex.

@polydez:

@bobbinth

> I believe RpoDigest has Display implemented on it which prints out the value in hex.

Yes, but when the digest is a field in a structure, I can't specify which trait to use when logging it.

@polydez commented Jan 22, 2024 (Contributor, Author)

@bobbinth, thank you for such a deep analysis!

>   1. It would be great to see sample output for these logs, as it is difficult for me to visualize what happens. For example, when there is an error somewhere, would the error currently get logged multiple times or just once?

I will try to check such workflow(s). Generally, if we just instrument all methods with the ret/err keywords, we will get results and errors from all subcalls. We can fine-tune this behavior if it's too verbose.

>   2. I still think it would be a good idea to split this PR into multiple PRs. This way, we can go "narrow but deep" rather than "shallow but wide" with each PR. Having looked at it more, I'd probably do a PR per "flow". The flows could be:
>     a. Submit proven TX (this is what I reviewed here - but including RPC and store components).
>     b. Build transaction batch.
>     c. Build block.
>     d. Various data retrieval requests to the RPC.

Okay, let's split it, thank you!

>   3. Tracking requests across components using request IDs should probably be a separate task and we should create a new issue for this.

Got it!

@polydez commented Jan 22, 2024 (Contributor, Author)

@bobbinth

>   2. I still think it would be a good idea to split this PR into multiple PRs. This way, we can go "narrow but deep" rather than "shallow but wide" with each PR. Having looked at it more, I'd probably do a PR per "flow". The flows could be:
>     a. Submit proven TX (this is what I reviewed here - but including RPC and store components).
>     b. Build transaction batch.
>     c. Build block.
>     d. Various data retrieval requests to the RPC.

Okay, let's split it, thank you!

How about keeping the current state of logging in this PR, but making additional PRs with logging improvements for each flow?

@bobbinth (Contributor)

> How about keeping the current state of logging in this PR, but making additional PRs with logging improvements for each flow?

It would be more difficult for me to review it this way. Having a PR per flow makes it easier to figure out the relevant parts to review.

@polydez commented Jan 22, 2024 (Contributor, Author)

> > How about keeping the current state of logging in this PR, but making additional PRs with logging improvements for each flow?
>
> It would be more difficult for me to review it this way. Having a PR per flow makes it easier to figure out the relevant parts to review.

Got your point, thanks!

@polydez marked this pull request as a draft January 26, 2024 06:28
@polydez closed this Jan 26, 2024
@hackaugusto deleted the polydez-additional-logging branch January 26, 2024 07:58