
Additional logging #160

Closed · wants to merge 15 commits

Conversation

@polydez commented Jan 18, 2024 (Contributor)

Improved logging. We don't use request IDs for now; we will add them in the future.
Less important log events are placed at the debug level; the others are at info.

@polydez marked this pull request as ready for review January 18, 2024 15:59
@bobbinth left a comment (Contributor)

Thank you! This is not a review just yet - but a few thoughts:

First, I noticed a lot of changes are basically adding #[instrument(...)] attributes to functions. What kind of output does this generate? Specifically, I'm wondering how much contextual information we capture with these.

One thing that would be great to get from these is an understanding of how different objects move through the system. Let's take the block producer, for example. At the highest level, I'd love to see the following events logged:

  • Transaction queue:
    • Adding transaction with ID $x$ to transaction queue (start of this function).
    • Added transaction with ID $x$ to transaction queue (end of the same function).
    • Or if something went wrong, failed to add transaction with ID $x$ to transaction queue.
  • Batch builder:
    • Building a transaction batch of $n$ transactions, ideally with transaction IDs also logged (start of this function). Maybe we also define a batch ID (it could be a hash of the transaction IDs) and log it as well.
    • Built a transaction batch with ID $y$ and $n$ transactions (end of the same function).
    • Or if something went wrong, failed to build a batch with ID $y$.
  • Block builder:
    • Building a block of $m$ batches, ideally with batch IDs also logged (start of this function).
    • Built a block of $m$ batches, with the block hash and block header also logged (end of the same function).
    • Or if something went wrong, failed to build a block of $m$ batches (somehow correlated with the previous events).

The above is just an example. We can log much more and the data captured at each point can probably be improved. But I'm curious if adding #[instrument(...)] to functions will allow us to capture this info, or if we need to add more manual logging.

Second, I'm wondering if we should split this task into a few PRs, each focused on tracing specific parts of the system - e.g., one PR for the block producer, one PR for the store, one PR for the RPC.

@polydez commented Jan 19, 2024 (Contributor, Author)

@bobbinth

> First, I noticed a lot of changes are basically adding #[instrument(...)] attributes to functions. What kind of output does this generate? Specifically, I'm wondering how much contextual information we capture with these.

This generates a span for the function and logs entering the function, its arguments, exit, the return value (which needs to be requested explicitly using the ret keyword), and execution timing. It also provides grouping of log events and nesting of spans for instrumented calls.
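For illustration, a minimal sketch of what the attribute looks like in use (the function and the subscriber setup are made up for this example, not code from this PR):

```rust
use tracing::instrument;
use tracing_subscriber::fmt::format::FmtSpan;

// `#[instrument]` opens a span named after the function and records its
// arguments as span fields; `ret` additionally emits an event with the
// return value when the function exits.
#[instrument(ret)]
fn apply_delta(balance: u64, delta: u64) -> u64 {
    balance + delta
}

fn main() {
    // Emitting span-close events also prints the time spent inside the span.
    tracing_subscriber::fmt().with_span_events(FmtSpan::CLOSE).init();
    apply_delta(100, 25);
}
```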

> One thing that would be great to get from these is an understanding of how different objects move through the system. Let's take the block producer, for example. At the highest level, I'd love to see the following events logged:

Thank you! I think I've done all of this by instrument'ing functions and adding manual events, but I will double-check that these crucial workflows are covered.

> The above is just an example. We can log much more and the data captured at each point can probably be improved. But I'm curious if adding #[instrument(...)] to functions will allow us to capture this info, or if we need to add more manual logging.

It depends on how the code is organized. I've found that most of our code can just be instrument'ed, but in some rare cases manual events need to be added.
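For example, a rough sketch of combining the two, with stand-in types rather than the actual node code:

```rust
use std::sync::Arc;
use tokio::sync::RwLock;
use tracing::{info, instrument};

type SharedProvenTx = Arc<str>; // stand-in for the real shared transaction type

#[derive(Default)]
struct TxQueue {
    queue: RwLock<Vec<SharedProvenTx>>,
}

impl TxQueue {
    // The attribute covers the "adding"/"added"/"failed" events: it opens a
    // span on entry and, with `err`, records any error on early return. The
    // manual event adds context the macro cannot derive on its own, such as
    // the queue depth after insertion.
    #[instrument(skip(self), err)]
    async fn add_transaction(&self, tx: SharedProvenTx) -> Result<(), String> {
        let mut queue = self.queue.write().await;
        queue.push(tx);
        info!(queue_len = queue.len(), "Added transaction to transaction queue");
        Ok(())
    }
}
```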

> Second, I'm wondering if we should split this task into a few PRs, each focused on tracing specific parts of the system - e.g., one PR for the block producer, one PR for the store, one PR for the RPC.

I'm not sure we need to split it now, but we could create additional subtask(s) for logging improvements if more development is needed (at the very least, we need to track requests). For now, I should address the comments and double-check the workflows you mentioned.

@bobbinth (Contributor)

> I think I've done all of this by instrument'ing functions and adding manual events, but I will double-check that these crucial workflows are covered.

Great! Could you upload an example of what it looks like?

One challenging part could be generating a transaction - but I know @igamigo is working on a short write-up on how this can be done using miden-client.

@igamigo commented Jan 19, 2024 (Collaborator)

> > I think I've done all of this by instrument'ing functions and adding manual events, but I will double-check that these crucial workflows are covered.
>
> Great! Could you upload an example of what it looks like?
>
> One challenging part could be generating a transaction - but I know @igamigo is working on a short write-up on how this can be done using miden-client.

I do have this quite informal gist for now: https://gist.github.com/igamigo/5c4eef6be2a8b0d211767f85180de818. That should work fine for testing, although it will be a little rough around the edges. After merging the branches I mention in the gist, I was planning to write it up more formally.

@polydez commented Jan 19, 2024 (Contributor, Author)

@bobbinth, @hackaugusto, I think I've addressed all of the comments. Could you please take a look?

```rust
    &self,
    formatter: &mut Formatter<'_>,
) -> std::fmt::Result {
    formatter.write_str(&self.encode_hex::<String>())
```
Contributor:

neat :)

@hackaugusto (Contributor)

> @bobbinth, @hackaugusto, I think I've addressed all of the comments. Could you please take a look?

Overall this looks good to me. It would be nice to have the sample output that bobbin requested, to get an idea of the verbosity. I think we will need to tune some of these logging levels down from INFO to trace (especially the ones that print complete Merkle trees via the Debug trait), but that can also be done in follow-ups.

@polydez commented Jan 19, 2024 (Contributor, Author)

> Overall this looks good to me. It would be nice to have the sample output that bobbin requested, to get an idea of the verbosity. I think we will need to tune some of these logging levels down from INFO to trace (especially the ones that print complete Merkle trees via the Debug trait), but that can also be done in follow-ups.

Thank you, @hackaugusto! Currently RpoDigest is logged a little ugly. I have no idea how to do better in instrument, other than logging field-by-field manually. As an alternative, I could write thin wrappers for such fields, but that would be a lot of boilerplate which I'd like to avoid.
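One way to keep that boilerplate down might be a single generic wrapper instead of one per field; a sketch, not tied to the actual RpoDigest API:

```rust
use std::fmt;

// A generic newtype whose `Debug` defers to `Display`, so a hex-printing
// `Display` impl (like RpoDigest's) shows up nicely in `#[instrument]`'s
// debug-formatted fields.
struct AsDisplay<T: fmt::Display>(T);

impl<T: fmt::Display> fmt::Debug for AsDisplay<T> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}
```

For individual span fields, tracing's `%field` sigil already records via Display, so a wrapper like this is mostly useful for digests nested inside larger structs.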

@bobbinth left a comment (Contributor)

Thank you! So far I've reviewed only the submit_proven_transaction flow, which I think looks roughly as follows:

  • submit_proven_transaction
    • parse ProvenTransaction
    • TxQueue::add_transaction()
      • StateView::verify_tx()
        • StateView::ensure_in_flight_constraints()
        • Store::get_tx_inputs() - from here the request goes to the store.
        • StateView::ensure_tx_inputs_constraints()
        • StateView::ensure_in_flight_constraints()

I left some comments inline, but also a couple of general comments:

  1. It would be great to see sample output for these logs, as it is difficult for me to visualize what happens. For example, when there is an error somewhere, would the error currently get logged multiple times or just once?
  2. I still think it would be a good idea to split this PR into multiple PRs. This way, we can go "narrow but deep" rather than "shallow but wide" with each PR. Having looked at it more, I'd probably do a PR per "flow". The flows could be:
    a. Submit proven TX (this is what I reviewed here - but including RPC and store components).
    b. Build transaction batch.
    c. Build block.
    d. Various data retrieval requests to the RPC.
  3. Tracking requests across components using request IDs should probably be a separate task and we should create a new issue for this.

Comment on lines 43 to 44
```rust
let request_id = gen_request_id();
let _span = info_span!("submit_proven_transaction", request_id, ?request, COMPONENT);
```
@bobbinth:

A couple of comments here:

First, I'd probably get rid of request_id here as I'm not sure it adds much value right now. My understanding is that we'd want to use request IDs for correlating requests across different components, and this is a bigger issue that we'll address in subsequent PRs.

Second, this would log the entire request at info level, right? If so, I think this could be quite verbose as these requests could be 60KB - 100KB of data (STARK proofs are pretty big). Plus, I think the request is mostly binary data. If we do want to log the entire request, we should probably do this at the trace level (but would still keep the span at the info level).

Also, general question (mostly for my understanding): this span will be dropped at the end of the function, right? So, if an error happens anywhere below, will it be captured in this span?

@polydez:

@bobbinth thank you for the clarifications. I will remove request_id. I agree, it would be better to hide only the STARK proofs behind trace. Spans are not able to capture results by themselves; in this case we would need to handle that by hand, thank you for noticing. I will move the span to instrument since we don't need request_id here anymore, and instrument does support logging of results and errors.
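Roughly this shape, with a simplified signature rather than the actual RPC types:

```rust
use tracing::{instrument, trace};

// The span stays at the default info level; the bulky proof bytes are only
// emitted at the trace level; `err` records the failure on early return.
#[instrument(skip_all, err)]
fn submit_proven_transaction(raw_tx: &[u8]) -> Result<(), String> {
    trace!(len = raw_tx.len(), data = ?raw_tx, "received proven transaction bytes");
    if raw_tx.is_empty() {
        return Err("invalid transaction".to_owned());
    }
    Ok(())
}
```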

```rust
let tx = ProvenTransaction::read_from_bytes(&request.transaction)
    .map_err(|_| Status::invalid_argument("Invalid transaction"))?;

debug!(request_id, ?tx, COMPONENT);
```
@bobbinth:

Similar to the last comment, I think this may be too verbose as ProvenTransactions may be dozens of KBs of binary data. Ideally, we'd log the following:

  1. At info level something like: "parsed proven transaction" with transaction ID.
  2. At debug level all details of the transaction except for the proof.

For the second point, I'd probably create a wrapper struct with a Display implementation - though maybe doing it field-by-field could work too.
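A sketch of such a wrapper (the field names are placeholders, not the actual ProvenTransaction API):

```rust
use std::fmt;

// `Display` prints the identifying fields and deliberately leaves the
// proof out, reporting only its size.
struct ProvenTxSummary<'a> {
    tx_id: &'a str,
    account_id: u64,
    proof_len: usize,
}

impl fmt::Display for ProvenTxSummary<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "tx {} (account: {}, proof: {} bytes omitted)",
            self.tx_id, self.account_id, self.proof_len
        )
    }
}
```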

@polydez:

@bobbinth got it!

Comment on lines 163 to 167
```rust
#[allow(clippy::blocks_in_conditions)] // Workaround of `instrument` issue
#[instrument(skip(self), ret, err(Debug), fields(COMPONENT))]
async fn add_transaction(
    &self,
    tx: SharedProvenTx,
```
@bobbinth:

A few comments here:

  1. Logging the full SharedProvenTx here would probably be too verbose. We can probably just log the transaction ID field instead of the full struct.
  2. Do we need to log the return value since it is just ()?
  3. Similarly, do we need to log the error value? I'm assuming it would be captured by the parent span, but maybe not?
  4. Do we also need log the COMPONENT here? Or would specifying it just for the parent span be enough?

Also, I think we should add an event at the end of this function (i.e., line 175 below) - something like "Added transaction to proven transaction queue" with the transaction ID and maybe the current number of transactions in the queue.

@polydez:

@bobbinth

  1. Agree, will log it manually.
  2. You're right, I will remove it.
  3. My preference is to log errors at each point, because the code might be changed later and some errors might get processed/swallowed silently in parent calls.
  4. Currently we log to console/file, which is not grouped by spans. In this case it's better to have the COMPONENT label on each line.

> Also, I think we should add an event at the end of this function (i.e., line 175 below) - something like "Added transaction to proven transaction queue" with the transaction ID and maybe the current number of transactions in the queue.

Got it!

Comment on lines +48 to 52
```rust
#[allow(clippy::blocks_in_conditions)] // Workaround of `instrument` issue
#[instrument(skip(self), ret, err(Debug), fields(COMPONENT))]
async fn verify_tx(
    &self,
    candidate_tx: SharedProvenTx,
```
@bobbinth:

Similar comments/questions to the ones above regarding the need to log ret, err, COMPONENT, etc.

```
@@ -131,6 +134,7 @@ where
/// 1. the candidate transaction doesn't modify the same account as an existing in-flight transaction
/// 2. no consumed note's nullifier in candidate tx's consumed notes is already contained
///    in `already_consumed_nullifiers`
#[instrument(ret, err(Debug), fields(COMPONENT))]
```
@bobbinth commented Jan 20, 2024:

I don't think we need to log arguments here and probably can skip logging err and COMPONENT (assuming they are already captured by the parent spans).

Also, maybe this should be instrumented at debug level?
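For example (a sketch, with the signature elided):

```rust
use tracing::instrument;

// Lower the span to the debug level and skip argument capture, relying on
// the parent span to record the error and the component.
#[instrument(level = "debug", skip_all)]
fn ensure_in_flight_constraints(/* arguments unchanged */) {
    // body unchanged
}
```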

Comment on lines +169 to 172
```rust
#[instrument(ret, err(Debug), fields(COMPONENT))]
fn ensure_tx_inputs_constraints(
    candidate_tx: SharedProvenTx,
    tx_inputs: TxInputs,
```
@bobbinth:

Similar comments to the ones above.

Comment on lines +100 to 104
```rust
#[allow(clippy::blocks_in_conditions)] // Workaround of `instrument` issue
#[instrument(skip(self), ret, err, fields(COMPONENT))]
async fn get_tx_inputs(
    &self,
    proven_tx: SharedProvenTx,
```
@bobbinth:

Similar comments to the ones before. But this one we can probably keep at info level.

Comment on lines +49 to 52
```rust
#[derive(Debug)]
pub struct TxInputs {
    /// The account hash in the store corresponding to tx's account ID
    pub account_hash: Option<Digest>,
```
@bobbinth:

I would probably manually implement Display on this to make sure we get good output in the logs.
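For example, a sketch for the struct above, assuming `Digest` prints as hex via its `Display` impl:

```rust
use std::fmt;

// Manual `Display` for `TxInputs`; `None` is rendered explicitly so the
// log line stays readable.
impl fmt::Display for TxInputs {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.account_hash {
            Some(hash) => write!(f, "{{ account_hash: {hash} }}"),
            None => write!(f, "{{ account_hash: none }}"),
        }
    }
}
```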

@polydez:

@bobbinth I've implemented Debug for protobuf's Digest and it looks nice now. The only issue is with RpoDigest; we will have to handle those manually.

@bobbinth:

I believe RpoDigest has Display implemented on it which prints out the value in hex.

@polydez:

@bobbinth

> I believe RpoDigest has Display implemented on it which prints out the value in hex.

Yes, but when the digest is a field in a structure, I can't specify which trait to use when logging it.

@polydez commented Jan 22, 2024 (Contributor, Author)

@bobbinth, thank you for such a deep analysis!

>   1. It would be great to see sample output for these logs, as it is difficult for me to visualize what happens. For example, when there is an error somewhere, would the error currently get logged multiple times or just once?

I will try to check such workflow(s). Generally, if we just instrument all methods with the ret/err keywords, we will get results and errors from all subcalls. We can fine-tune this behavior if it's too verbose.

>   2. I still think it would be a good idea to split this PR into multiple PRs. This way, we can go "narrow but deep" rather than "shallow but wide" with each PR. Having looked at it more, I'd probably do a PR per "flow". The flows could be:
>     a. Submit proven TX (this is what I reviewed here - but including RPC and store components).
>     b. Build transaction batch.
>     c. Build block.
>     d. Various data retrieval requests to the RPC.

Okay, let's split it, thank you!

>   3. Tracking requests across components using request IDs should probably be a separate task and we should create a new issue for this.

Got it!

@polydez commented Jan 22, 2024 (Contributor, Author)

@bobbinth

>   2. I still think it would be a good idea to split this PR into multiple PRs. This way, we can go "narrow but deep" rather than "shallow but wide" with each PR. Having looked at it more, I'd probably do a PR per "flow". The flows could be:
>     a. Submit proven TX (this is what I reviewed here - but including RPC and store components).
>     b. Build transaction batch.
>     c. Build block.
>     d. Various data retrieval requests to the RPC.

Okay, let's split it, thank you!

How about keeping the current state of logging in this PR, but making additional PRs with logging improvements for each flow?

@bobbinth (Contributor)

> How about keeping the current state of logging in this PR, but making additional PRs with logging improvements for each flow?

It would be more difficult for me to review it this way. Having a PR per flow makes it easier to figure out the relevant parts to review.

@polydez commented Jan 22, 2024 (Contributor, Author)

> > How about keeping the current state of logging in this PR, but making additional PRs with logging improvements for each flow?
>
> It would be more difficult for me to review it this way. Having a PR per flow makes it easier to figure out the relevant parts to review.

Got your point, thanks!

@polydez marked this pull request as a draft January 26, 2024 06:28
@polydez closed this Jan 26, 2024
@hackaugusto deleted the polydez-additional-logging branch January 26, 2024 07:58