Additional logging #160
Conversation
Thank you! This is not a review just yet - but a few thoughts:

First, I noticed a lot of changes are basically adding `#[instrument(...)]` attributes to functions. What kind of output does this generate? Specifically, I'm wondering how much contextual information we capture with these.
One thing that would be great to get from these is an understanding of how different objects move through the system. Let's take the block producer, for example. At the highest level, I'd love to see the following events logged:
- Transaction queue:
  - Adding transaction with ID $x$ to transaction queue (start of this function).
  - Added transaction with ID $x$ to transaction queue (end of the same function).
  - Or, if something went wrong, failed to add transaction with ID $x$ to transaction queue.
- Batch builder:
  - Building a transaction batch of $n$ transactions, ideally with transaction IDs also logged (start of this function). Maybe we also define a batch ID (could be a hash of transaction IDs) and log it as well.
  - Built a transaction batch with ID $y$ and $n$ transactions (end of the same function).
  - Or, if something went wrong, failed to build a batch with ID $y$.
- Block builder:
  - Building a block of $m$ batches, ideally with batch IDs also logged (start of this function).
  - Built a block of $m$ batches, with block hash and block header also logged (end of the same function).
  - Or, if something went wrong, failed to build a block of $m$ batches (somehow correlated to the previous events).
The above is just an example. We can log much more, and the data captured at each point can probably be improved. But I'm curious if adding `#[instrument(...)]` to functions will allow us to capture this info, or if we need to add more manual logging.
Second, I'm wondering if we should split this task into a few PRs, each focused on tracing specific parts of the system - e.g., one PR for block producer, one PR for store, one PR for RPC.
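For context on the first point, below is a minimal sketch of how `#[instrument]` and manual events can be combined for something like the transaction queue. The types and names here are illustrative only, not the actual block producer code:

```rust
use tracing::{info, instrument};

// Illustrative transaction type; the real queue works with `SharedProvenTx`.
#[derive(Debug)]
struct Tx {
    id: String,
}

// `#[instrument]` opens a span for the whole call (so the "start" is captured
// automatically), but the "added"/"failed" events still need to be emitted manually.
#[instrument(skip(tx), fields(tx_id = %tx.id))]
async fn add_transaction(tx: Tx) -> Result<(), String> {
    info!("adding transaction to transaction queue");

    // ... verification and queuing would happen here ...

    info!("added transaction to transaction queue");
    Ok(())
}
```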
This generates
Thank you! I think I've done all of this by
It depends on how the code is organised. I've found that most of our code can be just
I'm not sure we need to split it now, but we could create additional subtask(s) for logging improvements if it needs additional development (we at least need to track requests). Now I should address the comments and double-check the workflows you mentioned.
This reverts commit 8256410.
Great! Could you upload an example of what it looks like? One challenging part could be generating a transaction - but I know @igamigo is working on a short write-up on how this can be done using
# Conflicts: # store/src/genesis.rs # store/src/state.rs
I do have this quite informal gist for now: https://gist.github.com/igamigo/5c4eef6be2a8b0d211767f85180de818. That should work fine for testing, although it will be a little rough around the edges. After merging the branches I mention in the gist, I was planning on writing it up more formally.
@bobbinth, @hackaugusto, I think I've addressed all of the comments. Could you please take a look?
    &self,
    formatter: &mut Formatter<'_>,
) -> std::fmt::Result {
    formatter.write_str(&self.encode_hex::<String>())
neat :)
Overall this looks good to me. It would be nice to have the sample output that bobbin requested, to get an idea of the verbosity. I think we will need to tune some of these logging levels down from
Thank you, @hackaugusto! Currently
Thank you! So far I've reviewed only the `submit_proven_transaction` flow, which I think looks roughly as follows:

- `submit_proven_transaction`
  - parse `ProvenTransaction`
  - `TxQueue::add_transaction()`
    - `StateView::verify_tx()`
      - `StateView::ensure_in_flight_constraints()`
      - `Store::get_tx_inputs()` - from here the request goes to the store.
      - `StateView::ensure_tx_inputs_constraints()`
      - `StateView::ensure_in_flight_constraints()`
I left some comments inline, but also a couple of general comments:
- It would be great to see sample output for these logs as it is difficult for me to visualize what happens. For example, when there is an error somewhere, would this error currently get logged multiple times or just once?
- I still think it would be a good idea to split this PR into multiple PRs. This way, we can go "narrow but deep" rather than "shallow but wide" with each PR. Having looked at it more, I'd probably do a PR per "flow". The flows could be:
  a. Submit proven TX (this is what I reviewed here - but including the RPC and store components).
  b. Build transaction batch.
  c. Build block.
  d. Various data retrieval requests to the RPC.
- Tracking requests across components using request IDs should probably be a separate task, and we should create a new issue for this.
block-producer/src/server/api.rs
let request_id = gen_request_id();
let _span = info_span!("submit_proven_transaction", request_id, ?request, COMPONENT);
A couple of comments here:

First, I'd probably get rid of `request_id` here as I'm not sure it adds much value right now. My understanding is that we'd want to use request IDs for correlating requests across different components, and this is a bigger issue that we'll address in subsequent PRs.

Second, this would log the entire request at the `info` level, right? If so, I think this could be quite verbose as these requests could be 60KB - 100KB of data (STARK proofs are pretty big). Plus, I think the request is mostly binary data. If we do want to log the entire request, we should probably do this at the `trace` level (but still keep the span at the `info` level).

Also, a general question (mostly for my understanding): this span will be dropped at the end of the function, right? So, if an error happens anywhere below, will it be captured in this span?
@bobbinth thank you for the clarifications. I will remove `request_id`. I agree, it will be better to hide only STARK proofs behind the `trace` level. Spans are not able to capture results by themselves; in this case we would need to handle that by hand - thank you for noticing. I will move the span to `#[instrument]` since we don't need `request_id` here anymore, and `instrument` does support logging of results and errors.
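As a rough sketch, the handler could end up looking something like this (the signature and names below are simplified stand-ins, not the actual code):

```rust
use tracing::{info, instrument, trace};

const COMPONENT: &str = "miden-block-producer";

// Skip the raw request bytes (STARK proofs can be 60-100 KB of binary data)
// and let `instrument` record any returned error; the span itself stays at `info`.
#[instrument(skip(request_bytes), err, fields(component = COMPONENT))]
async fn submit_proven_transaction(request_bytes: Vec<u8>) -> Result<(), String> {
    // The full payload is only visible at `trace` level, where verbosity is acceptable.
    trace!(request_len = request_bytes.len(), "received raw request");

    // ... parsing and queuing would happen here ...

    info!("parsed proven transaction");
    Ok(())
}
```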
block-producer/src/server/api.rs
let tx = ProvenTransaction::read_from_bytes(&request.transaction)
    .map_err(|_| Status::invalid_argument("Invalid transaction"))?;

debug!(request_id, ?tx, COMPONENT);
Similar to the last comment, I think this may be too verbose as `ProvenTransaction`s may be dozens of KBs of binary data. Ideally, we'd log the following:

- At the `info` level, something like "parsed proven transaction" with the transaction ID.
- At the `debug` level, all details of the transaction except for the `proof`.

For the second point, I'd probably create a wrapper struct with a `Display` implementation - though maybe doing it field-by-field could work too.
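A sketch of what such a wrapper could look like (the fields below are illustrative, not the actual `ProvenTransaction` layout):

```rust
use std::fmt;

// Illustrative stand-in for the real transaction type.
struct ProvenTransaction {
    id: String,
    account_id: u64,
    proof: Vec<u8>,
}

/// Wrapper that formats everything except the (large, binary) proof.
struct ProvenTransactionSummary<'a>(&'a ProvenTransaction);

impl fmt::Display for ProvenTransactionSummary<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "ProvenTransaction {{ id: {}, account_id: {}, proof: <{} bytes> }}",
            self.0.id,
            self.0.account_id,
            self.0.proof.len()
        )
    }
}
```

With something like this, the transaction details could be logged as, e.g., `debug!(tx = %ProvenTransactionSummary(&tx), "parsed proven transaction")`.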
@bobbinth got it!
block-producer/src/txqueue/mod.rs
#[allow(clippy::blocks_in_conditions)] // Workaround of `instrument` issue
#[instrument(skip(self), ret, err(Debug), fields(COMPONENT))]
async fn add_transaction(
    &self,
    tx: SharedProvenTx,
A few comments here:

- Logging the full `SharedProvenTx` here would probably be too verbose. We can probably just log the transaction ID field instead of the full struct.
- Do we need to log the return value since it is just `()`?
- Similarly, do we need to log the error value? I'm assuming it would be captured by the parent span, but maybe not?
- Do we also need to log the `COMPONENT` here? Or would specifying it just for the parent span be enough?

Also, I think we should add an event at the end of this function (i.e., line 175 below) - something like "Added transaction to proven transaction queue" with the transaction ID and maybe the current number of transactions in the queue.
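Such an event could look roughly like the sketch below (the `COMPONENT` constant and the helper wrapping the event are illustrative):

```rust
use tracing::info;

const COMPONENT: &str = "miden-block-producer";

// Emitted at the end of `add_transaction`, once the transaction is in the queue.
fn log_transaction_added(tx_id: &str, queue_len: usize) {
    info!(
        target: COMPONENT,
        tx_id,
        queue_len,
        "Added transaction to proven transaction queue"
    );
}
```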
- Agree, will log it manually.
- You're right, I will remove it.
- My preference is to log errors at each point, because the code might be changed later and some errors might be processed/swallowed silently in parent calls.
- Currently we log to console/file, which is not grouped by spans. In this case it's better to have the `COMPONENT` label on each line.

> Also, I think we should add an event at the end of this function (i.e., line 175 below) - something like "Added transaction to proven transaction queue" with the transaction ID and maybe the current number of transactions in the queue.

Got it!
#[allow(clippy::blocks_in_conditions)] // Workaround of `instrument` issue
#[instrument(skip(self), ret, err(Debug), fields(COMPONENT))]
async fn verify_tx(
    &self,
    candidate_tx: SharedProvenTx,
Similar comments/questions to the ones above re needing to log `ret`, `err`, `COMPONENT`, etc.
@@ -131,6 +134,7 @@ where
/// 1. the candidate transaction doesn't modify the same account as an existing in-flight transaction
/// 2. no consumed note's nullifier in candidate tx's consumed notes is already contained
/// in `already_consumed_nullifiers`
#[instrument(ret, err(Debug), fields(COMPONENT))]
I don't think we need to log arguments here and probably can skip logging `err` and `COMPONENT` (assuming they are already captured by the parent spans).

Also, maybe this should be instrumented at the `debug` level?
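For illustration, the adjusted attribute might look roughly like this (the signature and body below are simplified stand-ins, not the actual function):

```rust
use tracing::instrument;

// Span at `debug` level with all arguments skipped; errors and the COMPONENT
// label are left to the parent span.
#[instrument(level = "debug", skip_all)]
fn ensure_in_flight_constraints(
    candidate_account_id: u64,
    in_flight_account_ids: &[u64],
) -> Result<(), String> {
    if in_flight_account_ids.contains(&candidate_account_id) {
        return Err("account is already modified by an in-flight transaction".into());
    }
    Ok(())
}
```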
#[instrument(ret, err(Debug), fields(COMPONENT))]
fn ensure_tx_inputs_constraints(
    candidate_tx: SharedProvenTx,
    tx_inputs: TxInputs,
Similar comments to the ones above.
#[allow(clippy::blocks_in_conditions)] // Workaround of `instrument` issue
#[instrument(skip(self), ret, err, fields(COMPONENT))]
async fn get_tx_inputs(
    &self,
    proven_tx: SharedProvenTx,
Similar comments to the ones before. But this one we can probably keep at the `info` level.
#[derive(Debug)]
pub struct TxInputs {
    /// The account hash in the store corresponding to tx's account ID
    pub account_hash: Option<Digest>,
I would probably manually implement `Display` on this to make sure we get good output in the logs.
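A minimal sketch of such an implementation (the real `TxInputs` has more fields, and the `Digest` below is a stand-in for the actual digest type):

```rust
use std::fmt;

// Illustrative stand-in for the actual digest type.
pub struct Digest([u8; 32]);

impl fmt::Display for Digest {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in self.0 {
            write!(f, "{byte:02x}")?;
        }
        Ok(())
    }
}

pub struct TxInputs {
    /// The account hash in the store corresponding to tx's account ID
    pub account_hash: Option<Digest>,
}

impl fmt::Display for TxInputs {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match &self.account_hash {
            Some(hash) => write!(f, "TxInputs {{ account_hash: {hash} }}"),
            None => write!(f, "TxInputs {{ account_hash: None }}"),
        }
    }
}
```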
@bobbinth I've implemented `Debug` for protobuf's `Digest` and it looks nice now. The only issue is with `RpoDigest`, which I will have to handle manually.
I believe `RpoDigest` has `Display` implemented on it, which prints out the value in hex.
> I believe `RpoDigest` has `Display` implemented on it, which prints out the value in hex.

Yes, but in the case when a digest is a field of a structure, I can't specify which trait to use when logging it.
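One possible workaround, sketched below, is a newtype whose `Debug` impl delegates to the digest's `Display`, so that deriving `Debug` on the outer struct still produces hex output (the `RpoDigest` here is a simplified stand-in for the real type):

```rust
use std::fmt;

// Simplified stand-in for the real RpoDigest, which already implements Display as hex.
struct RpoDigest([u8; 32]);

impl fmt::Display for RpoDigest {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for byte in self.0 {
            write!(f, "{byte:02x}")?;
        }
        Ok(())
    }
}

/// Newtype whose `Debug` output reuses the inner digest's `Display` (hex) form.
struct HexDigest(RpoDigest);

impl fmt::Debug for HexDigest {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        fmt::Display::fmt(&self.0, f)
    }
}

// The outer struct can then simply derive Debug and keep readable digests in logs.
#[derive(Debug)]
struct TxSummary {
    account_hash: HexDigest,
}
```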
@bobbinth, thank you for such a deep analysis!
I will try to check such workflow(s). Generally, if we just
Okay, let's split it, thank you!
Got it!
How about keeping the current logging in this PR, but making additional PRs with logging improvements for each flow?
It would be more difficult for me to review it this way. Having a PR per flow makes it easier to figure out the relevant parts to review.
Got your point, thanks!
…gging # Conflicts: # node/src/commands.rs # store/src/genesis.rs
Improved logging. We don't use request IDs for now; we will add them in the future.
Less important log statements are placed at the debug level, the others are at "info".
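For anyone trying this out locally, one way to surface the debug-level events is an `EnvFilter`-based subscriber like the sketch below (the node's actual logging setup may differ):

```rust
use tracing_subscriber::EnvFilter;

fn main() {
    // Default to `info`, but allow overriding via RUST_LOG,
    // e.g. `RUST_LOG=debug` to also see the debug-level events.
    tracing_subscriber::fmt()
        .with_env_filter(
            EnvFilter::try_from_default_env().unwrap_or_else(|_| EnvFilter::new("info")),
        )
        .init();

    tracing::info!("logging initialized");
}
```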