feat(o11y): Inter-process tracing #8004
Merged
Move the check for is_height_processed before process_block_header. Previously, this check happened afterwards, which meant the node would re-process the block header (which takes a few ms) and re-broadcast an invalid block before dropping it. When many invalid blocks are circulating in the network, this can cause the node to be too busy.
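The reordering described above can be illustrated with a minimal sketch. The `Node` type, its fields, `receive_block`, and the `Outcome` enum are hypothetical stand-ins, not the actual nearcore types; only the names `is_height_processed` and `process_block_header` come from the description.

```rust
use std::collections::HashSet;

/// Hypothetical outcome of handling an incoming block.
#[derive(Debug, PartialEq)]
enum Outcome {
    /// The height was already processed; the block is dropped cheaply.
    Skipped,
    /// The header went through the expensive processing step.
    Processed,
}

/// Hypothetical stand-in for node state (an assumption for illustration).
struct Node {
    processed_heights: HashSet<u64>,
    /// Counts how often the expensive header processing ran.
    header_validations: u32,
}

impl Node {
    fn new() -> Self {
        Node { processed_heights: HashSet::new(), header_validations: 0 }
    }

    fn is_height_processed(&self, height: u64) -> bool {
        self.processed_heights.contains(&height)
    }

    /// Stand-in for the header processing that takes a few milliseconds.
    fn process_block_header(&mut self, height: u64) {
        self.header_validations += 1;
        self.processed_heights.insert(height);
    }

    fn receive_block(&mut self, height: u64) -> Outcome {
        // Cheap check first: bail out before any expensive work or re-broadcast.
        if self.is_height_processed(height) {
            return Outcome::Skipped;
        }
        self.process_block_header(height);
        Outcome::Processed
    }
}
```

With this ordering, a second block at an already processed height never reaches the expensive step.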
Display a list of peers stored in peer store - together with information on when we attempted to connect to them. You can see it working in: http://34.147.53.32:3030/debug/pages/network_info This is at the bottom of the page - and you have to click the button to fetch this info (as this is often over 10k peers - and loading takes a while).
As an intermediate step, we will only enable flat storage for storage_read, but not storage_write.
Instead of checking the number of values and their sizes, the caches are now limited by the actual (approximated) memory consumption. This changes what `total_size` in `TrieCacheInner` means, which is also observable through Prometheus metrics. Existing configuration works with slightly altered effects. The number of entries converts to an implicit size limit. Since the explicit default size limit is currently 3GB and the default max entries is set to 50k, the implicit limit = 50k * 1000B = 50MB is stronger. This still limits the number of largest entries to 50k but allows the cache to be filled with more entries when the values are smaller. For shard 3, however, where the number of entries is set to 45M in code, the memory limit of 3GB is active. Since we change how this limit is calculated, we will see fewer entries cached with this change. Shard 3 should still be okay since we have a prefetcher in place now that works even when the cache is empty.
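The arithmetic above can be checked with a small sketch. The function and constant names are hypothetical, and treating 3GB as 3,000,000,000 bytes is an assumption; only the 1000B-per-entry conversion and the example numbers come from the description.

```rust
/// Per-entry size used to convert an entry count into bytes
/// (the 1000 B figure is taken from the description; the name is hypothetical).
const ASSUMED_BYTES_PER_ENTRY: u64 = 1_000;

/// Returns the effective memory limit in bytes: the stricter of the explicit
/// size limit and the limit implied by the maximum number of entries.
fn effective_limit_bytes(explicit_limit_bytes: u64, max_entries: u64) -> u64 {
    let implicit_limit = max_entries.saturating_mul(ASSUMED_BYTES_PER_ENTRY);
    explicit_limit_bytes.min(implicit_limit)
}
```

For the default of 50k entries the implicit 50MB limit wins; for 45M entries the implied limit is 45GB, so the explicit 3GB limit is the active one.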
This adds code that mirrors traffic from a source chain (e.g. mainnet or testnet) to a test chain with genesis state forked from the source chain. The goal is to produce traffic that looks like source chain traffic. So in a mocknet test where we fork mainnet state for example, we can then actually observe what happens when we subsequently get traffic equivalent to mainnet traffic after the fork point. For more info, see the README in this commit.
`anyhow` is the type to return from `main`. We don't get any value here from preserving well-typed errors, and we are creating more work down the line to add all future error variants: *surely* we can fail due to more than these two errors, right?
* doc: fix typo Acton -> Action
* doc: fix typo falied -> failed
* doc: fix typo recieve -> receive
* doc: fix typo infomation -> information
* Update tools/delay-detector/README.md

Co-authored-by: Michal Nazarewicz <mina86@mina86.com>
The module has been introduced in commit cbcf678: ‘Cryptographic code for randomness beacon’ and then never used. Get rid of it.
Update near logo in README.md #7875
List of peers wasn't printed if we were in sync mode (especially during header/state sync)
EpochSync was never implemented, there is just a bunch of stubs left here and there. Removing them.
The concrete implementation wrapping ClientActor and ViewClientActor has been moved to near_client crate. Network(View)ClientMessage will be moved to near_client crate in a separate PR.
This will print easier-to-understand info on which source chain transactions are making it into the target chain. For now we just log them to debug logs, but it would be nice to have an HTTP debug page that shows an easy-to-understand summary.
The TxStatusError::InvalidTx variant is never constructed so get rid of it.
Having a conversion from near_chain_primitives::Error to TxStatusError eliminates a handful of trivial map_err calls.
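As a minimal sketch of why such a conversion removes the trivial error-mapping calls: once a `From` impl exists, the `?` operator converts the error automatically. The types below are simplified, hypothetical stand-ins for `near_chain_primitives::Error` and `TxStatusError`, and `load_tx` and `tx_status` are invented for illustration.

```rust
/// Hypothetical, simplified stand-in for `near_chain_primitives::Error`.
#[derive(Debug)]
struct ChainError(String);

/// Hypothetical, simplified stand-in for `TxStatusError`.
#[derive(Debug, PartialEq)]
enum TxStatusError {
    ChainError(String),
}

impl From<ChainError> for TxStatusError {
    fn from(err: ChainError) -> Self {
        TxStatusError::ChainError(err.0)
    }
}

/// Hypothetical lookup that fails with the lower-level error type.
fn load_tx(found: bool) -> Result<u64, ChainError> {
    if found {
        Ok(42)
    } else {
        Err(ChainError("tx not found".to_string()))
    }
}

fn tx_status(found: bool) -> Result<u64, TxStatusError> {
    // `?` calls `From::from` on the error, so no explicit mapping is needed.
    let value = load_tx(found)?;
    Ok(value)
}
```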
Add a `cold_store` Cargo feature which enables the option to configure the node with cold storage. At the moment, all this does is open the cold database and doesn’t enable any other features. The idea is that this can now allow experimenting with code that needs access to the cold storage.
These actix messages are an implementation detail of near_client crate.
Fixes some minor grammar issues from #7918.
…k processing to client (#7898) This PR is a pure refactoring. The context is that any processing details should be put in Client instead of ClientActor. ClientActor should just serve as a coordinator class that handles messages, checks triggers, and immediately passes them to Client. This is better for testing, since we can't write unit tests for any logic in ClientActor, and also better for code readability, as the logic is not scattered across two classes. This PR only moves the part around block processing. The rest is tracked by #7899
* New last-blocks debug page.
* Use JSX with babel
* Minor fix
* Minor fix 2
* Rename is_block_missing
- links for gas_param sections in summary didn't work
- rename gas_param to just gas
- pin link to commit
- also mention that we spend gas on tx other than wasm
Also removed some actix messages which are not needed any more.
nagisa approved these changes Nov 8, 2022
pompon0 reviewed Nov 8, 2022
pompon0 reviewed Nov 8, 2022
pompon0 reviewed Nov 9, 2022
pompon0 reviewed Nov 9, 2022
pompon0 reviewed Nov 9, 2022
pompon0 reviewed Nov 9, 2022
pompon0 reviewed Nov 9, 2022
pompon0 reviewed Nov 9, 2022
pompon0 reviewed Nov 9, 2022
if proto_msg.trace_context.is_some() {
    let propagator = NodePropagator::new();
    if let Ok(extracted_span_context) =
        propagator.extract_span_context(&proto_msg.trace_context)
shouldn't the parsing error be logged here?
I'm worried about spamming the logs.
I wish I had `LOG_EVERY_N_SECONDS()`.
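One way to address the log-spam concern is a small rate limiter placed in front of the log call. The sketch below is hypothetical: `RateLimiter` and `should_fire` are invented names, not an existing nearcore or `tracing` API, and the current time is passed in explicitly to keep the helper easy to test.

```rust
use std::time::{Duration, Instant};

/// Hypothetical helper: allows an action at most once per `interval`.
struct RateLimiter {
    interval: Duration,
    last_fired: Option<Instant>,
}

impl RateLimiter {
    fn new(interval: Duration) -> Self {
        RateLimiter { interval, last_fired: None }
    }

    /// Returns true if the caller should log at time `now`.
    /// Assumes `now` never goes backwards between calls.
    fn should_fire(&mut self, now: Instant) -> bool {
        match self.last_fired {
            // Still inside the quiet period: suppress the log line.
            Some(last) if now.duration_since(last) < self.interval => false,
            // First call, or the interval has elapsed: log and restart the timer.
            _ => {
                self.last_fired = Some(now);
                true
            }
        }
    }
}
```

A caller would guard the error log with `if limiter.should_fire(Instant::now()) { ... }`, so repeated parsing failures produce at most one log line per interval.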
pompon0 reviewed Nov 9, 2022
pompon0 reviewed Nov 11, 2022
pompon0 reviewed Nov 11, 2022
pompon0 approved these changes Nov 11, 2022
Serialize TraceId and SpanId to a new field of `PeerMessage`. This lets the receiving node link traces to a trace that generated the network request.

`TextMapPropagator` is the interface designed to solve a similar problem, but given that:... following the `TextMapPropagator` interface doesn't add value. Setting a global text map propagator doesn't add value for the same reason.

Because we want the inter-process tracing to be enabled at the debug level:
* `SendMessage` to enable tracing at the debug level
* `PeerManagerActor` tracing needs to be enabled at the debug level.

https://pagodaplatform.atlassian.net/browse/ND-172
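The idea of carrying the trace and span IDs in a message field can be sketched as follows. This is not the PR's implementation: `TraceContext`, `Message`, the 24-byte layout, and `extract_trace_context` are hypothetical, chosen only to show a round trip. The one external fact relied on is that OpenTelemetry trace IDs are 16 bytes and span IDs are 8 bytes.

```rust
/// Hypothetical wire form of a trace context: 16-byte trace ID + 8-byte span ID.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct TraceContext {
    trace_id: [u8; 16],
    span_id: [u8; 8],
}

impl TraceContext {
    /// Serializes the IDs as 24 bytes: trace ID first, then span ID.
    fn to_bytes(&self) -> Vec<u8> {
        let mut out = Vec::with_capacity(24);
        out.extend_from_slice(&self.trace_id);
        out.extend_from_slice(&self.span_id);
        out
    }

    /// Parses the 24-byte form; any other length is rejected.
    fn from_bytes(bytes: &[u8]) -> Result<Self, String> {
        if bytes.len() != 24 {
            return Err(format!("expected 24 bytes, got {}", bytes.len()));
        }
        let mut trace_id = [0u8; 16];
        let mut span_id = [0u8; 8];
        trace_id.copy_from_slice(&bytes[..16]);
        span_id.copy_from_slice(&bytes[16..]);
        Ok(TraceContext { trace_id, span_id })
    }
}

/// Hypothetical network message with an optional trace context field.
struct Message {
    payload: Vec<u8>,
    trace_context: Option<Vec<u8>>,
}

/// Receiver side: returns the sender's context if present and well-formed.
fn extract_trace_context(msg: &Message) -> Option<TraceContext> {
    msg.trace_context
        .as_deref()
        .and_then(|bytes| TraceContext::from_bytes(bytes).ok())
}
```

On the receiving node, a successfully extracted context would be attached as the parent of the span that handles the message. A missing or malformed field simply yields no parent, so the message is still processed.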