FNS polish - metrics, max msg size, default flag values #2517

jonfung-dydx · 2024-10-18T16:29:26Z

Porting over changes from https://github.com/dydxprotocol/v4-chain/commits/feat/full-node-streaming/ feature branch

metrics, remove dead code, tag some metrics by subscription id
max message size upped
default flag values upped since more traffic from subaccounts + taker orders

Summary by CodeRabbit

New Features
- Increased default gRPC streaming batch size and channel buffer size to enhance performance.
- Introduced new metrics for tracking subscription updates and removed obsolete metrics.
Improvements
- Enhanced subscription management with better validation and logging.
- Improved handling of streaming updates and metrics emission for better reliability.
Bug Fixes
- Adjusted metrics handling to ensure accurate tracking of streaming updates.
Chores
- Updated test cases to align with new configuration standards.

…ove unused code

coderabbitai · 2024-10-18T16:29:33Z

Walkthrough

This pull request introduces several changes across multiple files related to gRPC streaming configuration and metrics management. Key modifications include updating default values for gRPC streaming constants, enhancing the FullNodeStreamingManagerImpl class with improved metric handling and subscription management, and adjusting test cases to reflect these changes. Additionally, a new maximum message size is set for WebSocket connections. These updates aim to refine the functionality and maintainability of the streaming and metrics components.

Changes

File Path	Change Summary
`protocol/app/flags/flags.go`,	Updated constants `DefaultGrpcStreamingMaxBatchSize` and `DefaultGrpcStreamingMaxChannelBufferSize` from `2000` to `10000`.
`protocol/app/flags/flags_test.go`	Modified test cases in `TestValidate` and `TestGetFlagValuesFromOptions` to reflect updated gRPC streaming constants.
`protocol/lib/metrics/metric_keys.go`	Removed `GrpcSendSubaccountSnapshotLatency`, added `GrpcSendSubaccountUpdateCount` and `SubscriptionId`.
`protocol/streaming/full_node_streaming_manager.go`	Enhanced `EmitMetrics`, updated `Subscribe` method for better logging, improved update handling, and refined internal logic for metrics emission.
`protocol/streaming/ws/websocket_server.go`	Set maximum message size for WebSocket connections to 10 megabytes.

Possibly related PRs

Grpc Streaming Metrics #1616: Adds metrics for gRPC streaming, related to changes in gRPC streaming configuration constants.
GRPC Full node streaming - batching protos #1626: Introduces changes related to gRPC full node streaming, connecting to gRPC streaming adjustments.
GRPC Streaming Batching #1633: Focuses on batching protocol buffers for gRPC streaming, aligning with gRPC streaming configuration changes.
GRPC Streaming add channel buffer per subscription and metrics #1687: Adds channel buffers per subscription in gRPC streaming, relating to modifications of gRPC streaming constants.
GPRC Streaming change default flag options, break upon connection order #1776: Changes default values for gRPC streaming configuration constants, directly related to the main PR.
Bump grpc stream flag default values #2051: Modifies default values for gRPC streaming configuration constants, aligning with the main PR changes.
Full node streaming batch size reset to 2000, properly zero out cache on flush #2068: Resets default values for gRPC streaming configuration constants, directly related to the main PR.
Full Node Streaming Recurring snapshots #2079: Introduces a feature for recurring snapshots in full node streaming, connecting to gRPC streaming adjustments.

Suggested labels

protocol, indexer, proto

Suggested reviewers

jayy04
dydxwill

🐇 In the meadow, changes bloom,
Constants rise, dispelling gloom.
Metrics dance, subscriptions sing,
WebSocket limits, a new spring.
With every hop, our code does thrive,
In this patch, our dreams arrive! 🌼

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 6

🧹 Outside diff range and nitpick comments (5)

protocol/streaming/ws/websocket_server.go (1)
59-61: Approve with suggestions: Consider making the message size limit configurable

The addition of a maximum message size limit for WebSocket connections is a good security practice. It helps prevent potential denial-of-service attacks and improves overall system stability.

However, consider the following improvements:

Make the limit configurable, perhaps through an environment variable or configuration file. This would allow for easier adjustments without code changes.

Add a comment explaining the rationale behind the 10 MB limit.

Example implementation:
const defaultMaxMessageSize = 10 * 1024 * 1024 // 10 MB

// MaxMessageSize can be set through an environment variable
var MaxMessageSize = getEnvInt("WS_MAX_MESSAGE_SIZE", defaultMaxMessageSize)

// In the Handler method:
// Set maximum message size to prevent potential DoS attacks
conn.SetReadLimit(int64(MaxMessageSize))
protocol/lib/metrics/metric_keys.go (1)

88-88: Approved: New SubscriptionId metric key

The addition of SubscriptionId as a metric key is a valuable improvement. It aligns perfectly with the PR objective of tagging metrics by subscription ID, which will enable more granular analysis of system behavior.

Consider renaming the constant to SubscriptionID (with uppercase "ID") to be consistent with common Go naming conventions for acronyms.
protocol/app/flags/flags_test.go (1)
122-124: LGTM: Consistent update of gRPC streaming parameters with optimistic execution.

The increase in GrpcStreamingMaxBatchSize and GrpcStreamingMaxChannelBufferSize from 2000 to 10000 is consistently applied here, demonstrating compatibility with optimistic execution.

There's a minor typo in the test case name: "canbe" should be "can be". Consider updating it to improve readability:
-		"success - optimistic execution canbe  enabled with gRPC streaming": {
+		"success - optimistic execution can be enabled with gRPC streaming": {
protocol/streaming/full_node_streaming_manager.go (2)

723-723: Monitor buffer state with appropriate metric emission frequency

While emitting metrics after adding updates to the cache is useful for monitoring, ensure that the frequency of sm.EmitMetrics() calls does not introduce performance overhead, especially if updates are added to the cache very frequently.

761-761: Prevent excessive metric emissions during error handling

In RemoveSubscriptionsAndClearBufferIfFull, calling sm.EmitMetrics() immediately after clearing the buffer and subscriptions may not be necessary, as the buffer is now empty. Evaluate whether this metric emission provides additional value or if it can be omitted to reduce overhead.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 66d7a5d and 030f952.

📒 Files selected for processing (5)

protocol/app/flags/flags.go (1 hunks)
protocol/app/flags/flags_test.go (4 hunks)
protocol/lib/metrics/metric_keys.go (2 hunks)
protocol/streaming/full_node_streaming_manager.go (9 hunks)
protocol/streaming/ws/websocket_server.go (1 hunks)

🧰 Additional context used

🔇 Additional comments (12)

protocol/lib/metrics/metric_keys.go (2)

76-76: Approved: Improved metric for subaccount updates

The change from GrpcSendSubaccountSnapshotLatency to GrpcSendSubaccountUpdateCount is a good improvement. This new metric will provide more actionable data on the frequency of subaccount updates, which aligns well with the PR objective of enhancing metrics. It will help in monitoring system activity more effectively, especially in light of the mentioned increase in traffic from subaccounts.

Line range hint 1-94: Summary: Effective metrics enhancements

The changes to this file effectively support the PR objectives by:

Shifting focus from latency to count for subaccount updates, providing more actionable data.

Introducing a subscription ID metric, enabling more granular analysis.

These modifications will help in better monitoring and understanding system behavior, especially in light of the increased traffic from subaccounts mentioned in the PR description.

protocol/app/flags/flags.go (3)

75-76: Summary: Increased gRPC streaming capacity

The changes to DefaultGrpcStreamingMaxBatchSize and DefaultGrpcStreamingMaxChannelBufferSize are consistent and align with the PR objectives to handle increased traffic. These modifications should improve the system's ability to handle higher loads, particularly from subaccounts and taker orders.

Key points:

Both values increased from 2000 to 10000, providing a 5x capacity increase.

The changes are isolated to default values and don't require modifications to other parts of the code.

The Validate function remains valid as it only checks for positive values.

Recommendations:

Conduct thorough performance testing to ensure the system can handle the increased capacity effectively.

Monitor system resource usage, particularly memory consumption and CPU utilization, after deployment.

Consider implementing adaptive scaling mechanisms for these values in future iterations if traffic patterns are highly variable.

76-76: Approved: Increased DefaultGrpcStreamingMaxChannelBufferSize

The increase from 2000 to 10000 for DefaultGrpcStreamingMaxChannelBufferSize is in line with the PR objectives to handle increased traffic. This change should improve the system's ability to handle bursts of messages without dropping them.

However, please consider the following:

Monitor memory usage, as larger channel buffers will increase memory consumption.

Keep an eye on message processing delays, as larger queues might lead to increased latency.

Ensure that the system can effectively process messages at this increased rate to prevent buffer overflow.

To verify the impact of this change and ensure system stability, consider running the following commands:

75-75: Approved: Increased DefaultGrpcStreamingMaxBatchSize

The increase from 2000 to 10000 for DefaultGrpcStreamingMaxBatchSize aligns with the PR objectives to handle increased traffic. This change should improve throughput for gRPC streaming.

However, please consider the following:

Monitor memory usage, as larger batches may increase memory consumption.

Keep an eye on latency for processing individual messages within a batch.

Ensure that clients can handle larger batch sizes effectively.

To verify the impact of this change, consider running the following commands:

protocol/app/flags/flags_test.go (4)

92-93: LGTM: Updated gRPC streaming parameters align with PR objectives.

The increase in GrpcStreamingMaxBatchSize and GrpcStreamingMaxChannelBufferSize from 2000 to 10000 is consistent with the PR's goal of updating default flag values to handle increased traffic from subaccounts and taker orders. This change should allow for processing larger batches and maintaining larger buffer sizes in gRPC streaming.

103-104: LGTM: Consistent update of gRPC streaming parameters.

The increase in GrpcStreamingMaxBatchSize and GrpcStreamingMaxChannelBufferSize from 2000 to 10000 is consistently applied here, maintaining parity with the gRPC-only configuration. This ensures that the updated values are used regardless of whether WebSocket streaming is enabled alongside gRPC streaming.

260-261: LGTM: Updated default values for gRPC streaming parameters.

The increase in the expected default values for GrpcStreamingMaxBatchSize and GrpcStreamingMaxChannelBufferSize from 2000 to 10000 ensures consistency with the actual implementation. This change aligns with the PR's objective of updating default flag values to handle increased traffic.

Line range hint 1-380: Summary: Consistent updates to gRPC streaming parameters in test cases.

The changes in this file consistently update the GrpcStreamingMaxBatchSize and GrpcStreamingMaxChannelBufferSize values from 2000 to 10000 across various test cases. These updates align well with the PR's objective of adjusting default flag values to handle increased traffic from subaccounts and taker orders. The changes are applied consistently in different scenarios, including gRPC-only streaming, combined gRPC and WebSocket streaming, and when optimistic execution is enabled.

A minor typo was found in one test case name, which has been noted for correction. Overall, these changes should improve the system's ability to handle larger batches and maintain larger buffer sizes in gRPC streaming, contributing to better performance under increased load.

protocol/streaming/full_node_streaming_manager.go (3)

377-381: Ensure metric increments align with monitoring objectives

In the sendStreamUpdates method, you're incrementing metrics.GrpcAddToSubscriptionChannelCount each time updates are attempted to be sent to a subscription's channel. Verify that this metric provides meaningful insights and aligns with your monitoring goals, without introducing significant overhead.

745-745: Check buffer overflow handling logic

After adding subaccount updates to the cache, you're calling sm.RemoveSubscriptionsAndClearBufferIfFull(). Verify that the conditions for buffer overflow are correctly set and that this function effectively prevents buffer overflows without unnecessarily disconnecting subscribers.

238-241: Confirm label correctness for metrics

When incrementing metrics.GrpcSendResponseToSubscriberCount, ensure that the labels applied correctly represent the subscription ID. Accurate labeling is crucial for effective metric aggregation and monitoring.

Run the following script to verify that metric labels are correctly applied:

coderabbitai · 2024-10-18T16:35:18Z