@visz11

@visz11 visz11 commented Oct 15, 2025

User description

We need to prevent this logging from being too spammy.

I also removed the trace logging, because it seems confusing to have potentially very spammy trace logging but rate-limited debug logging. I'm not sure it provides much value in any case.

I'm open to applying rate limiting ONLY when trace is not enabled, if we think there's value in keeping the trace messages.


CodeAnt-AI Description

Rate-limit write-load decider debug logs and remove trace messages

What Changed

  • Debug messages about the write-load decider intervening (e.g., blocking or avoiding shard moves because a node is hot) are now rate-limited so the same message is emitted at most once per interval; the default interval is 1 minute.
  • Trace-level messages from the write-load decider were removed and will no longer appear in logs.
  • The logging interval is configurable at runtime via a cluster setting, so operators can shorten or lengthen how often intervention messages appear.
  • Allocation and canRemain decisions are unchanged; only the frequency and level of emitted logs changed.
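
To make the "at most once per interval" behaviour concrete, here is a minimal sketch of a frequency-capped action. It is a self-contained stand-in, not the Elasticsearch `FrequencyCappedAction` class: the name `CappedAction`, the millisecond-based constructor, and the boolean return value are assumptions made for illustration only.

```java
import java.util.function.LongSupplier;

/** Hypothetical stand-in for a frequency-capped action; not the Elasticsearch class. */
final class CappedAction {
    private final LongSupplier clockMillis; // injected time source
    private long minIntervalMillis;         // minimum gap between two executions
    private long nextAllowedMillis;         // earliest time the next execution is permitted

    CappedAction(LongSupplier clockMillis, long minIntervalMillis) {
        this.clockMillis = clockMillis;
        this.minIntervalMillis = minIntervalMillis;
        this.nextAllowedMillis = Long.MIN_VALUE; // the first call always runs
    }

    /** Changes the interval; takes effect from the next execution that runs. */
    void setMinIntervalMillis(long minIntervalMillis) {
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Runs the action only if the minimum interval has elapsed; returns whether it ran. */
    boolean maybeExecute(Runnable action) {
        long now = clockMillis.getAsLong();
        if (now < nextAllowedMillis) {
            return false; // still inside the quiet period, suppress
        }
        nextAllowedMillis = now + minIntervalMillis;
        action.run();
        return true;
    }
}
```

This sketch is not thread-safe; whether the real decider needs synchronization depends on how it is invoked, which this page does not show.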

Impact

✅ Fewer debug log entries
✅ Lower log disk usage during heavy write load
✅ Reduced logging noise for hot-node interventions

Summary by CodeRabbit

  • New Features
    • Added a cluster setting to control the minimum interval between write-load decider log messages (default: 1 minute), adjustable at runtime.
    • Introduced rate-limited logging for write-load decisions to reduce log noise while preserving actionable diagnostics.
    • Enabled this setting as part of built-in cluster settings, making it easier to configure and manage across the cluster.

@codeant-ai

codeant-ai bot commented Oct 15, 2025

CodeAnt AI is reviewing your PR.


@gemini-code-assist

Summary of Changes

Hello @visz11, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the logging behavior of the write load decider by introducing a configurable rate-limiting mechanism for debug messages and removing trace-level logging entirely. The primary goal is to reduce log verbosity and prevent log spam, thereby improving the clarity and utility of system logs without sacrificing critical debug information, which will now be presented at a controlled frequency.

Highlights

  • Rate Limiting Debug Logs: Implemented rate limiting for debug-level log messages originating from the write load decider interventions to prevent log spam.
  • New Configuration Setting: Introduced a new dynamic and node-scoped cluster setting, cluster.routing.allocation.write_load.log_interval, which defaults to 1 minute, to control the minimum time between these rate-limited log messages.
  • Removed Trace Logging: Eliminated all trace-level logging related to write load decider explanations, as it was deemed potentially spammy and of limited value.

@codeant-ai codeant-ai bot added the size:S label (this PR changes 10-29 lines, ignoring generated files) on Oct 15, 2025
@coderabbitai

coderabbitai bot commented Oct 15, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds a new dynamic node-scoped setting for minimum logging interval of write-load interventions, wires it into ClusterSettings, and updates WriteLoadConstraintDecider to use a rate-limited logging mechanism guarded by debug checks. Direct debug/trace logs are replaced with FrequencyCappedAction.maybeExecute calls.
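
The wiring described in the walkthrough (read the interval once at construction, then follow runtime changes) can be sketched roughly as follows. `DynamicIntervalSetting` and `IntervalHolder` are hypothetical stand-ins and not Elasticsearch's `ClusterSettings` API. The sketch assumes that `initializeAndWatch` applies the current value synchronously before returning, which should be confirmed against the real implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongConsumer;

/** Hypothetical stand-in for a dynamically updatable setting; not ClusterSettings. */
final class DynamicIntervalSetting {
    private long valueMillis;
    private final List<LongConsumer> watchers = new ArrayList<>();

    DynamicIntervalSetting(long initialMillis) {
        this.valueMillis = initialMillis;
    }

    /** Applies the current value to the consumer immediately, then on every later update. */
    void initializeAndWatch(LongConsumer consumer) {
        watchers.add(consumer);
        consumer.accept(valueMillis); // assumption: applied synchronously
    }

    /** Simulates an operator changing the setting at runtime. */
    void update(long newMillis) {
        valueMillis = newMillis;
        for (LongConsumer watcher : watchers) {
            watcher.accept(newMillis);
        }
    }
}

/** Holds the interval that a rate limiter would consult. */
final class IntervalHolder {
    private volatile long minIntervalMillis = 0; // placeholder until the setting is applied

    IntervalHolder(DynamicIntervalSetting setting) {
        setting.initializeAndWatch(value -> minIntervalMillis = value);
    }

    long minIntervalMillis() {
        return minIntervalMillis;
    }
}
```

Under that assumption the placeholder value is never observable after construction, because the constructor does not return until the configured interval has been applied.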

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Write-load settings addition: server/src/main/java/org/elasticsearch/cluster/routing/allocation/WriteLoadConstraintSettings.java | Declares WRITE_LOAD_DECIDER_MINIMUM_LOGGING_INTERVAL (TimeValue, default 1m), dynamic, node-scoped, documented for minimum interval between intervention logs. |
| Rate-limited logging in decider: server/src/main/java/org/elasticsearch/cluster/routing/allocation/decider/WriteLoadConstraintDecider.java | Introduces FrequencyCappedAction logInterventionMessage; initializes and updates via ClusterSettings; replaces direct debug/trace logs in canAllocate/canRemain with guarded maybeExecute calls. No decision logic changes. |
| Cluster settings registry update: server/src/main/java/org/elasticsearch/common/settings/ClusterSettings.java | Adds WRITE_LOAD_DECIDER_MINIMUM_LOGGING_INTERVAL to BUILT_IN_CLUSTER_SETTINGS; no other behavior changes. |

Sequence Diagram(s)

sequenceDiagram
  participant Caller as Allocation Engine
  participant Decider as WriteLoadConstraintDecider
  participant Capper as FrequencyCappedAction (rate limiter)
  participant Logger as Logger
  participant CSettings as ClusterSettings

  rect rgb(245,248,255)
  note over Decider,CSettings: Initialization and dynamic updates
  Decider->>CSettings: initializeAndWatch(min_log_interval)
  CSettings-->>Decider: Notify on setting change (dynamic)
  Decider->>Capper: Update minimum interval
  end

  rect rgb(245,255,245)
  note over Caller,Logger: Rate-limited intervention logging during decisions
  Caller->>Decider: canAllocate()/canRemain(...)
  alt debug logging enabled
    Decider->>Capper: maybeExecute(() -> Logger.debug(...))
    Capper-->>Logger: Execute if interval elapsed
    Capper-->>Decider: Suppress if within interval
  else debug disabled
    Decider-->>Caller: Skip logging
  end
  Decider-->>Caller: Return decision (unchanged)
  end

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Poem

I tap my paw with patient cheer,
“Log less often, make it clear!”
A minute’s hush between each note,
Keeps our burrow’s scroll remote.
With capped chimes, the clusters sing—
Quiet logs, but same deciding. 🐇⏱️

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title Check | ✅ Passed | The title succinctly describes the core enhancement of rate limiting the write load decider’s logging to prevent excessive output. It is concise, specific, and accurately reflects the primary change without extraneous detail. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

Warning

Review ran into problems

🔥 Problems

Git: Failed to clone repository. Please run the @coderabbitai full review command to re-trigger a full review. If the issue persists, set path_filters to include or exclude specific files.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces rate-limiting for the write load decider's debug logging to prevent log spam, which is a sensible improvement. A new dynamic cluster setting is added to control the logging interval, with a default of one minute. The implementation uses FrequencyCappedAction correctly. I've left one suggestion to reduce code duplication by extracting the new logging logic into a helper method. The removal of the trace logging is also a reasonable simplification.

Comment on lines +88 to +90
if (logger.isDebugEnabled()) {
    logInterventionMessage.maybeExecute(() -> logger.debug(explain));
}

medium

This rate-limited logging logic is duplicated in three places within this class (here, lines 110-112, and 160-162). To improve maintainability and reduce code duplication, consider extracting this logic into a private helper method.

For example, you could add:

private void maybeLogIntervention(String explanation) {
    if (logger.isDebugEnabled()) {
        logInterventionMessage.maybeExecute(() -> logger.debug(explanation));
    }
}

And then replace the duplicated blocks with a call to maybeLogIntervention(explain);.

@codeant-ai

codeant-ai bot commented Oct 15, 2025

Pull Request Feedback 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Behavioral correctness
    The PR description indicates rate-limiting debug logs for the write-load decider. Ensure the code that enforces the minimum logging interval actually reads this setting and that the interaction with logger levels (debug vs trace) is correct. Validate there are no edge cases (e.g., clock skew, negative durations) that could cause logs to be suppressed indefinitely or spammed.

  • Setting semantics and scope
    Verify that WRITE_LOAD_DECIDER_MINIMUM_LOGGING_INTERVAL is declared in WriteLoadConstraintSettings with the correct Setting type and scope (e.g., a time setting, cluster-scoped, dynamic if intended). If the setting is not a dynamic cluster-level setting, changing it at runtime may not work as operators expect.

  • Initialization race
    The FrequencyCappedAction is constructed with TimeValue.ZERO initially and the cluster setting is applied via initializeAndWatch. If the setting is not applied synchronously, there may be a brief window where logging is not rate-limited (or conversely is unrestricted). Verify initializeAndWatch runs synchronously in this code path or initialize the rate limiter with the current configured value to avoid a race.

  • Missing test coverage
    The new cluster setting WRITE_LOAD_DECIDER_MINIMUM_LOGGING_INTERVAL was added to the built-in cluster settings list. Please add unit tests that assert the setting is present in BUILT_IN_CLUSTER_SETTINGS and that changes to the setting propagate as expected (e.g., it is recognized as a cluster-level, dynamic setting). This prevents regressions where a new setting is declared but not actually registered for cluster updates.

  • Unnecessary formatting work
    The human-readable diagnostic Strings.format(...) calls are performed unconditionally before the debug-level guard + rate-limit check. That means expensive string construction (including toHumanReadableString calls) happens even when debug logging is disabled or the frequency cap prevents emission. Move message formatting to be lazy so the work is only done when the message will actually be logged.
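
The lazy formatting suggested in the last item could look roughly like the sketch below, where the message is built only after both the level check and the rate limiter have passed. `LazyInterventionLogger` and its in-memory sink are hypothetical, standing in for the real logger and limiter. If the formatted string is also needed elsewhere, for example in the returned decision, deferring it in this way would not save any work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical sketch: defers message construction until it will actually be emitted. */
final class LazyInterventionLogger {
    private final boolean debugEnabled;
    private final List<String> emitted = new ArrayList<>(); // stands in for the real log output

    LazyInterventionLogger(boolean debugEnabled) {
        this.debugEnabled = debugEnabled;
    }

    /**
     * Checks the cheap level flag first, then the rate limiter, and only then
     * asks the supplier to build the potentially expensive message.
     */
    void maybeLog(BooleanSupplier limiterPermits, Supplier<String> message) {
        if (debugEnabled && limiterPermits.getAsBoolean()) {
            emitted.add(message.get());
        }
    }

    List<String> emitted() {
        return emitted;
    }
}
```

Because `&&` short-circuits, the limiter is not consulted at all when debug logging is disabled, so a disabled logger does not consume the rate-limit window.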

Comment on lines +112 to +120
/**
 * The minimum amount of time between logging messages about write load decider interventions
 */
public static final Setting<TimeValue> WRITE_LOAD_DECIDER_MINIMUM_LOGGING_INTERVAL = Setting.timeSetting(
    SETTING_PREFIX + "log_interval",
    TimeValue.timeValueMinutes(1),
    Setting.Property.Dynamic,
    Setting.Property.NodeScope
);

Suggestion: Expose the new setting through the class API by adding a getMinimumLoggingInterval() accessor, and keep the stored value current by adding an initializeAndWatch call in the constructor that updates minimumLoggingInterval. [maintainability]

@codeant-ai

codeant-ai bot commented Oct 15, 2025

CodeAnt AI finished reviewing your PR.

@visz11
Author

visz11 commented Oct 16, 2025

/refacto-visz

@refacto-visz

refacto-visz bot commented Oct 16, 2025

Refacto is reviewing this PR. Please wait for the review comments to be posted.

@refacto-visz
Copy link

refacto-visz bot commented Oct 16, 2025

Multi-Domain Review: Logging

👍 Well Done
Rate Limiting Implementation

FrequencyCappedAction prevents log spam while maintaining debug visibility.

📁 Selected files for review (3)
  • server/src/main/java/org/elasticsearch/cluster/routing/allocation/WriteLoadConstraintSettings.java
  • server/src/main/java/org/elasticsearch/cluster/routing/allocation/decider/WriteLoadConstraintDecider.java
  • server/src/main/java/org/elasticsearch/common/settings/ClusterSettings.java
🎯 Custom Instructions
✅ Applied Instructions
Organization Guidelines
  • Keep pull requests small and focused (prefer < 400 lines of code).
  • All CI/CD checks, linting, and unit tests must pass before merge.
  • Use feature flags for new functionality and include a clear rollback plan.
  • Follow the company security checklist:
    • No hard-coded secrets or credentials.
    • Validate all external inputs.
    • Use parameterized queries for DB access.

Scope: All files

📝 Additional Comments
server/src/main/java/org/elasticsearch/cluster/routing/allocation/decider/WriteLoadConstraintDecider.java (3)

Debug Level Check

Pattern correctly checks debug level before expensive operations. This prevents unnecessary string formatting and lambda creation when debug logging is disabled, improving performance in production.

Standards:

  • SRE Performance
  • Logging Best Practices

Trace Removal Rationale

Trace logging removal eliminates detailed diagnostic information for successful allocation decisions. Consider if this information might be valuable for troubleshooting allocation behavior in development environments.

Standards:

  • Observability Patterns
  • Debug Information Preservation

Efficient Log Filtering

Double-gated logging approach combines level checking with rate limiting efficiently. This prevents both unnecessary computation and log spam while maintaining diagnostic capability.

Standards:

  • Performance Optimization
  • Resource Efficiency


public WriteLoadConstraintDecider(ClusterSettings clusterSettings) {
    this.writeLoadConstraintSettings = new WriteLoadConstraintSettings(clusterSettings);
    logInterventionMessage = new FrequencyCappedAction(System::currentTimeMillis, TimeValue.ZERO);

System Clock Dependency

System::currentTimeMillis creates direct dependency on system clock which can cause issues during clock adjustments or in testing environments. Clock changes can break rate limiting behavior and make testing non-deterministic.

Standards
  • ISO-25010 Time Behaviour
  • Clean Code Testability
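
One way to address the clock concern, sketched under assumptions: pass the time source in as a `LongSupplier`, use a monotonic source in production, and use a hand-advanced clock in tests. `Clocks` and `FakeClock` are hypothetical names and not part of Elasticsearch. `System.nanoTime()` is monotonic within a JVM but has an arbitrary origin, so only differences between readings are meaningful.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/** Hypothetical helper offering a monotonic millisecond time source. */
final class Clocks {
    private Clocks() {}

    /** Monotonic milliseconds, unaffected by wall-clock adjustments; the origin is arbitrary. */
    static LongSupplier monotonicMillis() {
        return () -> TimeUnit.NANOSECONDS.toMillis(System.nanoTime());
    }
}

/** Hand-advanced clock that makes time-dependent tests deterministic. */
final class FakeClock implements LongSupplier {
    private long nowMillis;

    /** Moves the clock forward by the given number of milliseconds. */
    void advance(long millis) {
        nowMillis += millis;
    }

    @Override
    public long getAsLong() {
        return nowMillis;
    }
}
```

A test can then construct the rate limiter with a `FakeClock`, call `advance` to cross the interval boundary, and assert exactly which calls were suppressed, without sleeping.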

import org.elasticsearch.common.settings.ClusterSettings;
import org.elasticsearch.core.Strings;
import org.elasticsearch.core.TimeValue;
import org.elasticsearch.threadpool.ThreadPool;

ThreadPool Injection Opportunity

ThreadPool is imported but not used in constructor while System::currentTimeMillis is used directly. ThreadPool.relativeTimeInMillis() would provide consistent time source and better testability.

Standards
  • SOLID Dependency Inversion
  • Clean Code Consistency
