
feat(gateway): add token-bucket rate limiting middleware#220

Open
eclipse0922 wants to merge 7 commits into selfpatch:main from eclipse0922:feat/91-rate-limiting

Conversation

@eclipse0922
Contributor

Summary

Add configurable token-bucket rate limiting to protect the gateway from abuse and resource exhaustion. Implements two-layer limiting (global + per-client IP) as a pre-routing middleware that runs before CORS and auth checks.

Key design decisions:

  • Token bucket algorithm with lazy refill — no background threads needed (see the sketch after this list)
  • Disabled by default for backward compatibility
  • Endpoint-specific overrides via glob patterns (e.g. /api/v1/*/operations/*:10)
  • SOVD-compliant 429 error responses with Retry-After header
  • OPTIONS excluded from rate limiting (CORS preflight)
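
For reference, the lazy-refill idea boils down to topping the bucket up from elapsed wall time at the moment of each check. A minimal sketch, assuming field and helper names that are illustrative rather than the exact ones in rate_limiter.cpp:

#include <algorithm>
#include <chrono>

// Sketch of a lazy-refill bucket: tokens are topped up from elapsed time at
// the moment of each check, so no background refill thread is needed.
struct TokenBucketSketch {
  double tokens;               // current token count
  double max_tokens;           // capacity, e.g. requests per minute
  double refill_rate_per_sec;  // max_tokens / 60.0
  std::chrono::steady_clock::time_point last_refill;
};

inline bool try_consume(TokenBucketSketch & b) {
  const auto now = std::chrono::steady_clock::now();
  const double elapsed = std::chrono::duration<double>(now - b.last_refill).count();
  b.tokens = std::min(b.max_tokens, b.tokens + elapsed * b.refill_rate_per_sec);
  b.last_refill = now;
  if (b.tokens < 1.0) {
    return false;  // bucket empty: reject the request
  }
  b.tokens -= 1.0;   // consume one token for this request
  return true;
}

With max_tokens = 60 and refill_rate_per_sec = 1.0 this corresponds to the 60 RPM per-client default (one token per second).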

Issue


Type

  • Bug fix
  • New feature or tests
  • Breaking change
  • Documentation only

What Changed

New files

  • include/.../http/rate_limiter.hpp: RateLimitConfig, RateLimitConfigBuilder, TokenBucket, RateLimitResult, RateLimiter
  • src/http/rate_limiter.cpp: Token bucket implementation, path matching, cleanup logic
  • test/test_rate_limiter.cpp: 24 unit tests (config validation, bucket behavior, per-client isolation, endpoint overrides, pattern matching, cleanup)

Modified files

  • error_codes.hpp: Added ERR_RATE_LIMIT_EXCEEDED
  • gateway_node.hpp/cpp: RateLimitConfig member + ROS 2 parameter declarations + builder
  • rest_server.hpp/cpp: Constructor accepts RateLimitConfig, rate limiter check in pre-routing handler
  • gateway_params.yaml: rate_limiting: configuration section
  • CMakeLists.txt: Source file + test target + coverage list

Configuration

rate_limiting:
  enabled: false                     # disabled by default
  global_requests_per_minute: 600    # total across all clients
  client_requests_per_minute: 60     # per IP
  endpoint_limits: [""]              # "pattern:rpm" overrides
  client_cleanup_interval_seconds: 300
  client_max_idle_seconds: 600
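
For illustration, an endpoint_limits entry such as "/api/v1/*/operations/*:10" splits on the last colon into a pattern and an RPM, and the pattern matches segment by segment with * standing for exactly one segment. A rough sketch under those assumptions; the helper names here are hypothetical, the actual parsing lives in gateway_node.cpp and rate_limiter.cpp:

#include <cstddef>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Parse one endpoint_limits entry, e.g. "/api/v1/*/operations/*:10".
// Returns std::nullopt for malformed entries (missing or non-numeric RPM).
std::optional<std::pair<std::string, int>> parse_endpoint_limit(const std::string & entry) {
  const auto colon = entry.rfind(':');
  if (colon == std::string::npos || colon == 0) {
    return std::nullopt;
  }
  try {
    return std::make_pair(entry.substr(0, colon), std::stoi(entry.substr(colon + 1)));
  } catch (const std::exception &) {
    return std::nullopt;
  }
}

// Single-segment wildcard match: '*' matches exactly one path segment.
bool path_matches(const std::string & pattern, const std::string & path) {
  const auto split = [](const std::string & s) {
    std::vector<std::string> parts;
    std::stringstream ss(s);
    std::string part;
    while (std::getline(ss, part, '/')) {
      parts.push_back(part);
    }
    return parts;
  };
  const auto pat = split(pattern);
  const auto seg = split(path);
  if (pat.size() != seg.size()) {
    return false;
  }
  for (std::size_t i = 0; i < pat.size(); ++i) {
    if (pat[i] != "*" && pat[i] != seg[i]) {
      return false;
    }
  }
  return true;
}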

Response headers (all responses when enabled)

X-RateLimit-Limit: 60
X-RateLimit-Remaining: 42
X-RateLimit-Reset: 1708200060

Rejection response (429)

{
  "error_code": "rate-limit-exceeded",
  "message": "Too many requests. Please retry after 2 seconds.",
  "parameters": { "retry_after": 2, "limit": 60, "reset": 1708200060 }
}
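
With cpp-httplib, the headers and the rejection body above can be produced roughly as follows. This is a sketch assuming the RateLimitResult fields used elsewhere in this PR; the actual apply_headers / apply_rejection in rate_limiter.cpp may differ in detail:

#include <string>
#include <httplib.h>

struct RateLimitResultSketch {
  bool allowed;
  int limit;
  int remaining;
  int retry_after_seconds;
  long long reset_epoch_seconds;
};

// Headers attached to every checked response.
void apply_headers_sketch(const RateLimitResultSketch & r, httplib::Response & res) {
  res.set_header("X-RateLimit-Limit", std::to_string(r.limit));
  res.set_header("X-RateLimit-Remaining", std::to_string(r.remaining));
  res.set_header("X-RateLimit-Reset", std::to_string(r.reset_epoch_seconds));
}

// 429 rejection with Retry-After and the SOVD-style JSON body shown above.
void apply_rejection_sketch(const RateLimitResultSketch & r, httplib::Response & res) {
  res.status = 429;
  res.set_header("Retry-After", std::to_string(r.retry_after_seconds));
  const std::string body =
    std::string("{\"error_code\":\"rate-limit-exceeded\",") +
    "\"message\":\"Too many requests. Please retry after " +
    std::to_string(r.retry_after_seconds) + " seconds.\"," +
    "\"parameters\":{\"retry_after\":" + std::to_string(r.retry_after_seconds) +
    ",\"limit\":" + std::to_string(r.limit) +
    ",\"reset\":" + std::to_string(r.reset_epoch_seconds) + "}}";
  res.set_content(body, "application/json");
}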

Testing

  • colcon build — all 6 packages pass
  • colcon test --ctest-args -R test_rate_limiter — 24/24 tests pass
  • clang-format-18 --dry-run -Werror — clean
  • ament_copyright — clean

Test categories:

  1. Config builder validation (10 tests)
  2. Token bucket behavior — within-limit, over-limit, refill (4 tests)
  3. Per-client isolation, global limit (2 tests)
  4. Endpoint override matching (2 tests)
  5. Glob pattern matching (4 tests)
  6. Result field validation (2 tests)

Checklist

  • Breaking changes are clearly described (and announced in docs / changelog if needed)
  • Tests were added or updated if needed
  • Docs were updated if behavior or public API changed

Copilot AI left a comment

Pull request overview

Adds token-bucket rate limiting to ros2_medkit_gateway to protect the REST API from abuse/resource exhaustion, wiring the limiter into the HTTP server’s pre-routing flow and exposing configuration via ROS 2 parameters/YAML.

Changes:

  • Introduces RateLimiter (+ config builder, token bucket, path pattern overrides) and unit tests.
  • Integrates rate limiting into RESTServer pre-routing handler and exposes rate-limit headers via CORS.
  • Adds gateway parameters + YAML defaults and updates build/test targets.

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 9 comments.

  • src/ros2_medkit_gateway/include/ros2_medkit_gateway/http/rate_limiter.hpp: Adds public rate limiter API/types (config, token bucket, result model).
  • src/ros2_medkit_gateway/src/http/rate_limiter.cpp: Implements token-bucket logic, endpoint pattern matching, and cleanup.
  • src/ros2_medkit_gateway/src/http/rest_server.cpp: Runs rate limiting in pre-routing handler and exposes rate limit headers for CORS.
  • src/ros2_medkit_gateway/include/ros2_medkit_gateway/http/rest_server.hpp: Plumbs RateLimitConfig into RESTServer and stores RateLimiter.
  • src/ros2_medkit_gateway/src/gateway_node.cpp: Declares rate limiting parameters and builds RateLimitConfig from ROS params.
  • src/ros2_medkit_gateway/include/ros2_medkit_gateway/gateway_node.hpp: Stores RateLimitConfig in node state.
  • src/ros2_medkit_gateway/include/ros2_medkit_gateway/http/error_codes.hpp: Adds ERR_RATE_LIMIT_EXCEEDED constant for 429 responses.
  • src/ros2_medkit_gateway/config/gateway_params.yaml: Adds YAML configuration section for rate limiting.
  • src/ros2_medkit_gateway/test/test_rate_limiter.cpp: Adds unit tests for config validation, token bucket behavior, overrides, and cleanup.
  • src/ros2_medkit_gateway/CMakeLists.txt: Adds rate limiter source + unit test target to the build.

Comment on lines 89 to 97
// Rate limiting check (before CORS and Auth for early rejection)
if (rate_limiter_ && rate_limiter_->is_enabled() && req.method != "OPTIONS") {
  auto rl_result = rate_limiter_->check(req.remote_addr, req.path);
  RateLimiter::apply_headers(rl_result, res);
  if (!rl_result.allowed) {
    RateLimiter::apply_rejection(rl_result, res);
    return httplib::Server::HandlerResponse::Handled;
  }
}

Copilot AI Feb 17, 2026


The PR description says X-RateLimit-* headers are included on all responses when enabled and shows X-RateLimit-Limit: 60 (per-client). With the current implementation, apply_headers() uses whatever RateLimitResult.limit was set to, which can be the global RPM (e.g., 600) on global throttling. Either update the implementation to always report per-client limits in headers (and report global limiting differently), or update the documented header semantics/examples to match actual behavior.

Contributor Author


The implementation intentionally reports the limit that was actually applied: global RPM when the global bucket rejects, per-client (or endpoint-override) RPM when the client bucket rejects. This makes the header actionable—the client knows which limit it hit. Agreed the PR description example was misleading; updated the comment in the code.

Comment on lines 88 to 97
srv->set_pre_routing_handler([this](const httplib::Request & req, httplib::Response & res) {
  // Rate limiting check (before CORS and Auth for early rejection)
  if (rate_limiter_ && rate_limiter_->is_enabled() && req.method != "OPTIONS") {
    auto rl_result = rate_limiter_->check(req.remote_addr, req.path);
    RateLimiter::apply_headers(rl_result, res);
    if (!rl_result.allowed) {
      RateLimiter::apply_rejection(rl_result, res);
      return httplib::Server::HandlerResponse::Handled;
    }
  }

Copilot AI Feb 17, 2026


Rate limiting is a cross-cutting HTTP behavior, but the PR only adds unit tests. The repository already has launch-testing integration tests that exercise HTTP middleware (e.g., CORS/auth). Please add an integration test that starts the gateway with rate limiting enabled and asserts (1) 429 + Retry-After + X-RateLimit-* headers on over-limit requests and (2) expected behavior for OPTIONS and CORS-enabled responses, to prevent regressions.

Contributor Author


The unit tests already cover the middleware behavior through RateLimiterTest (all 24 cases). A full launch-test integration that boots the gateway node and sends HTTP requests is beyond the scope of this PR; that can be tracked as a separate improvement if needed.

Contributor Author


Addressed in 5c784e7. Added launch-level integration coverage for rate limiting and wired it into CMake test targets.

What is now validated:

  1. Over-limit requests return 429 with Retry-After and X-RateLimit-* headers.
  2. OPTIONS preflight is not rate-limited.
  3. CORS headers remain present on 429 responses (including exposed rate-limit headers).

Files:

  • src/ros2_medkit_gateway/test/test_rate_limiting.test.py
  • src/ros2_medkit_gateway/CMakeLists.txt

Validation:

  • Ran in Docker (ROS Jazzy dev container) with colcon build + targeted tests: test_rate_limiter, test_rate_limiting, test_cors (all passed).

Comment on lines 88 to 97
srv->set_pre_routing_handler([this](const httplib::Request & req, httplib::Response & res) {
  // Rate limiting check (before CORS and Auth for early rejection)
  if (rate_limiter_ && rate_limiter_->is_enabled() && req.method != "OPTIONS") {
    auto rl_result = rate_limiter_->check(req.remote_addr, req.path);
    RateLimiter::apply_headers(rl_result, res);
    if (!rl_result.allowed) {
      RateLimiter::apply_rejection(rl_result, res);
      return httplib::Server::HandlerResponse::Handled;
    }
  }

Copilot AI Feb 17, 2026


Early-returning on rate-limit rejection bypasses the CORS handling below, so 429 responses won’t include Access-Control-Allow-* headers when CORS is enabled. That’s a functional regression for browser clients (they may see the request blocked and be unable to read the error/Retry-After). Consider applying CORS headers before returning (or running rate limiting after CORS header setup, while still skipping OPTIONS).

Contributor Author


Fixed in c762f8f. CORS headers (including Access-Control-Allow-Headers: Authorization when auth is enabled) are now applied before the rate-limit rejection return, so cross-origin clients can read 429 responses.

Comment on lines 374 to 391
auto config = make_config(600, 60);
RateLimiter limiter(config);

// Create a client entry
limiter.check("192.168.1.100", "/api/v1/health");

// Wait for the client to become stale (> 2 seconds)
std::this_thread::sleep_for(std::chrono::milliseconds(2500));

// Trigger cleanup via another check (cleanup runs on interval)
limiter.check("192.168.1.200", "/api/v1/health");

// The stale client should get a fresh bucket (full tokens)
// If cleanup worked, client 100 should have been removed and get a new bucket
auto result = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_TRUE(result.allowed);
// remaining should be near max since it's a fresh bucket
EXPECT_GE(result.remaining, 50);

Copilot AI Feb 17, 2026


StaleClientsAreCleaned doesn’t actually validate that cleanup removed the stale client: with client_rpm=60, the bucket naturally refills to near-full after 2.5s even if the client entry is never erased, so the test can pass without cleanup working. To make this test meaningful, use a much lower RPM (so refill alone can’t recover) and assert the request is rejected without cleanup but allowed after cleanup is triggered (or otherwise expose client count/state in a test-only way).

Suggested change
auto config = make_config(600, 60);
RateLimiter limiter(config);
// Create a client entry
limiter.check("192.168.1.100", "/api/v1/health");
// Wait for the client to become stale (> 2 seconds)
std::this_thread::sleep_for(std::chrono::milliseconds(2500));
// Trigger cleanup via another check (cleanup runs on interval)
limiter.check("192.168.1.200", "/api/v1/health");
// The stale client should get a fresh bucket (full tokens)
// If cleanup worked, client 100 should have been removed and get a new bucket
auto result = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_TRUE(result.allowed);
// remaining should be near max since it's a fresh bucket
EXPECT_GE(result.remaining, 50);
// Use a very low client RPM so refill alone cannot recover within 2.5s.
auto config = make_config(600, 2);
RateLimiter limiter(config);
// Create a client entry and exhaust its bucket.
auto first = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_TRUE(first.allowed);
auto second = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_TRUE(second.allowed);
auto rejected_before_cleanup = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_FALSE(rejected_before_cleanup.allowed);
EXPECT_EQ(rejected_before_cleanup.remaining, 0);
// Wait for the client to become stale (> 2 seconds). At 2 RPM this is
// not enough time for natural refill to make the request allowed again.
std::this_thread::sleep_for(std::chrono::milliseconds(2500));
// Trigger cleanup via another check (cleanup runs on interval)
limiter.check("192.168.1.200", "/api/v1/health");
// The stale client should now get a fresh bucket.
// If cleanup worked, client 100 should have been removed and get a new bucket.
auto result = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_TRUE(result.allowed);
// remaining should be > 0 since it's a fresh (or substantially refilled) bucket,
// and not merely the partially refilled exhausted bucket from before cleanup.
EXPECT_GT(result.remaining, 0);

Contributor Author


Fixed in c762f8f. Rewrote the test to use 2 RPM: at that rate natural refill in 2.5 s adds only ~0.08 tokens (2/60 × 2.5), which is below the 1.0 threshold needed to allow a request. So the test can only pass if cleanup_stale_clients() actually removes the client entry and resets its bucket.

Comment on lines 159 to 179
TEST_F(RateLimiterTest, TokensRefillOverTime) {
  // 60 RPM = 1 token per second
  auto config = make_config(600, 60);
  RateLimiter limiter(config);

  // Exhaust all tokens
  for (int i = 0; i < 60; ++i) {
    limiter.check("192.168.1.1", "/api/v1/health");
  }

  // Should be rejected now
  auto rejected = limiter.check("192.168.1.1", "/api/v1/health");
  EXPECT_FALSE(rejected.allowed);

  // Wait for tokens to refill (1 token per second, need at least 1 second)
  std::this_thread::sleep_for(std::chrono::milliseconds(1100));

  // Should be allowed again
  auto allowed = limiter.check("192.168.1.1", "/api/v1/health");
  EXPECT_TRUE(allowed.allowed);
}

Copilot AI Feb 17, 2026


TokensRefillOverTime can be timing/flakiness-prone: after consuming 60 tokens in a loop, enough wall time may have elapsed (especially under slow CI/sanitizers) that the bucket refilled and the immediate “Should be rejected now” assertion can intermittently fail. Consider structuring the test to avoid relying on real sleep/elapsed time (e.g., inject a clock into the rate limiter/token bucket), or use parameters that make the refill negligible during the loop and add generous margins.

Contributor Author


Noted. In practice, 60 sequential in-memory unordered_map lookups complete in well under 1 ms, so the partial-refill window is negligible. The test is not meaningfully flaky. If CI environments prove otherwise, injecting a clock into the token bucket would be the right fix, but that is a larger change than warranted here.
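
For context, the deferred clock-injection idea would look roughly like this (hypothetical sketch, not part of this PR): the bucket asks an injected callable for "now", so a test can advance time without sleeping.

#include <algorithm>
#include <chrono>
#include <functional>
#include <utility>

// Hypothetical refactor: the clock is injected, so tests can substitute a
// fake time source and step it forward deterministically.
using ClockFn = std::function<std::chrono::steady_clock::time_point()>;

class InjectableTokenBucket {
public:
  InjectableTokenBucket(double max_tokens, double refill_per_sec,
                        ClockFn clock = [] { return std::chrono::steady_clock::now(); })
  : tokens_(max_tokens), max_tokens_(max_tokens), refill_per_sec_(refill_per_sec),
    clock_(std::move(clock)), last_refill_(clock_()) {}

  bool try_consume() {
    const auto now = clock_();
    const double elapsed = std::chrono::duration<double>(now - last_refill_).count();
    tokens_ = std::min(max_tokens_, tokens_ + elapsed * refill_per_sec_);
    last_refill_ = now;
    if (tokens_ < 1.0) return false;
    tokens_ -= 1.0;
    return true;
  }

private:
  double tokens_;
  double max_tokens_;
  double refill_per_sec_;
  ClockFn clock_;
  std::chrono::steady_clock::time_point last_refill_;
};

A test would then pass a lambda returning a controllable time_point and advance it instead of calling std::this_thread::sleep_for.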

Comment on lines 204 to 225
// 2. Per-client bucket check
int effective_rpm = get_effective_rpm(path);
{
  std::lock_guard<std::mutex> lock(clients_mutex_);
  auto it = client_buckets_.find(client_ip);
  if (it == client_buckets_.end()) {
    // New client — create bucket with effective RPM
    ClientState state;
    state.bucket = make_bucket(effective_rpm);
    state.last_seen = std::chrono::steady_clock::now();
    // Consume one token for this request
    state.bucket.tokens -= 1.0;
    client_buckets_[client_ip] = state;

    result.allowed = true;
    result.limit = effective_rpm;
    result.remaining = static_cast<int>(state.bucket.tokens);
    auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(60);
    result.reset_epoch_seconds =
      std::chrono::duration_cast<std::chrono::seconds>(reset_time.time_since_epoch()).count();
    return result;
  }

Copilot AI Feb 17, 2026


Per-endpoint overrides don’t actually take effect once a client bucket already exists: buckets are keyed only by client_ip and are created with the RPM from the first request seen for that IP. Subsequent requests compute effective_rpm but reuse the existing bucket’s max_tokens/refill_rate, so a client can bypass stricter endpoint limits (or be unfairly throttled after hitting a strict endpoint first). Consider keying buckets by (client_ip, effective_rpm/pattern) or recreating/updating the bucket when the effective RPM for the current path differs from the stored bucket’s configuration (while preserving current token level appropriately).

Contributor Author


Correct. The current design uses a single per-client bucket created on first contact; subsequent requests to endpoints with a different RPM reuse that bucket. Fixing this properly requires keying buckets by (client_ip, endpoint_pattern), which is a non-trivial redesign. Tracked as a known limitation—opening a follow-up issue to address this in a future PR.

Comment on lines 179 to 186
// Periodic cleanup of stale client entries
{
  auto now = std::chrono::steady_clock::now();
  double elapsed = std::chrono::duration<double>(now - last_cleanup_).count();
  if (elapsed >= static_cast<double>(config_.client_cleanup_interval_seconds)) {
    cleanup_stale_clients();
  }
}

Copilot AI Feb 17, 2026


last_cleanup_ is read in check() without any synchronization, but is written in cleanup_stale_clients() under clients_mutex_. Since check() can run concurrently in cpp-httplib worker threads, this is a data race/UB. Guard both the read and write of last_cleanup_ with the same mutex (or use an atomic representation such as an std::atomic<int64_t> storing steady-clock ticks).

Contributor Author


Fixed in c762f8f. Replaced last_cleanup_ (a steady_clock::time_point read without a lock) with std::atomic<int64_t> last_cleanup_ns_ storing nanoseconds since epoch. The load in check() uses memory_order_relaxed—ordering guarantees are not needed here because cleanup_stale_clients() is internally serialized by clients_mutex_.
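
A minimal sketch of that cleanup gate, including the compare-exchange described in a later commit so only one thread runs a given cleanup round (illustrative; the real member and function names in rate_limiter.cpp may differ):

#include <atomic>
#include <chrono>
#include <cstdint>

std::atomic<std::int64_t> last_cleanup_ns{0};

// Returns true only for the thread that wins the right to run cleanup for the
// current interval; other callers skip without blocking.
bool should_run_cleanup(std::int64_t interval_seconds) {
  const std::int64_t now_ns = std::chrono::duration_cast<std::chrono::nanoseconds>(
    std::chrono::steady_clock::now().time_since_epoch()).count();
  std::int64_t prev = last_cleanup_ns.load(std::memory_order_relaxed);
  if (now_ns - prev < interval_seconds * 1'000'000'000LL) {
    return false;  // interval not elapsed yet
  }
  // CAS: only the thread that advances the timestamp performs the cleanup.
  return last_cleanup_ns.compare_exchange_strong(prev, now_ns, std::memory_order_relaxed);
}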

Comment on lines 188 to 241
// 1. Global bucket check
{
  std::lock_guard<std::mutex> lock(global_mutex_);
  if (!try_consume(global_bucket_)) {
    result.allowed = false;
    result.limit = config_.global_requests_per_minute;
    result.remaining = 0;
    double wait = seconds_until_next_token(global_bucket_);
    result.retry_after_seconds = std::max(1, static_cast<int>(std::ceil(wait)));
    auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(result.retry_after_seconds);
    result.reset_epoch_seconds =
      std::chrono::duration_cast<std::chrono::seconds>(reset_time.time_since_epoch()).count();
    return result;
  }
}

// 2. Per-client bucket check
int effective_rpm = get_effective_rpm(path);
{
  std::lock_guard<std::mutex> lock(clients_mutex_);
  auto it = client_buckets_.find(client_ip);
  if (it == client_buckets_.end()) {
    // New client — create bucket with effective RPM
    ClientState state;
    state.bucket = make_bucket(effective_rpm);
    state.last_seen = std::chrono::steady_clock::now();
    // Consume one token for this request
    state.bucket.tokens -= 1.0;
    client_buckets_[client_ip] = state;

    result.allowed = true;
    result.limit = effective_rpm;
    result.remaining = static_cast<int>(state.bucket.tokens);
    auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(60);
    result.reset_epoch_seconds =
      std::chrono::duration_cast<std::chrono::seconds>(reset_time.time_since_epoch()).count();
    return result;
  }

  auto & state = it->second;
  state.last_seen = std::chrono::steady_clock::now();

  if (!try_consume(state.bucket)) {
    result.allowed = false;
    result.limit = effective_rpm;
    result.remaining = 0;
    double wait = seconds_until_next_token(state.bucket);
    result.retry_after_seconds = std::max(1, static_cast<int>(std::ceil(wait)));
    auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(result.retry_after_seconds);
    result.reset_epoch_seconds =
      std::chrono::duration_cast<std::chrono::seconds>(reset_time.time_since_epoch()).count();
    return result;
  }


Copilot AI Feb 17, 2026


A global token is consumed before the per-client bucket check. If the per-client check rejects, the request was not served but the global bucket has already been decremented, effectively reducing capacity for other clients and making the global limit stricter than intended. Consider checking per-client first and only consuming global on success, or implementing a rollback/2-phase consume so global capacity reflects only allowed requests.

Contributor Author


Known trade-off. Checking global first allows early rejection without a per-client map lookup, which is cheaper under sustained global overload. The overcounting is bounded by the number of concurrent per-client rejections—a small fraction of total traffic in practice. Reversing the order (per-client first, then global) would be more accurate but adds map overhead on every globally-rejected request.

result.allowed = true;
result.limit = effective_rpm;
result.remaining = std::max(0, static_cast<int>(state.bucket.tokens));
auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(60);

Copilot AI Feb 17, 2026


X-RateLimit-Reset is currently set to now + 60s for allowed requests regardless of current token state or configured RPM, which makes the header misleading for a token-bucket implementation (and can’t be interpreted as “when the limit resets”). Consider computing reset based on the bucket’s refill schedule (e.g., time until the bucket is full again, or at least time until the next token becomes available), and aligning it consistently with the semantics you expect clients to use.

Suggested change
auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(60);
double wait = seconds_until_next_token(state.bucket);
int reset_in_seconds = std::max(1, static_cast<int>(std::ceil(wait)));
auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(reset_in_seconds);

Contributor Author


Agreed the now + 60s approximation is coarse. For a token bucket, the most meaningful value for allowed requests is "when will one more token be available", which is seconds_until_next_token() = 0 when tokens remain—also not very useful. Deferred to a follow-up: the suggested fix (compute reset from actual refill schedule) is correct but requires passing bucket state through to this return path.

@eclipse0922
Contributor Author

A lot of reviews, I will take a look.

@eclipse0922 eclipse0922 marked this pull request as draft February 17, 2026 17:16
@eclipse0922 eclipse0922 force-pushed the feat/91-rate-limiting branch from 8292ca1 to 3103d5b on February 18, 2026 01:33
@bburda
Collaborator

bburda commented Feb 19, 2026

@eclipse0922 Is this PR ready for review? It's still in a draft, but I see you've already fixed a few issues.

@eclipse0922
Contributor Author

> @eclipse0922 Is this PR ready for review? It's still in a draft, but I see you've already fixed a few issues.

Yes, ready for review. All critical Copilot findings (CORS on 429, data race, broken endpoint overrides, global token waste) are fixed. One minor item (clock injection for test determinism) is deferred — not flaky in practice. Integration launch test added in 5c784e7.

@eclipse0922 eclipse0922 marked this pull request as ready for review February 19, 2026 08:28
@eclipse0922 eclipse0922 force-pushed the feat/91-rate-limiting branch from 5c784e7 to b348258 on February 19, 2026 08:30

@bburda bburda left a comment


Well-structured rate limiter - good design choices: lazy-refill token bucket (no background threads), two-layer global+per-client, composite bucket key for endpoint overrides, CAS-based cleanup trigger, per-client check before global to prevent DoS on global pool. 24 unit tests + integration test.

1 bug, 1 missing docs item, 3 minor comments inline.

Docs: The PR checklist has "Docs were updated if behavior or public API changed" unchecked. Rate limiting is a user-facing feature - docs/api/rest.rst (or a new docs/configuration/rate_limiting.rst) should document:

  • Configuration reference (rate_limiting.* parameters)
  • Rate limit response headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset)
  • 429 error response format and Retry-After header
  • Endpoint override pattern syntax

This can be a follow-up issue if you prefer not to block the PR on it, but it should be tracked.

if (cors_config_.enabled) {
  std::string origin = req.get_header_value("Origin");
  if (!origin.empty() && is_origin_allowed(origin)) {
    set_cors_headers(res, origin);
Collaborator


Bug: CORS headers duplicated for allowed requests.

When rate limiting is enabled and the request passes the rate check, set_cors_headers() is called here, and then called again in the existing CORS block ~20 lines below (which still runs for non-OPTIONS requests).

Headers is std::multimap<string, string> and set_header() uses emplace, so this creates duplicate Access-Control-Allow-Origin, Access-Control-Allow-Methods, etc. Duplicate Access-Control-Allow-Origin causes browsers to reject the response entirely.

Suggested fix: Reorder so existing CORS handling runs first (for all non-OPTIONS), then rate limit check. The rate limiter block for rejected requests can just return Handled since CORS headers are already set:

// 1. CORS headers (existing code)
// 2. OPTIONS -> return Handled (existing)
// 3. Rate limit check -> if rejected, return Handled (CORS already set)
// 4. Auth check (existing)

This eliminates the duplicated CORS logic entirely.
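
Concretely, the reordered handler could look roughly like this (a sketch combining the snippet quoted in the earlier comments with the suggested ordering; cors_config_, is_origin_allowed, set_cors_headers, and the RateLimiter calls are names already used in this PR, the rest is illustrative):

srv->set_pre_routing_handler([this](const httplib::Request & req, httplib::Response & res) {
  // 1. CORS headers for any request with an allowed Origin (existing logic).
  if (cors_config_.enabled) {
    const std::string origin = req.get_header_value("Origin");
    if (!origin.empty() && is_origin_allowed(origin)) {
      set_cors_headers(res, origin);
    }
  }

  // 2. CORS preflight is answered immediately and never rate-limited.
  if (req.method == "OPTIONS") {
    return httplib::Server::HandlerResponse::Handled;
  }

  // 3. Rate limiting; CORS headers are already on the response, so a 429 is
  //    readable by browser clients.
  if (rate_limiter_ && rate_limiter_->is_enabled()) {
    auto rl_result = rate_limiter_->check(req.remote_addr, req.path);
    RateLimiter::apply_headers(rl_result, res);
    if (!rl_result.allowed) {
      RateLimiter::apply_rejection(rl_result, res);
      return httplib::Server::HandlerResponse::Handled;
    }
  }

  // 4. Auth check (existing logic) would follow here.
  return httplib::Server::HandlerResponse::Unhandled;
});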

size_t colon = entry.rfind(':');
if (colon != std::string::npos && colon > 0) {
  std::string pattern = entry.substr(0, colon);
  int rpm = std::stoi(entry.substr(colon + 1));
Collaborator


nit: If someone writes endpoint_limits: ["/api/v1/health:abc"], the error from std::stoi ("invalid stoi argument") gets caught by the outer try-catch, producing: Invalid rate limiting configuration: invalid stoi argument. Rate limiting disabled.

Not very actionable. Consider wrapping:

try {
  rpm = std::stoi(entry.substr(colon + 1));
} catch (...) {
  RCLCPP_WARN(get_logger(), "Non-numeric RPM in endpoint_limit: '%s'", entry.c_str());
  continue;
}

This way a single bad entry doesn't disable rate limiting entirely.

cleanup_stale_clients();
}
}
}
Collaborator


nit: After the per-client check passes, clients_mutex_ is released before global_mutex_ is acquired. If the global bucket is exhausted, the client token was already consumed but the request is rejected. This is a minor token leak (~1 token) during global overload.

Acceptable for a rate limiter, but worth a brief comment explaining this is by-design (e.g. // Note: client token already consumed; minor over-count under global pressure is acceptable).

# Pattern uses * as single-segment wildcard
# Example: "/api/v1/*/operations/*:10"
endpoint_limits: [""]

Collaborator


nit: The empty string is a ROS 2 workaround for declaring empty string arrays. Worth a brief comment so users don't think this is a real entry:

# ROS 2 requires at least one element; empty string is ignored at runtime.
endpoint_limits: [""]
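
On the consuming side, the node can simply skip that placeholder when reading the parameter. A sketch, assuming a standard rclcpp string-array parameter read and a parameter name inferred from the YAML layout above, not the exact gateway_node.cpp code:

#include <string>
#include <vector>
#include <rclcpp/rclcpp.hpp>

// Hypothetical helper: read endpoint_limits and drop the ROS 2 empty-string
// placeholder so it is never treated as a real "pattern:rpm" override.
std::vector<std::string> read_endpoint_limits(rclcpp::Node & node) {
  std::vector<std::string> out;
  for (const auto & entry :
       node.get_parameter("rate_limiting.endpoint_limits").as_string_array()) {
    if (!entry.empty()) {  // "" is only the declare-empty-array workaround
      out.push_back(entry);
    }
  }
  return out;
}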

eclipse0922 and others added 7 commits February 20, 2026 09:14
Two-layer rate limiter (global + per-client IP) integrated into the
pre-routing handler before CORS and auth checks. Uses lazy-refill
token bucket algorithm with no background threads. Disabled by default
for backward compatibility.

Includes RateLimitConfig builder with validation, endpoint-specific
overrides via glob patterns, stale client cleanup, SOVD-compliant 429
error responses, and 24 unit tests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…imiter

- Apply CORS headers (including Authorization in Allow-Headers) before
  returning 429, so browser clients can read cross-origin rate-limit
  responses without a CORS error
- Replace steady_clock::time_point last_cleanup_ with atomic<int64_t>
  to eliminate the data race on the cleanup timer field that is read
  without a mutex in check() and written under clients_mutex_
- Rewrite StaleClientsAreCleaned test to use 2 RPM so natural bucket
  refill cannot recover within the idle window, making cleanup the
  only possible path for the request to be allowed again

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…and TOCTOU in rate limiter

Reorder check() to run per-client before global bucket, preventing
exhausted clients from draining the global pool (DoS vector). Use
composite bucket key (client_ip|pattern) so per-endpoint overrides
apply correctly for existing clients. Compute X-RateLimit-Reset from
actual refill time instead of hardcoded 60s. Replace cleanup trigger
with CAS to prevent redundant concurrent cleanups. Increase test sleep
margins to reduce flakiness.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
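
For illustration, the composite-key idea from this commit amounts to something like the following (a sketch; the actual key format and member names in rate_limiter.cpp may differ):

#include <string>

// Key client buckets by "client_ip|matched_pattern" so each endpoint override
// gets its own bucket per client; an empty pattern means no override matched
// and the default per-client limit applies to that bucket.
inline std::string bucket_key(const std::string & client_ip, const std::string & matched_pattern) {
  return client_ip + "|" + matched_pattern;
}

// Lookup then becomes, e.g.:
//   auto & state = client_buckets_[bucket_key(req.remote_addr, matched_pattern)];
// instead of indexing client_buckets_ by client_ip alone.
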
@eclipse0922 eclipse0922 force-pushed the feat/91-rate-limiting branch from 984e6f4 to 284ea74 on February 20, 2026 00:14

Development

Successfully merging this pull request may close these issues.

Rate Limiting and Request Throttling
