feat(gateway): add token-bucket rate limiting middleware #220
eclipse0922 wants to merge 7 commits into selfpatch:main
Conversation
Pull request overview
Adds token-bucket rate limiting to ros2_medkit_gateway to protect the REST API from abuse/resource exhaustion, wiring the limiter into the HTTP server’s pre-routing flow and exposing configuration via ROS 2 parameters/YAML.
Changes:
- Introduces `RateLimiter` (+ config builder, token bucket, path pattern overrides) and unit tests.
- Integrates rate limiting into the `RESTServer` pre-routing handler and exposes rate-limit headers via CORS.
- Adds gateway parameters + YAML defaults and updates build/test targets.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/http/rate_limiter.hpp | Adds public rate limiter API/types (config, token bucket, result model). |
| src/ros2_medkit_gateway/src/http/rate_limiter.cpp | Implements token-bucket logic, endpoint pattern matching, and cleanup. |
| src/ros2_medkit_gateway/src/http/rest_server.cpp | Runs rate limiting in pre-routing handler and exposes rate limit headers for CORS. |
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/http/rest_server.hpp | Plumbs RateLimitConfig into RESTServer and stores RateLimiter. |
| src/ros2_medkit_gateway/src/gateway_node.cpp | Declares rate limiting parameters and builds RateLimitConfig from ROS params. |
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/gateway_node.hpp | Stores RateLimitConfig in node state. |
| src/ros2_medkit_gateway/include/ros2_medkit_gateway/http/error_codes.hpp | Adds ERR_RATE_LIMIT_EXCEEDED constant for 429 responses. |
| src/ros2_medkit_gateway/config/gateway_params.yaml | Adds YAML configuration section for rate limiting. |
| src/ros2_medkit_gateway/test/test_rate_limiter.cpp | Adds unit tests for config validation, token bucket behavior, overrides, and cleanup. |
| src/ros2_medkit_gateway/CMakeLists.txt | Adds rate limiter source + unit test target to the build. |
```cpp
// Rate limiting check (before CORS and Auth for early rejection)
if (rate_limiter_ && rate_limiter_->is_enabled() && req.method != "OPTIONS") {
  auto rl_result = rate_limiter_->check(req.remote_addr, req.path);
  RateLimiter::apply_headers(rl_result, res);
  if (!rl_result.allowed) {
    RateLimiter::apply_rejection(rl_result, res);
    return httplib::Server::HandlerResponse::Handled;
  }
}
```
The PR description says X-RateLimit-* headers are included on all responses when enabled and shows X-RateLimit-Limit: 60 (per-client). With the current implementation, apply_headers() uses whatever RateLimitResult.limit was set to, which can be the global RPM (e.g., 600) on global throttling. Either update the implementation to always report per-client limits in headers (and report global limiting differently), or update the documented header semantics/examples to match actual behavior.
The implementation intentionally reports the limit that was actually applied: global RPM when the global bucket rejects, per-client (or endpoint-override) RPM when the client bucket rejects. This makes the header actionable—the client knows which limit it hit. Agreed the PR description example was misleading; updated the comment in the code.
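For reference, the reported-limit semantics described in this reply can be sketched as follows. This is an illustrative simplification with invented names, not the PR's actual API: `limit` carries the RPM of whichever bucket actually made the decision.

```cpp
#include <cassert>

// Hypothetical simplification of the PR's RateLimitResult: `limit` carries
// the RPM of whichever bucket actually rejected (or admitted) the request,
// so the X-RateLimit-Limit header tells the client which limit it hit.
struct RateLimitResult {
  bool allowed;
  int limit;
};

// Report the global RPM when the global bucket rejects, otherwise the
// per-client (or endpoint-override) RPM.
RateLimitResult decide(bool global_ok, bool client_ok, int global_rpm, int client_rpm) {
  if (!global_ok) return {false, global_rpm};
  if (!client_ok) return {false, client_rpm};
  return {true, client_rpm};
}
```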
```cpp
srv->set_pre_routing_handler([this](const httplib::Request & req, httplib::Response & res) {
  // Rate limiting check (before CORS and Auth for early rejection)
  if (rate_limiter_ && rate_limiter_->is_enabled() && req.method != "OPTIONS") {
    auto rl_result = rate_limiter_->check(req.remote_addr, req.path);
    RateLimiter::apply_headers(rl_result, res);
    if (!rl_result.allowed) {
      RateLimiter::apply_rejection(rl_result, res);
      return httplib::Server::HandlerResponse::Handled;
    }
  }
```
Rate limiting is a cross-cutting HTTP behavior, but the PR only adds unit tests. The repository already has launch-testing integration tests that exercise HTTP middleware (e.g., CORS/auth). Please add an integration test that starts the gateway with rate limiting enabled and asserts (1) 429 + Retry-After + X-RateLimit-* headers on over-limit requests and (2) expected behavior for OPTIONS and CORS-enabled responses, to prevent regressions.
The unit tests already cover the middleware behavior through RateLimiterTest (all 24 cases). A full launch-test integration that boots the gateway node and sends HTTP requests is beyond the scope of this PR; that can be tracked as a separate improvement if needed.
Addressed in 5c784e7. Added launch-level integration coverage for rate limiting and wired it into CMake test targets.
What is now validated:
- Over-limit requests return 429 with Retry-After and X-RateLimit-* headers.
- OPTIONS preflight is not rate-limited.
- CORS headers remain present on 429 responses (including exposed rate-limit headers).
Files:
- src/ros2_medkit_gateway/test/test_rate_limiting.test.py
- src/ros2_medkit_gateway/CMakeLists.txt
Validation:
- Ran in Docker (ROS Jazzy dev container) with colcon build + targeted tests: test_rate_limiter, test_rate_limiting, test_cors (all passed).
```cpp
srv->set_pre_routing_handler([this](const httplib::Request & req, httplib::Response & res) {
  // Rate limiting check (before CORS and Auth for early rejection)
  if (rate_limiter_ && rate_limiter_->is_enabled() && req.method != "OPTIONS") {
    auto rl_result = rate_limiter_->check(req.remote_addr, req.path);
    RateLimiter::apply_headers(rl_result, res);
    if (!rl_result.allowed) {
      RateLimiter::apply_rejection(rl_result, res);
      return httplib::Server::HandlerResponse::Handled;
    }
  }
```
Early-returning on rate-limit rejection bypasses the CORS handling below, so 429 responses won’t include Access-Control-Allow-* headers when CORS is enabled. That’s a functional regression for browser clients (they may see the request blocked and be unable to read the error/Retry-After). Consider applying CORS headers before returning (or running rate limiting after CORS header setup, while still skipping OPTIONS).
Fixed in c762f8f. CORS headers (including Access-Control-Allow-Headers: Authorization when auth is enabled) are now applied before the rate-limit rejection return, so cross-origin clients can read 429 responses.
```cpp
auto config = make_config(600, 60);
RateLimiter limiter(config);

// Create a client entry
limiter.check("192.168.1.100", "/api/v1/health");

// Wait for the client to become stale (> 2 seconds)
std::this_thread::sleep_for(std::chrono::milliseconds(2500));

// Trigger cleanup via another check (cleanup runs on interval)
limiter.check("192.168.1.200", "/api/v1/health");

// The stale client should get a fresh bucket (full tokens)
// If cleanup worked, client 100 should have been removed and get a new bucket
auto result = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_TRUE(result.allowed);
// remaining should be near max since it's a fresh bucket
EXPECT_GE(result.remaining, 50);
```
StaleClientsAreCleaned doesn’t actually validate that cleanup removed the stale client: with client_rpm=60, the bucket naturally refills to near-full after 2.5s even if the client entry is never erased, so the test can pass without cleanup working. To make this test meaningful, use a much lower RPM (so refill alone can’t recover) and assert the request is rejected without cleanup but allowed after cleanup is triggered (or otherwise expose client count/state in a test-only way).
Suggested change:

```cpp
// Use a very low client RPM so refill alone cannot recover within 2.5s.
auto config = make_config(600, 2);
RateLimiter limiter(config);

// Create a client entry and exhaust its bucket.
auto first = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_TRUE(first.allowed);
auto second = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_TRUE(second.allowed);
auto rejected_before_cleanup = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_FALSE(rejected_before_cleanup.allowed);
EXPECT_EQ(rejected_before_cleanup.remaining, 0);

// Wait for the client to become stale (> 2 seconds). At 2 RPM this is
// not enough time for natural refill to make the request allowed again.
std::this_thread::sleep_for(std::chrono::milliseconds(2500));

// Trigger cleanup via another check (cleanup runs on interval)
limiter.check("192.168.1.200", "/api/v1/health");

// The stale client should now get a fresh bucket.
// If cleanup worked, client 100 should have been removed and get a new bucket.
auto result = limiter.check("192.168.1.100", "/api/v1/health");
EXPECT_TRUE(result.allowed);
// remaining should be > 0 since it's a fresh (or substantially refilled) bucket,
// and not merely the partially refilled exhausted bucket from before cleanup.
EXPECT_GT(result.remaining, 0);
```
Fixed in c762f8f. Rewrote the test to use 2 RPM: at that rate natural refill in 2.5 s adds only ~0.08 tokens (2/60 × 2.5), which is below the 1.0 threshold needed to allow a request. So the test can only pass if cleanup_stale_clients() actually removes the client entry and resets its bucket.
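The refill arithmetic in this reply can be checked with a minimal lazy-refill bucket. This is a sketch under assumed semantics (refill rate = RPM / 60 tokens per second, a request needs one full token), not the PR's implementation: at 2 RPM, 2.5 s adds about 0.083 tokens, while at 60 RPM the same wait adds 2.5 tokens.

```cpp
#include <algorithm>
#include <cassert>

// Minimal lazy-refill bucket sketch (illustrative, not the PR's code):
// refill_per_sec = rpm / 60.0, and admitting a request needs a full token.
struct Bucket {
  double tokens;
  double refill_per_sec;
};

// Advance the bucket by elapsed wall time; no background thread needed.
void refill(Bucket & b, double elapsed_sec, double max_tokens) {
  b.tokens = std::min(max_tokens, b.tokens + elapsed_sec * b.refill_per_sec);
}

bool try_consume(Bucket & b) {
  if (b.tokens >= 1.0) { b.tokens -= 1.0; return true; }
  return false;
}
```

This illustrates why the rewritten test is meaningful only at a low RPM: at 60 RPM the exhausted bucket recovers by refill alone, masking a broken cleanup path.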
```cpp
TEST_F(RateLimiterTest, TokensRefillOverTime) {
  // 60 RPM = 1 token per second
  auto config = make_config(600, 60);
  RateLimiter limiter(config);

  // Exhaust all tokens
  for (int i = 0; i < 60; ++i) {
    limiter.check("192.168.1.1", "/api/v1/health");
  }

  // Should be rejected now
  auto rejected = limiter.check("192.168.1.1", "/api/v1/health");
  EXPECT_FALSE(rejected.allowed);

  // Wait for tokens to refill (1 token per second, need at least 1 second)
  std::this_thread::sleep_for(std::chrono::milliseconds(1100));

  // Should be allowed again
  auto allowed = limiter.check("192.168.1.1", "/api/v1/health");
  EXPECT_TRUE(allowed.allowed);
}
```
TokensRefillOverTime can be timing/flakiness-prone: after consuming 60 tokens in a loop, enough wall time may have elapsed (especially under slow CI/sanitizers) that the bucket refilled and the immediate “Should be rejected now” assertion can intermittently fail. Consider structuring the test to avoid relying on real sleep/elapsed time (e.g., inject a clock into the rate limiter/token bucket), or use parameters that make the refill negligible during the loop and add generous margins.
Noted. In practice, 60 sequential in-memory unordered_map lookups complete in well under 1 ms, so the partial-refill window is negligible. The test is not meaningfully flaky. If CI environments prove otherwise, injecting a clock into the token bucket would be the right fix, but that is a larger change than warranted here.
```cpp
// 2. Per-client bucket check
int effective_rpm = get_effective_rpm(path);
{
  std::lock_guard<std::mutex> lock(clients_mutex_);
  auto it = client_buckets_.find(client_ip);
  if (it == client_buckets_.end()) {
    // New client — create bucket with effective RPM
    ClientState state;
    state.bucket = make_bucket(effective_rpm);
    state.last_seen = std::chrono::steady_clock::now();
    // Consume one token for this request
    state.bucket.tokens -= 1.0;
    client_buckets_[client_ip] = state;

    result.allowed = true;
    result.limit = effective_rpm;
    result.remaining = static_cast<int>(state.bucket.tokens);
    auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(60);
    result.reset_epoch_seconds =
      std::chrono::duration_cast<std::chrono::seconds>(reset_time.time_since_epoch()).count();
    return result;
  }
```
Per-endpoint overrides don’t actually take effect once a client bucket already exists: buckets are keyed only by client_ip and are created with the RPM from the first request seen for that IP. Subsequent requests compute effective_rpm but reuse the existing bucket’s max_tokens/refill_rate, so a client can bypass stricter endpoint limits (or be unfairly throttled after hitting a strict endpoint first). Consider keying buckets by (client_ip, effective_rpm/pattern) or recreating/updating the bucket when the effective RPM for the current path differs from the stored bucket’s configuration (while preserving current token level appropriately).
Correct. The current design uses a single per-client bucket created on first contact; subsequent requests to endpoints with a different RPM reuse that bucket. Fixing this properly requires keying buckets by (client_ip, endpoint_pattern), which is a non-trivial redesign. Tracked as a known limitation—opening a follow-up issue to address this in a future PR.
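The follow-up direction can be sketched as keying buckets by (client, matched pattern) rather than by client alone. This is an illustrative sketch with invented names, not the eventual fix: a composite string key gives each (client, override) pair its own bucket.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Sketch: key client buckets by (client_ip, matched endpoint pattern) so a
// stricter endpoint override gets its own bucket instead of reusing the
// bucket created for whichever endpoint the client happened to hit first.
struct BucketState { int rpm; double tokens; };

std::string bucket_key(const std::string & client_ip, const std::string & pattern) {
  return client_ip + "|" + pattern;  // '|' cannot appear in an IP or URL path
}

BucketState & bucket_for(std::unordered_map<std::string, BucketState> & buckets,
                         const std::string & client_ip,
                         const std::string & pattern, int rpm) {
  auto key = bucket_key(client_ip, pattern);
  auto it = buckets.find(key);
  if (it == buckets.end()) {
    // New (client, pattern) pair: start with a full bucket at this RPM.
    it = buckets.emplace(key, BucketState{rpm, static_cast<double>(rpm)}).first;
  }
  return it->second;
}
```

The trade-off is more entries per client (one per matched pattern), which cleanup must account for.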
```cpp
// Periodic cleanup of stale client entries
{
  auto now = std::chrono::steady_clock::now();
  double elapsed = std::chrono::duration<double>(now - last_cleanup_).count();
  if (elapsed >= static_cast<double>(config_.client_cleanup_interval_seconds)) {
    cleanup_stale_clients();
  }
}
```
last_cleanup_ is read in check() without any synchronization, but is written in cleanup_stale_clients() under clients_mutex_. Since check() can run concurrently in cpp-httplib worker threads, this is a data race/UB. Guard both the read and write of last_cleanup_ with the same mutex (or use an atomic representation such as an std::atomic<int64_t> storing steady-clock ticks).
Fixed in c762f8f. Replaced last_cleanup_ (a steady_clock::time_point read without a lock) with std::atomic<int64_t> last_cleanup_ns_ storing nanoseconds since epoch. The load in check() uses memory_order_relaxed—ordering guarantees are not needed here because cleanup_stale_clients() is internally serialized by clients_mutex_.
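A hardened variant of this trigger can be sketched with a compare-exchange, so that when several worker threads cross the interval at once, exactly one runs the cleanup. Illustrative only; the names and the free-standing atomic below are invented for the sketch.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Sketch of a race-free cleanup trigger: the last-cleanup timestamp lives in
// an atomic (as in the described fix), and compare_exchange_strong ensures
// only the thread that successfully advances the stamp performs the cleanup.
std::atomic<std::int64_t> last_cleanup_ns{0};

bool should_cleanup(std::int64_t now_ns, std::int64_t interval_ns) {
  std::int64_t prev = last_cleanup_ns.load(std::memory_order_relaxed);
  if (now_ns - prev < interval_ns) {
    return false;  // interval not yet elapsed
  }
  // Exactly one concurrent caller wins the swap and runs the cleanup.
  return last_cleanup_ns.compare_exchange_strong(prev, now_ns,
                                                 std::memory_order_relaxed);
}
```

Relaxed ordering suffices here for the same reason given in the reply: the cleanup itself is serialized by `clients_mutex_`.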
```cpp
// 1. Global bucket check
{
  std::lock_guard<std::mutex> lock(global_mutex_);
  if (!try_consume(global_bucket_)) {
    result.allowed = false;
    result.limit = config_.global_requests_per_minute;
    result.remaining = 0;
    double wait = seconds_until_next_token(global_bucket_);
    result.retry_after_seconds = std::max(1, static_cast<int>(std::ceil(wait)));
    auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(result.retry_after_seconds);
    result.reset_epoch_seconds =
      std::chrono::duration_cast<std::chrono::seconds>(reset_time.time_since_epoch()).count();
    return result;
  }
}

// 2. Per-client bucket check
int effective_rpm = get_effective_rpm(path);
{
  std::lock_guard<std::mutex> lock(clients_mutex_);
  auto it = client_buckets_.find(client_ip);
  if (it == client_buckets_.end()) {
    // New client — create bucket with effective RPM
    ClientState state;
    state.bucket = make_bucket(effective_rpm);
    state.last_seen = std::chrono::steady_clock::now();
    // Consume one token for this request
    state.bucket.tokens -= 1.0;
    client_buckets_[client_ip] = state;

    result.allowed = true;
    result.limit = effective_rpm;
    result.remaining = static_cast<int>(state.bucket.tokens);
    auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(60);
    result.reset_epoch_seconds =
      std::chrono::duration_cast<std::chrono::seconds>(reset_time.time_since_epoch()).count();
    return result;
  }

  auto & state = it->second;
  state.last_seen = std::chrono::steady_clock::now();

  if (!try_consume(state.bucket)) {
    result.allowed = false;
    result.limit = effective_rpm;
    result.remaining = 0;
    double wait = seconds_until_next_token(state.bucket);
    result.retry_after_seconds = std::max(1, static_cast<int>(std::ceil(wait)));
    auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(result.retry_after_seconds);
    result.reset_epoch_seconds =
      std::chrono::duration_cast<std::chrono::seconds>(reset_time.time_since_epoch()).count();
    return result;
  }
```
A global token is consumed before the per-client bucket check. If the per-client check rejects, the request was not served but the global bucket has already been decremented, effectively reducing capacity for other clients and making the global limit stricter than intended. Consider checking per-client first and only consuming global on success, or implementing a rollback/2-phase consume so global capacity reflects only allowed requests.
Known trade-off. Checking global first allows early rejection without a per-client map lookup, which is cheaper under sustained global overload. The overcounting is bounded by the number of concurrent per-client rejections—a small fraction of total traffic in practice. Reversing the order (per-client first, then global) would be more accurate but adds map overhead on every globally-rejected request.
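For illustration, the ordering the comment suggests (per-client first, global consumed only on success) can be sketched with plain counters standing in for the token buckets. Names are invented; the real limiter's buckets refill over time, but the accounting property is the same.

```cpp
#include <cassert>

// Sketch of the suggested ordering: consume a global token only after the
// per-client bucket admits the request, so a client that is already rejected
// cannot drain capacity from the shared global pool.
struct Counter { int tokens; };

bool consume(Counter & c) {
  if (c.tokens > 0) { --c.tokens; return true; }
  return false;
}

bool check_request(Counter & client_bucket, Counter & global_pool) {
  if (!consume(client_bucket)) {
    return false;  // rejected before touching the global pool
  }
  return consume(global_pool);
}
```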
```cpp
result.allowed = true;
result.limit = effective_rpm;
result.remaining = std::max(0, static_cast<int>(state.bucket.tokens));
auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(60);
```
X-RateLimit-Reset is currently set to now + 60s for allowed requests regardless of current token state or configured RPM, which makes the header misleading for a token-bucket implementation (and can’t be interpreted as “when the limit resets”). Consider computing reset based on the bucket’s refill schedule (e.g., time until the bucket is full again, or at least time until the next token becomes available), and aligning it consistently with the semantics you expect clients to use.
Suggested change:

```cpp
double wait = seconds_until_next_token(state.bucket);
int reset_in_seconds = std::max(1, static_cast<int>(std::ceil(wait)));
auto reset_time = std::chrono::system_clock::now() + std::chrono::seconds(reset_in_seconds);
```
Agreed the now + 60s approximation is coarse. For a token bucket, the most meaningful value for allowed requests is "when will one more token be available", which is seconds_until_next_token() = 0 when tokens remain—also not very useful. Deferred to a follow-up: the suggested fix (compute reset from actual refill schedule) is correct but requires passing bucket state through to this return path.
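One possible "reset = when the bucket is full again" computation can be sketched from the bucket state alone. This is an illustrative helper, not the deferred fix; it assumes `refill_per_sec = rpm / 60.0`.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Sketch: derive X-RateLimit-Reset from the bucket's refill schedule rather
// than a fixed now + 60 s. Here "reset" means "seconds until the bucket is
// full again", which is 0 for a full bucket and max_tokens / rate when empty.
int seconds_until_full(double tokens, double max_tokens, double refill_per_sec) {
  double deficit = std::max(0.0, max_tokens - tokens);
  return static_cast<int>(std::ceil(deficit / refill_per_sec));
}
```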
Lots of reviews, I will take a look.
Force-pushed from 8292ca1 to 3103d5b.
@eclipse0922 Is this PR ready for review? It's still in a draft, but I see you've already fixed a few issues. |
Yes, ready for review. All critical Copilot findings (CORS on 429, data race, broken endpoint overrides, global token waste) are fixed. One minor item (clock injection for test determinism) is deferred — not flaky in practice. Integration launch test added in 5c784e7. |
Force-pushed from 5c784e7 to b348258.
bburda left a comment:
Well-structured rate limiter - good design choices: lazy-refill token bucket (no background threads), two-layer global+per-client, composite bucket key for endpoint overrides, CAS-based cleanup trigger, per-client check before global to prevent DoS on global pool. 24 unit tests + integration test.
One bug, one missing-docs item, and three minor comments inline.
Docs: The PR checklist has "Docs were updated if behavior or public API changed" unchecked. Rate limiting is a user-facing feature, so docs/api/rest.rst (or a new docs/configuration/rate_limiting.rst) should document:
- Configuration reference (`rate_limiting.*` parameters)
- Rate limit response headers (`X-RateLimit-Limit`, `X-RateLimit-Remaining`, `X-RateLimit-Reset`)
- 429 error response format and the `Retry-After` header
- Endpoint override pattern syntax

This can be a follow-up issue if you prefer not to block the PR on it, but it should be tracked.
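The override pattern syntax that the docs should describe ('*' matches exactly one path segment) can be sketched as a simple segment-wise matcher. Illustrative only; this is not the PR's matcher.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// Sketch of single-segment wildcard matching for endpoint overrides:
// '*' matches exactly one path segment, never zero or several.
std::vector<std::string> split_path(const std::string & path) {
  std::vector<std::string> segs;
  std::stringstream ss(path);
  std::string seg;
  while (std::getline(ss, seg, '/')) {
    if (!seg.empty()) segs.push_back(seg);
  }
  return segs;
}

bool matches(const std::string & pattern, const std::string & path) {
  auto p = split_path(pattern);
  auto s = split_path(path);
  if (p.size() != s.size()) return false;  // '*' cannot span segments
  for (size_t i = 0; i < p.size(); ++i) {
    if (p[i] != "*" && p[i] != s[i]) return false;
  }
  return true;
}
```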
```cpp
if (cors_config_.enabled) {
  std::string origin = req.get_header_value("Origin");
  if (!origin.empty() && is_origin_allowed(origin)) {
    set_cors_headers(res, origin);
```
Bug: CORS headers duplicated for allowed requests.
When rate limiting is enabled and the request passes the rate check, set_cors_headers() is called here, and then called again in the existing CORS block ~20 lines below (which still runs for non-OPTIONS requests).
The header container is a std::multimap<std::string, std::string> and set_header() uses emplace, so this creates duplicate Access-Control-Allow-Origin, Access-Control-Allow-Methods, etc. A duplicate Access-Control-Allow-Origin causes browsers to reject the response entirely.
Suggested fix: Reorder so existing CORS handling runs first (for all non-OPTIONS requests), then the rate limit check. The rate limiter block for rejected requests can then just return Handled, since CORS headers are already set:

```cpp
// 1. CORS headers (existing code)
// 2. OPTIONS -> return Handled (existing)
// 3. Rate limit check -> if rejected, return Handled (CORS already set)
// 4. Auth check (existing)
```

This eliminates the duplicated CORS logic entirely.
```cpp
size_t colon = entry.rfind(':');
if (colon != std::string::npos && colon > 0) {
  std::string pattern = entry.substr(0, colon);
  int rpm = std::stoi(entry.substr(colon + 1));
```
nit: If someone writes endpoint_limits: ["/api/v1/health:abc"], the error from std::stoi ("invalid stoi argument") gets caught by the outer try-catch, producing: Invalid rate limiting configuration: invalid stoi argument. Rate limiting disabled.
Not very actionable. Consider wrapping:

```cpp
try {
  rpm = std::stoi(entry.substr(colon + 1));
} catch (...) {
  RCLCPP_WARN(get_logger(), "Non-numeric RPM in endpoint_limit: '%s'", entry.c_str());
  continue;
}
```

This way a single bad entry doesn't disable rate limiting entirely.
```cpp
      cleanup_stale_clients();
    }
  }
}
```
nit: After the per-client check passes, clients_mutex_ is released before global_mutex_ is acquired. If the global bucket is exhausted, the client token was already consumed but the request is rejected. This is a minor token leak (~1 token) during global overload.
Acceptable for a rate limiter, but worth a brief comment explaining this is by-design (e.g. // Note: client token already consumed; minor over-count under global pressure is acceptable).
```yaml
# Pattern uses * as single-segment wildcard
# Example: "/api/v1/*/operations/*:10"
endpoint_limits: [""]
```
nit: The empty string is a ROS 2 workaround for declaring empty string arrays. Worth a brief comment so users don't think this is a real entry:

```yaml
# ROS 2 requires at least one element; empty string is ignored at runtime.
endpoint_limits: [""]
```

Two-layer rate limiter (global + per-client IP) integrated into the pre-routing handler before CORS and auth checks. Uses lazy-refill token bucket algorithm with no background threads. Disabled by default for backward compatibility. Includes RateLimitConfig builder with validation, endpoint-specific overrides via glob patterns, stale client cleanup, SOVD-compliant 429 error responses, and 24 unit tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…imiter - Apply CORS headers (including Authorization in Allow-Headers) before returning 429, so browser clients can read cross-origin rate-limit responses without a CORS error - Replace steady_clock::time_point last_cleanup_ with atomic<int64_t> to eliminate the data race on the cleanup timer field that is read without a mutex in check() and written under clients_mutex_ - Rewrite StaleClientsAreCleaned test to use 2 RPM so natural bucket refill cannot recover within the idle window, making cleanup the only possible path for the request to be allowed again Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…te-limiter changes)
…and TOCTOU in rate limiter Reorder check() to run per-client before global bucket, preventing exhausted clients from draining the global pool (DoS vector). Use composite bucket key (client_ip|pattern) so per-endpoint overrides apply correctly for existing clients. Compute X-RateLimit-Reset from actual refill time instead of hardcoded 60s. Replace cleanup trigger with CAS to prevent redundant concurrent cleanups. Increase test sleep margins to reduce flakiness. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Force-pushed from 984e6f4 to 284ea74.
Summary
Add configurable token-bucket rate limiting to protect the gateway from abuse and resource exhaustion. Implements two-layer limiting (global + per-client IP) as a pre-routing middleware that runs before CORS and auth checks.
Key design decisions:
- Endpoint-specific overrides via glob patterns (e.g. `/api/v1/*/operations/*:10`)
- Rejections return 429 with a `Retry-After` header

Issue
Type
What Changed
New files

| File | Description |
|---|---|
| include/.../http/rate_limiter.hpp | `RateLimitConfig`, `RateLimitConfigBuilder`, `TokenBucket`, `RateLimitResult`, `RateLimiter` |
| src/http/rate_limiter.cpp | Implementation |
| test/test_rate_limiter.cpp | Unit tests |

Modified files

| File | Description |
|---|---|
| error_codes.hpp | Adds `ERR_RATE_LIMIT_EXCEEDED` |
| gateway_node.hpp/cpp | `RateLimitConfig` member + ROS 2 parameter declarations + builder |
| rest_server.hpp/cpp | Accepts `RateLimitConfig`; rate limiter check in pre-routing handler |
| gateway_params.yaml | `rate_limiting:` configuration section |
| CMakeLists.txt | Build and test target updates |

Configuration
Response headers (all responses when enabled)
Rejection response (429)
```json
{
  "error_code": "rate-limit-exceeded",
  "message": "Too many requests. Please retry after 2 seconds.",
  "parameters": { "retry_after": 2, "limit": 60, "reset": 1708200060 }
}
```

Testing
- `colcon build`: all 6 packages pass
- `colcon test --ctest-args -R test_rate_limiter`: 24/24 tests pass
- `clang-format-18 --dry-run -Werror`: clean
- `ament_copyright`: clean

Test categories:
Checklist