Too many API requests for outdated routes in the network #673

Closed
apricote opened this issue Jul 2, 2024 · 0 comments · Fixed by #675
apricote commented Jul 2, 2024

TL;DR

HCCM makes a large number of calls to client.Server.All() when the network has many "outdated" routes whose gateway IPs no longer belong to an active server.

Expected behavior

HCCM should not spam the API in reasonable situations.

Observed behavior

In RouteController.ListRoutes() we try to match every route in the network to the correct Kubernetes node.

For this, we check whether the Gateway IP of the route matches the known internal IP of any server in the network.

To make this more efficient, a cache was introduced that provides access to the server list. This works well as long as the server is actually in the cache.

If no server can be found in the cache, we assume that our cache is outdated and refresh the cache with the API.

This refresh is repeated once for every route that has no matching server.
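
The refresh-on-miss behavior described above can be illustrated with a small, self-contained simulation. This is only a sketch and not the HCCM code: `fakeAPI`, `naiveCache`, and `countRefreshes` are hypothetical names, and the fake API simply counts how often the full server list is requested.

```go
package main

import "fmt"

// fakeAPI is a hypothetical stand-in for the cloud API.
// It only counts how often the full server list is fetched.
type fakeAPI struct {
	servers map[string]string // private IP -> server name
	calls   int
}

func (a *fakeAPI) allServers() map[string]string {
	a.calls++
	out := make(map[string]string, len(a.servers))
	for ip, name := range a.servers {
		out[ip] = name
	}
	return out
}

// naiveCache refreshes from the API on every lookup miss,
// which is the pattern the issue describes.
type naiveCache struct {
	api  *fakeAPI
	byIP map[string]string
}

func (c *naiveCache) byPrivateIP(ip string) (string, bool) {
	if name, ok := c.byIP[ip]; ok {
		return name, true
	}
	// Miss: assume the cache is outdated and reload everything.
	c.byIP = c.api.allServers()
	name, ok := c.byIP[ip]
	return name, ok
}

// countRefreshes simulates one pass over the routes: one valid route
// plus bogusRoutes routes whose gateway matches no server. It returns
// the number of API calls made during that pass.
func countRefreshes(bogusRoutes int) int {
	api := &fakeAPI{servers: map[string]string{"10.0.0.2": "node-1"}}
	cache := &naiveCache{api: api, byIP: map[string]string{"10.0.0.2": "node-1"}}

	cache.byPrivateIP("10.0.0.2") // cache hit, no API call
	for i := 0; i < bogusRoutes; i++ {
		cache.byPrivateIP(fmt.Sprintf("10.0.1.%d", i+1)) // miss, one API call each
	}
	return api.calls
}

func main() {
	fmt.Println(countRefreshes(5)) // prints 5
}
```

In this model the number of API calls per pass grows linearly with the number of routes that have no matching server.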

Minimal working example

  • Add x bogus routes in the Cluster CIDR to the Network before starting HCCM.
  • Observe x calls to GET /v1/servers

Log output

No response

Additional information

No response

@apricote apricote added the bug Something isn't working label Jul 2, 2024
@apricote apricote self-assigned this Jul 2, 2024
apricote added a commit that referenced this issue Jul 2, 2024
Fixes #673

In `routes.ListRoutes()` we have to find the matching server/node for
every route in the network. We find the server by utilizing a cache that
maps every private IP to the corresponding server.

This cache has a feature that refreshes the list of servers if an entry
cannot be found. This is sensible, as the server might have just been
created. It is also problematic, as this refresh happens on every single
cache miss. If there are a lot of routes in the network that do not
belong to any server, we refresh the cache many times for each
`ListRoutes()`. This is even more serious, as `ListRoutes()` is
called every 10-30 seconds (see #395).

This commit introduces a `rate.Limiter` in the `AllServersCache` which
allows the refresh to happen at most once every 30 seconds.
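
The idea of limiting refreshes can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual commit: instead of a `rate.Limiter` it uses a plain timestamp check so that it runs with the standard library only, and `limitedCache`, `lookup`, and `simulate` are hypothetical names. The clock is injected so the behavior can be simulated without waiting.

```go
package main

import (
	"fmt"
	"time"
)

// limitedCache refreshes on a miss only if the last refresh is at
// least minInterval old. Hypothetical stand-in for a rate-limited cache.
type limitedCache struct {
	byIP        map[string]string        // private IP -> server name
	fetch       func() map[string]string // loads the full server list
	minInterval time.Duration
	lastRefresh time.Time
	now         func() time.Time // injectable clock
}

func (c *limitedCache) lookup(ip string) (string, bool) {
	if name, ok := c.byIP[ip]; ok {
		return name, true
	}
	// Miss: refresh only when the interval has elapsed.
	t := c.now()
	if c.lastRefresh.IsZero() || t.Sub(c.lastRefresh) >= c.minInterval {
		c.byIP = c.fetch()
		c.lastRefresh = t
	}
	name, ok := c.byIP[ip]
	return name, ok
}

// simulate runs three passes over bogusRoutes unmatched gateways and
// returns the cumulative number of fetches after each pass.
func simulate(bogusRoutes int) (first, second, third int) {
	calls := 0
	clock := time.Unix(1000, 0)
	cache := &limitedCache{
		byIP:        map[string]string{},
		minInterval: 30 * time.Second,
		now:         func() time.Time { return clock },
		fetch: func() map[string]string {
			calls++
			return map[string]string{"10.0.0.2": "node-1"}
		},
	}
	pass := func() int {
		for i := 0; i < bogusRoutes; i++ {
			cache.lookup(fmt.Sprintf("10.0.1.%d", i+1))
		}
		return calls
	}

	first = pass() // first miss triggers the only refresh
	clock = clock.Add(10 * time.Second)
	second = pass() // still inside the 30 second window
	clock = clock.Add(30 * time.Second)
	third = pass() // window elapsed, one more refresh
	return first, second, third
}

func main() {
	fmt.Println(simulate(10)) // prints 1 1 2
}
```

With this kind of limit the number of refreshes is bounded by elapsed time rather than by the number of unmatched routes. The trade-off is that a server created within the interval may not be found until the next permitted refresh.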
apricote added a commit that referenced this issue Jul 5, 2024
jooola pushed a commit that referenced this issue Jul 5, 2024
jooola pushed a commit that referenced this issue Jul 5, 2024
apricote added a commit that referenced this issue Jul 5, 2024