
Periodically resync proxies to agents #18050

Merged

rosstimothy merged 4 commits into master from tross/refresh_agent_proxies on Nov 4, 2022

Conversation

rosstimothy
Contributor

Prior to #14262, resource watchers would periodically close their watcher, create a new one, and refetch the current set of resources. It turns out that the reverse tunnel subsystem relied on this behavior to periodically broadcast the list of proxies to agents during steady state. Now that watchers are persistent and no longer perform a refetch, an agent that is unable to connect to a proxy expires it after a period of time, and since it never receives the periodic refresh, it never attempts to connect to said proxy again.

To remedy this, a new ticker is added to the localsite that grabs the current set of proxies from its proxy watcher and sends a discovery request to the agent. The ticker is set to fire before the tracker would expire the proxy, so that if a proxy exists in the cluster, the agent will continually try to connect to it.

@rosstimothy rosstimothy force-pushed the tross/refresh_agent_proxies branch 2 times, most recently from d7dbb72 to 3523747 Compare November 2, 2022 18:42
@rosstimothy rosstimothy marked this pull request as ready for review November 2, 2022 18:42
Contributor

@espadolini espadolini left a comment

How much data is this, given the relatively inefficient marshaling of the mostly-empty ServerV2?

@rosstimothy
Contributor Author

> How much data is this, given the relatively inefficient marshaling of the mostly-empty ServerV2?

The table below shows the size of a marshaled discoveryRequest for master and the two commits in this PR:

| Commit | Size |
| --- | --- |
| Master | 4766 |
| 3523747 | 3671 |
| 7439815d93 | 801 |
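As a rough illustration of how such payload sizes can be compared, the sketch below marshals a trimmed-down, hypothetical stand-in for ServerV2 (the real type and its marshaling are far richer than this) and measures the byte length of the resulting request:

```go
package main

import "encoding/json"

// serverV2 is a drastically trimmed, hypothetical stand-in for the real
// ServerV2 resource; most of its fields are empty in a discovery request,
// which is why marshaling the full struct is relatively wasteful.
type serverV2 struct {
	Kind     string `json:"kind,omitempty"`
	Version  string `json:"version,omitempty"`
	Name     string `json:"name,omitempty"`
	Addr     string `json:"addr,omitempty"`
	Hostname string `json:"hostname,omitempty"`
}

// discoveryRequestSize marshals a request-like payload and returns its size
// in bytes, mirroring how one might compare payloads across commits.
func discoveryRequestSize(proxies []serverV2) int {
	b, err := json.Marshal(struct {
		Proxies []serverV2 `json:"proxies"`
	}{Proxies: proxies})
	if err != nil {
		panic(err)
	}
	return len(b)
}
```

Dropping or omitting the unused fields is what produces the size reductions shown in the table.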

Prior to #14262, resource watchers would periodically close their watcher,
create a new one, and refetch the current set of resources. It turns out
that the reverse tunnel subsystem relied on this behavior to periodically
broadcast the list of proxies to agents during steady state. Now that
watchers are persistent and no longer perform a refetch, an agent that is
unable to connect to a proxy expires it after a period of time, and
since it never receives the periodic refresh, it never attempts to
connect to said proxy again.

To remedy this, a new ticker is added to the `localsite` that grabs
the current set of proxies from its proxy watcher and sends a discovery
request to the agent. The ticker is set to fire before the tracker
would expire the proxy, so that if a proxy exists in the cluster, the
agent will continually try to connect to it.
@rosstimothy rosstimothy force-pushed the tross/refresh_agent_proxies branch from 5bb90a5 to 66e48d3 Compare November 3, 2022 19:21
@rosstimothy rosstimothy enabled auto-merge (squash) November 3, 2022 19:23
@rosstimothy rosstimothy force-pushed the tross/refresh_agent_proxies branch from b46bd52 to 66523d9 Compare November 3, 2022 21:28
@rosstimothy rosstimothy force-pushed the tross/refresh_agent_proxies branch from 15e4581 to fdde20d Compare November 3, 2022 21:43
@rosstimothy rosstimothy merged commit 3b4c144 into master Nov 4, 2022
@github-actions

github-actions bot commented Nov 4, 2022

@rosstimothy See the table below for backport results.

| Branch | Result |
| --- | --- |
| branch/v10 | Failed |
| branch/v11 | Create PR |
| branch/v8 | Failed |
| branch/v9 | Failed |

rosstimothy added a commit that referenced this pull request Nov 4, 2022
rosstimothy added a commit that referenced this pull request Nov 4, 2022
@rosstimothy rosstimothy deleted the tross/refresh_agent_proxies branch November 4, 2022 15:34
rosstimothy added a commit that referenced this pull request Nov 4, 2022
rosstimothy added a commit that referenced this pull request Nov 7, 2022
* Periodically resync proxies to agents (#18050)

rosstimothy added a commit that referenced this pull request Nov 7, 2022
rosstimothy added a commit that referenced this pull request Nov 7, 2022
rosstimothy added a commit that referenced this pull request Nov 16, 2022
Moves `UpdateTrustedCluster` logging from debug to info so default
logging level includes when admin operations are performed to
establish or remove trust. Alters `remoteSite` such that it logs
in the same manner as `localSite`

Cherry-picks some of the availability changes made in #18050 to
ensure that agents spawned for trusted clusters are more robust to
connection issues.
rosstimothy added a commit that referenced this pull request Nov 18, 2022
* Improve site and trusted cluster logging and availability

Moves `UpdateTrustedCluster` logging from debug to info so default
logging level includes when admin operations are performed to
establish or remove trust. Alters `remoteSite` such that it logs
in the same manner as `localSite`

Cherry-picks some of the availability changes made in #18050 to
ensure that agents spawned for trusted clusters are more robust to
connection issues.

* Ensure metric `remote_cluster` reflects current state

The metric wasn't properly updated when remote sites went offline
or when remote cluster resources were removed. Any change to the
remoteSite state or the remoteCluster resource is now accurately
reflected in the metric.

* Add tracking of outbound connections to remote clusters

The metric `trust_clusters` existed and was exported, but was never
used anywhere. Now when the `RemoteClusterTunnelManager` starts
and stops agent pools it will create and delete a counter for the
cluster. Within the `AgentPool` the metric is set to the number
of connected proxies within `updateConnectedProxies`.
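The lifecycle described above (create a per-cluster counter when the agent pool starts, set it in `updateConnectedProxies`, delete it when the pool stops) can be sketched as follows. The real code presumably wires this through a Prometheus gauge; to stay self-contained, this stand-in uses a mutex-guarded map, and all names besides `updateConnectedProxies` are hypothetical:

```go
package main

import "sync"

// clusterGauges is a minimal stand-in for a per-cluster connected-proxies
// gauge. A mutex-guarded map plays the role of the metric registry.
type clusterGauges struct {
	mu     sync.Mutex
	gauges map[string]int
}

func newClusterGauges() *clusterGauges {
	return &clusterGauges{gauges: make(map[string]int)}
}

// startPool registers a counter for the cluster when its agent pool starts.
func (c *clusterGauges) startPool(cluster string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.gauges[cluster] = 0
}

// stopPool deletes the cluster's counter when its agent pool stops, so
// stale clusters stop being reported.
func (c *clusterGauges) stopPool(cluster string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	delete(c.gauges, cluster)
}

// updateConnectedProxies sets the counter to the current number of
// connected proxies for the cluster.
func (c *clusterGauges) updateConnectedProxies(cluster string, n int) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if _, ok := c.gauges[cluster]; ok {
		c.gauges[cluster] = n
	}
}

// get reports the counter value and whether the cluster is tracked.
func (c *clusterGauges) get(cluster string) (int, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	n, ok := c.gauges[cluster]
	return n, ok
}
```

Tying creation and deletion to pool start/stop is what keeps the metric from reporting clusters that no longer have an agent pool.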
github-actions bot pushed a commit that referenced this pull request Nov 18, 2022
rosstimothy added a commit that referenced this pull request Nov 18, 2022
rosstimothy added a commit that referenced this pull request Nov 18, 2022
zmb3 pushed a commit that referenced this pull request Nov 18, 2022
zmb3 pushed a commit that referenced this pull request Nov 18, 2022
zmb3 pushed a commit that referenced this pull request Nov 21, 2022