Skip to content

Conversation

@agrawalradhika-cell
Copy link
Contributor

@agrawalradhika-cell agrawalradhika-cell commented Oct 16, 2025

feat: Add retry logic when certificate mismatch for existing credentials & Agent Identity workloads

This change introduces retry support when requests are created for existing credentials and Agent Identities on GKE and Cloud Run Workloads. When 401(Unauthorized) error is created, due to certificate at time of configuration of mTLS channel being different from the current certificate, a retry is added to the request by configuring the mTLS channel with the current certificate.

This change introduces retry support when requests are created for AgentIdentities on GKE and Cloud Run Workloads.

Signed-off-by: Radhika Agrawal <agrawalradhika@google.com>
…ion and request

Signed-off-by: Radhika Agrawal <agrawalradhika@google.com>
@agrawalradhika-cell agrawalradhika-cell changed the title Cert rotation x509 feat: Add retry logic when certificate mismatch for Agent Identity Oct 16, 2025
@agrawalradhika-cell agrawalradhika-cell marked this pull request as ready for review October 17, 2025 01:17
@agrawalradhika-cell agrawalradhika-cell requested review from a team as code owners October 17, 2025 01:17
@agrawalradhika-cell agrawalradhika-cell marked this pull request as draft October 17, 2025 01:17
… from mTLS configuration

Signed-off-by: Radhika Agrawal <agrawalradhika@google.com>
@agrawalradhika-cell agrawalradhika-cell marked this pull request as ready for review October 17, 2025 19:54
@lsirac
Copy link
Contributor

lsirac commented Oct 18, 2025

Is the description accurate? This will apply to existing X509 workloads too?

@agrawalradhika-cell agrawalradhika-cell changed the title feat: Add retry logic when certificate mismatch for Agent Identity feat: Add retry logic when certificate mismatch for existing credentials & Agent Identity workloads Oct 20, 2025
@agrawalradhika-cell
Copy link
Contributor Author

Is the description accurate? This will apply to existing X509 workloads too?

Updated the description

@@ -0,0 +1,188 @@
# Copyright 2025 Google LLC
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving a comment here so we won't forget to wait for the PR #1821 to merge first and do a rebase to ensure we don't have duplicated or conflicting code.

@nbayati nbayati added the do not merge Indicates a pull request not ready for merge, due to either quality or timing. label Oct 28, 2025
sai-sunder-s
sai-sunder-s previously approved these changes Oct 30, 2025
Copy link
Collaborator

@daniel-sanche daniel-sanche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let a few small comments, but nothing major. I avoided _agent_identity_utils entirely because of conflicts with #1821

I'll take another look when #1821 is merged

# New certificate and key to simulate rotation.
new_cert = b'-----BEGIN CERTIFICATE-----\nMIIDIzCCAgugAwIBAgIJAMfISuBQ5m+5MA0GCSqGSIb3DQEBBQUAMBUxEzARBgNV\nBAMTCnVuaXQtdGVzdHMwHhcNMTExMjA2MTYyNjAyWhcNMjExMjAzMTYyNjAyWjAV\nMRMwEQYDVQQDEwp1bml0LXRlc3RzMIIBIjANBgkqhkiG9w0BAQEFAAOCAQ8AMIIB\nCgKCAQEA4ej0p7bQ7L/r4rVGUz9RN4VQWoej1Bg1mYWIDYslvKrk1gpj7wZgkdmM\n7oVK2OfgrSj/FCTkInKPqaCR0gD7K80q+mLBrN3PUkDrJQZpvRZIff3/xmVU1Wer\nuQLFJjnFb2dqu0s/FY/2kWiJtBCakXvXEOb7zfbINuayL+MSsCGSdVYsSliS5qQp\ngyDap+8b5fpXZVJkq92hrcNtbkg7hCYUJczt8n9hcCTJCfUpApvaFQ18pe+zpyl4\n+WzkP66I28hniMQyUlA1hBiskT7qiouq0m8IOodhv2fagSZKjOTTU2xkSBc//fy3\nZpsL7WqgsZS7Q+0VRK8gKfqkxg5OYQIDAQABo3YwdDAdBgNVHQ4EFgQU2RQ8yO+O\ngN8oVW2SW7RLrfYd9jEwRQYDVR0jBD4wPIAU2RQ8yO+OgN8oVW2SW7RLrfYd9jGh\nGaQXMBUxEzARBgNVBAMTCnVuaXQtdGVzdHOCCQDHyErgUOZvuTAMBgNVHRMEBTAD\nAQH/MA0GCSqGSIb3DQEBBQUAA4IBAQBRv+M/6+FiVu7KXNjFI5pSN17OcW5QUtPr\nodJMlWrJBtynn/TA1oJlYu3yV5clc/71Vr/AxuX5xGP+IXL32YDF9lTUJXG/uUGk\n+JETpKmQviPbRsvzYhz4pf6ZIOZMc3/GIcNq92ECbseGO+yAgyWUVKMmZM0HqXC9\novNslqe0M8C1sLm1zAR5z/h/litE7/8O2ietija3Q/qtl2TOXJdCA6sgjJX2WUql\nybrC55ct18NKf3qhpcEkGQvFU40rVYApJpi98DiZPYFdx1oBDp/f4uZ3ojpxRVFT\ncDwcJLfNRCPUhormsY7fDS9xSyThiHsW9mjJYdcaKQkwYZ0F11yB\n-----END CERTIFICATE-----\n'
new_key = b'-----BEGIN ENCRYPTED PRIVATE KEY-----\nMIHeMEkGCSqGSIb3DQEFDTA8MBsGCSqGSIb3DQEFDDAOBAj9XnJ2h78QVAICCAAw\nHQYJYIZIAWUDBAECBBBeiiOF2LnLzq/wjb/viwMwBIGQk28Zkfj2EIk42bgc7UzC\nSf98qssCVhsIYz0Xa3eSATg8Cpn83YieaBeyxdk/tXTnrOhxMV/vt7T98kWhaGbH\n5Z9CdGVLfes0UFvVJqrlk6vcf2sOnLCGbrn78HS+ayrGOCRSCd/7+dnEiB/7Um1B\nMk6BBJHsLEnZZSHyfrw8jvYgVmcSBy/WdY0pqldD/+4D\n-----END ENCRYPTED PRIVATE KEY-----\n'

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like these certificates are copy/pasted all over the tests. Can we make them shared constants or something? Or maybe a test fixture?

Signed-off-by: Radhika Agrawal <agrawalradhika@google.com>
and _credential_refresh_attempt < self._max_refresh_attempts
):

if response.status_code == 401:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it might be good to have a comment above this explaining what 401 represents

_LOGGER.info(
"Skipping reconfiguration of mTLS channel because the client"
" certificate has not changed."
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd suggest pulling out all this new code into a helper. The request method is getting pretty long and hard to follow

)
cert_obj = _agent_identity_utils.parse_certificate(
call_cert_callback_result[0]
)
Copy link
Collaborator

@daniel-sanche daniel-sanche Nov 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the parser can raise ValueErrors for invalid data. Are we ok with raising an exception here?

If so, we should declare it in the docstring

if self.is_mtls:
call_cert_callback_result = (
_agent_identity_utils.call_client_cert_callback()
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seeing an untyped call_cert_callback_result and then indexing into it later makes me a bit nervous

I'd suggest either unpacking this as two values:

call_cert_bytes, call_key_bytes  = (
      _agent_identity_utils.call_client_cert_callback()
  )

or giving it a type annotation:

call_cert_callback_result: tuple[bytes, bytes] = (
    _agent_identity_utils.call_client_cert_callback()
)

@property
def cached_cert(self):
"""Returns the cached client certificate."""
return self._cached_cert
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure this is something that needs to be publicly accessible?

If so, can you add a type here? Either using a type annotation, or just describing it in the doctoring?


use_mtls = False
if self._is_mtls == True:
if "mtls.googleapis.com" in url or "mtls.sandbox.googleapis.com" in url:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: consider extracting these out

MTLS_URL_PREFIXES = ["mtls.googleapis.com", "mtls.sandbox.googleapis.com"]

...

use_mtls = any([prefix in url for prefix in MTLS_URL_PREFIXES])

):

_LOGGER.info(
if response.status == 401:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code looks almost identical to the code in requests.py. I already suggested pulling it into a helper, can you use the same helper here?

)
except Exception as e:
_LOGGER.error("Failed to reconfigure mTLS channel: %s", e)
raise e
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what exceptions do we expect here? The docstring only mentions google.auth.exceptions.MutualTLSChannelError (This may be a good place for a raise from, and raise MutualTLSChannelError here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

do not merge Indicates a pull request not ready for merge, due to either quality or timing.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants