An error is raised when using pull-through caching in a disconnected environment #1499

Closed
lubosmj opened this issue Feb 3, 2024 · 8 comments · Fixed by #1642

@lubosmj
Member

lubosmj commented Feb 3, 2024

This happens when using the pull-through caching workflow with no internet connection.

Traceback:

pulp [None]: backoff:ERROR: Giving up download_wrapper(...) after 4 tries (aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host registry-1.docker.io:443 ssl:default [Name or service not known])
[2024-02-03 17:20:47 +0000] [6911] [ERROR] Error handling request
Traceback (most recent call last):
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/connector.py", line 1173, in _create_direct_connection
    hosts = await asyncio.shield(host_resolved)
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/connector.py", line 884, in _resolve_host
    addrs = await self._resolver.resolve(host, port, family=self._family)
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/resolver.py", line 33, in resolve
    infos = await self._loop.getaddrinfo(
  File "/usr/lib64/python3.8/asyncio/base_events.py", line 825, in getaddrinfo
    return await self.run_in_executor(
  File "/usr/lib64/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/usr/lib64/python3.8/socket.py", line 918, in getaddrinfo
    for res in _socket.getaddrinfo(host, port, family, type, proto, flags):
socket.gaierror: [Errno -2] Name or service not known

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/web_protocol.py", line 452, in _handle_request
    resp = await request_handler(request)
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/web_app.py", line 543, in _handle
    resp = await handler(request)
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/web_middlewares.py", line 114, in impl
    return await handler(request)
  File "/src/pulpcore/pulpcore/content/authentication.py", line 48, in authenticate
    return await handler(request)
  File "/src/pulpcore/pulpcore/content/instrumentation.py", line 230, in middleware
    resp = await handler(request)
  File "/src/pulp_container/pulp_container/app/registry.py", line 158, in get_tag
    response = await downloader.run(
  File "/src/pulpcore/pulpcore/download/http.py", line 269, in run
    return await download_wrapper()
  File "/usr/local/lib/python3.8/site-packages/backoff/_async.py", line 151, in retry
    ret = await target(*args, **kwargs)
  File "/src/pulpcore/pulpcore/download/http.py", line 254, in download_wrapper
    return await self._run(extra_data=extra_data)
  File "/src/pulp_container/pulp_container/app/downloaders.py", line 79, in _run
    async with session_http_method(
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/client.py", line 1187, in __aenter__
    self._resp = await self._coro
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/client.py", line 574, in _request
    conn = await self._connector.connect(
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/connector.py", line 544, in connect
    proto = await self._create_connection(req, traces, timeout)
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/connector.py", line 911, in _create_connection
    _, proto = await self._create_direct_connection(req, traces, timeout)
  File "/usr/local/lib64/python3.8/site-packages/aiohttp/connector.py", line 1187, in _create_direct_connection
    raise ClientConnectorError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host registry-1.docker.io:443 ssl:default [Name or service not known]
@lubosmj
Member Author

lubosmj commented Feb 3, 2024

The following change fixes the error on the pulp_container side.

--- a/pulp_container/app/registry.py
+++ b/pulp_container/app/registry.py
@@ -8,7 +8,7 @@ from contextlib import suppress
 from urllib.parse import urljoin
 
 from aiohttp import web
-from aiohttp.client_exceptions import ClientResponseError
+from aiohttp.client_exceptions import ClientResponseError, ClientConnectionError
 from aiohttp.web_exceptions import HTTPTooManyRequests
 from django_guid import set_guid
 from django_guid.utils import generate_guid
@@ -21,6 +21,7 @@ from pulpcore.plugin.content import Handler, PathNotResolved
 from pulpcore.plugin.models import RemoteArtifact, Content, ContentArtifact
 from pulpcore.plugin.content import ArtifactResponse
 from pulpcore.plugin.tasking import dispatch
+from pulpcore.plugin.exceptions import TimeoutException
 
 from pulp_container.app.cache import RegistryContentCache
 from pulp_container.app.models import ContainerDistribution, Tag, Blob, Manifest, BlobManifest
@@ -154,11 +155,12 @@ class Registry(Handler):
             )
             tag_url = urljoin(remote.url, relative_url)
             downloader = remote.get_downloader(url=tag_url)
+            downloader.max_retries = 0
             try:
                 response = await downloader.run(
                     extra_data={"headers": V2_ACCEPT_HEADERS, "http_method": "head"}
                 )
-            except ClientResponseError:
+            except (ClientResponseError, ClientConnectionError, TimeoutException):
                 # the manifest is not available on the remote anymore
                 # but the old one is still stored in the database
                 pass
diff --git a/pulp_container/app/registry_api.py b/pulp_container/app/registry_api.py
index fe0e5662..68018114 100644
--- a/pulp_container/app/registry_api.py
+++ b/pulp_container/app/registry_api.py
@@ -11,7 +11,7 @@ import logging
 import hashlib
 import re
 
-from aiohttp.client_exceptions import ClientResponseError
+from aiohttp.client_exceptions import ClientResponseError, ClientConnectionError
 from itertools import chain
 from urllib.parse import urljoin, urlparse, urlunparse, parse_qs, urlencode
 from tempfile import NamedTemporaryFile
@@ -28,6 +28,7 @@ from pulpcore.plugin.models import Artifact, ContentArtifact, UploadChunk
 from pulpcore.plugin.files import PulpTemporaryUploadedFile
 from pulpcore.plugin.tasking import add_and_remove, dispatch
 from pulpcore.plugin.util import get_objects_for_user, get_url
+from pulpcore.plugin.exceptions import TimeoutException
 from rest_framework.exceptions import (
     AuthenticationFailed,
     NotAuthenticated,
@@ -99,8 +100,11 @@ IGNORED_PULL_THROUGH_REMOTE_ATTRIBUTES = [
     "pulp_id",
     "url",
     "name",
+    "connect_timeout",
 ]
 
+REMOTE_CONNECTION_TIMEOUT = 2
+
 
 class ContentRenderer(BaseRenderer):
     """
@@ -316,6 +320,7 @@ class ContainerRegistryApiMixin:
                     name=path,
                     upstream_name=upstream_name.strip("/"),
                     url=pull_through_cache_distribution.remote.url,
+                    connect_timeout=REMOTE_CONNECTION_TIMEOUT,
                     **remote_data,
                 )
 
@@ -1098,6 +1103,7 @@ class Manifests(RedirectsMixin, ContainerRegistryApiMixin, ViewSet):
         )
         tag_url = urljoin(remote.url, relative_url)
         downloader = remote.get_downloader(url=tag_url)
+        downloader.max_retries = 0
         try:
             response = downloader.fetch(
                 extra_data={"headers": V2_ACCEPT_HEADERS, "http_method": "head"}
@@ -1110,6 +1116,9 @@ class Manifests(RedirectsMixin, ContainerRegistryApiMixin, ViewSet):
             else:
                 # TODO: do not mask out relevant errors, like HTTP 502
                 raise ManifestNotFound(reference=pk)
+        except (ClientConnectionError, TimeoutException):
+            # The remote server is not available
+            raise ManifestNotFound(reference=pk)
         else:
             digest = response.headers.get("docker-content-digest")
             return models.Manifest.objects.filter(digest=digest).first()
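
The pattern in the diff above (a short connect timeout, a single attempt with no retries, and tolerating connection failures) can be sketched without Pulp or aiohttp. This is only an illustration under stated assumptions: `resolve_digest`, `head_upstream`, and `cached_digests` are hypothetical names, and the built-in `ConnectionError` and `asyncio.TimeoutError` stand in for aiohttp's `ClientConnectionError` and pulpcore's `TimeoutException`.

```python
import asyncio

# Mirrors REMOTE_CONNECTION_TIMEOUT from the diff; the value is in seconds.
REMOTE_CONNECTION_TIMEOUT = 2


async def resolve_digest(tag, head_upstream, cached_digests, timeout=REMOTE_CONNECTION_TIMEOUT):
    """Return the upstream digest for `tag`, or the cached one if upstream is unreachable.

    `head_upstream` is a hypothetical coroutine function standing in for the
    downloader's HEAD request; `cached_digests` stands in for the database.
    """
    try:
        # Single attempt with a short timeout: no backoff/retry loop.
        return await asyncio.wait_for(head_upstream(tag), timeout=timeout)
    except (ConnectionError, asyncio.TimeoutError):
        # Upstream is unreachable; fall back to whatever was cached earlier.
        return cached_digests.get(tag)


async def unreachable_upstream(tag):
    # Simulates a DNS or connection failure in a disconnected environment.
    raise ConnectionError("Cannot connect to host")


async def slow_upstream(tag):
    # Simulates a remote that does not answer within the timeout.
    await asyncio.sleep(10)
    return "sha256:never"


async def reachable_upstream(tag):
    # Simulates a healthy remote that reports a fresh digest.
    return "sha256:new"


cache = {"latest": "sha256:abc"}
print(asyncio.run(resolve_digest("latest", reachable_upstream, cache)))
print(asyncio.run(resolve_digest("latest", unreachable_upstream, cache)))
print(asyncio.run(resolve_digest("latest", slow_upstream, cache, timeout=0.01)))
```

Failing fast matters here because the request handler is blocked while the downloader retries; with retries and a long timeout, every pull in a disconnected environment would stall before the cached content is served.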

@lubosmj lubosmj moved this from Not Started to Todo in Pulp Container Roadmap Feb 3, 2024
@lubosmj lubosmj changed the title An internal server error is raised when using pull-through caching in disconnected environment An error is raised when using pull-through caching in a disconnected environment Feb 3, 2024
@lubosmj
Member Author

lubosmj commented Feb 3, 2024

It might be worth considering adding a migration that sets the timeouts and backoffs to the correct values for existing remotes that were created during pull-through caching.

http PATCH :5001/pulp/api/v3/remotes/container/container/018d6ffb-51e9-76ce-a000-48a8859f1f40/ connect_timeout=1
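
The same update could be scripted, for example to apply it to several existing remotes. A minimal sketch using only the standard library: it assumes the API listens on `localhost:5001` (what HTTPie's `:5001` shorthand expands to), and `build_timeout_patch` is a hypothetical helper. Authentication is omitted, and the request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: HTTPie's ":5001" shorthand means http://localhost:5001.
BASE_URL = "http://localhost:5001"


def build_timeout_patch(remote_href, connect_timeout):
    """Build (but do not send) a PATCH request that sets connect_timeout on a remote."""
    body = json.dumps({"connect_timeout": connect_timeout}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + remote_href,
        data=body,
        headers={"Content-Type": "application/json"},
        method="PATCH",
    )


request = build_timeout_patch(
    "/pulp/api/v3/remotes/container/container/018d6ffb-51e9-76ce-a000-48a8859f1f40/",
    connect_timeout=1,
)
print(request.get_method(), request.full_url)
# Sending it would be urllib.request.urlopen(request), which needs a running
# Pulp instance and valid credentials.
```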

@lubosmj
Member Author

lubosmj commented Feb 3, 2024

I am waiting to get this merged: pulp/pulpcore#5025.

@ipanova
Member

ipanova commented Feb 5, 2024

I would expect a similar traceback to be shown during a regular sync as well. None of these features are designed to work in an air-gapped environment.

@lubosmj
Member Author

lubosmj commented Feb 6, 2024

It does make sense for this feature to be able to deal with connectivity issues. Once content has been pulled through, it should be saved in the Pulp instance, which should then be able to deliver the image whenever it is requested.

@ipanova
Member

ipanova commented Feb 7, 2024

It makes sense to deliver content that is already cached locally. However, when the Pulp instance needs to reach out to the internet, whether during the first pull or during a sync, we should not raise a 404 but propagate the error.
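
The distinction drawn here can be sketched as follows. This is an illustration only, not pulp_container code: `serve_manifest` and `UpstreamUnavailable` are hypothetical, the `ManifestNotFound` class is a local stand-in, and the 502 status is an assumption.

```python
class ManifestNotFound(Exception):
    """Local stand-in for a 404: the manifest does not exist upstream."""

    status = 404


class UpstreamUnavailable(Exception):
    """Hypothetical error for an unreachable remote; 502 is an assumed status."""

    status = 502


def serve_manifest(tag, cached, fetch_upstream):
    """Serve `tag` from upstream, falling back to the cache when upstream is down."""
    try:
        manifest = fetch_upstream(tag)
    except ConnectionError as exc:
        if tag in cached:
            # Already pulled through once: deliver the local copy.
            return cached[tag]
        # Never cached: report the outage instead of pretending the tag is missing.
        raise UpstreamUnavailable(tag) from exc
    if manifest is None:
        # Upstream answered, and the tag really does not exist.
        raise ManifestNotFound(tag)
    cached[tag] = manifest
    return manifest


def offline(tag):
    # Simulates a disconnected environment.
    raise ConnectionError("no route to upstream")


cached = {"alpine:latest": "manifest-bytes"}
print(serve_manifest("alpine:latest", cached, offline))
try:
    serve_manifest("busybox:latest", cached, offline)
except UpstreamUnavailable as exc:
    print("upstream unavailable, status", exc.status)
```

Keeping the two exception types separate lets a client tell "this image does not exist" apart from "the registry cannot currently check".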

@lubosmj
Member Author

lubosmj commented Feb 14, 2024

Yes, with the change, we should also refactor the following code paths:

# TODO: do not mask out relevant errors, like HTTP 502

# TODO: do not mask out relevant errors, like HTTP 502

@lubosmj
Member Author

lubosmj commented Feb 14, 2024

TimeoutException can be consumed by plugin writers as of pulpcore 3.46.0 (2024-02-13).

@lubosmj lubosmj self-assigned this May 14, 2024
@pulpbot pulpbot moved this to In Progress in RH Pulp Kanban board May 14, 2024
lubosmj added a commit to lubosmj/pulp_container that referenced this issue May 21, 2024
@pulpbot pulpbot moved this from In Progress to Needs review in RH Pulp Kanban board May 21, 2024
@lubosmj lubosmj moved this from Todo to In Progress in Pulp Container Roadmap May 23, 2024
lubosmj added a commit to lubosmj/pulp_container that referenced this issue May 28, 2024
lubosmj added a commit to lubosmj/pulp_container that referenced this issue May 31, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in Pulp Container Roadmap Jun 3, 2024
@pulpbot pulpbot moved this from Needs review to Done in RH Pulp Kanban board Jun 3, 2024
@lubosmj lubosmj moved this from Done to Shipped in Pulp Container Roadmap Jul 19, 2024