fatal Process exited with status 1 #1333

Open
rugginic opened this issue Oct 24, 2024 · 8 comments

@rugginic

What happened?
Creating an AWS workspace fails very often (roughly 50% of the time) with the following fatal error:
[13:26:36] fatal Process exited with status 1 run agent command github.com/loft-sh/devpod/pkg/devcontainer/sshtunnel.ExecuteCommand.func2 D:/a/devpod/devpod/pkg/devcontainer/sshtunnel/sshtunnel.go:129 runtime.goexit C:/Users/runneradmin/go/pkg/mod/golang.org/toolchain@v0.0.1-go1.22.5.windows-amd64/src/runtime/asm_amd64.s:1695

What did you expect to happen instead?
To be able to run the same workspace successfully every time.

How can we reproduce the bug? (as minimally and precisely as possible)
I have no idea.

My devcontainer.json:

{
    "name": "AFT",
    // Or use a Dockerfile or Docker Compose file. More info: https://containers.dev/guide/dockerfile
    "image": "mcr.microsoft.com/devcontainers/python:1-3.11-bullseye",
    "features": {
        "https://github.qualcomm.com/Cloud/nscert-feature/releases/download/0.0.2/devcontainer-feature-nscerts.tgz": {},
        "ghcr.io/devcontainers-contrib/features/poetry:2": {}
    },
    "overrideFeatureInstallOrder": [
        "https://github.qualcomm.com/Cloud/nscert-feature/releases/download/0.0.2/devcontainer-feature-nscerts.tgz"
    ],

    // Use 'postCreateCommand' to run commands after the container is created.
    "postCreateCommand": "./.devcontainer/postCreateCommand.sh",

    // Configure tool-specific properties.
    "customizations": {
        "vscode": {
            "extensions": [
                "ms-python.python",
                "editorconfig.editorconfig",
                "streetsidesoftware.code-spell-checker",
                "ms-python.vscode-pylance",
                "ms-python.black-formatter",
                "mutantdino.resourcemonitor"
            ],
            "settings": {
                "remote.SSH.connectTimeout": 3600,
                "python.testing.pytestArgs": ["tests"],
                "python.testing.unittestEnabled": false,
                "python.testing.pytestEnabled": true,
                "python.defaultInterpreterPath": "/workspaces/AFT/.venv/bin/python",
                "python.testing.pytestPath": "/workspaces/AFT/.venv/bin/pytest",
                "editor.defaultFormatter": "ms-python.black-formatter",
                "python.formatting.provider": "black",
                "editor.formatOnType": true,
                "editor.formatOnSave": true
            }
        }
    },
    "remoteUser": "vscode"
}

Local Environment:

  • DevPod Version: v0.5.21
  • Operating System: windows
  • ARCH of the OS: AMD64

DevPod Provider:

  • Cloud Provider: aws

Anything else we need to know?
This happens randomly, but almost 50% of the time.

@bkneis
Contributor

bkneis commented Oct 25, 2024

@rugginic looking at the logs, I can see

{"type":"data","data":{"time":"2024-10-24T13:26:36.7809381+01:00","message":"#2 ERROR: failed to do request: Head \"https://registry-1.docker.io/v2/docker/dockerfile/manifests/1.4\": http: server gave HTTP response to HTTPS client","level":"info"}}

It seems that your environment may be having network issues when connecting to the Docker registry. Can you SSH into the VM? If so, can you pull the image on the VM using docker pull mcr.microsoft.com/devcontainers/python:1-3.11-bullseye?
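
The log line above is recorded at level "info" even though its message carries a build error, so filtering on the level field alone would miss it. A minimal sketch, assuming the debug log is available as one JSON object per line in the shape shown above (`find_error_messages` is a hypothetical helper, not part of DevPod):

```python
import json


def find_error_messages(lines):
    """Return, in order, the messages that contain 'ERROR' from JSON log lines.

    Assumes each line looks like {"type": ..., "data": {"message": ..., ...}}.
    Lines that are not valid JSON or lack that shape are skipped.
    """
    found = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(record, dict):
            continue
        data = record.get("data")
        if not isinstance(data, dict):
            continue
        message = data.get("message", "")
        # Match on the message text, because the level field may still say "info".
        if isinstance(message, str) and "ERROR" in message:
            found.append(message)
    return found


# The sample line is copied from the log excerpt quoted in this thread.
sample = [
    r'{"type":"data","data":{"time":"2024-10-24T13:26:36.7809381+01:00","message":"#2 ERROR: failed to do request: Head \"https://registry-1.docker.io/v2/docker/dockerfile/manifests/1.4\": http: server gave HTTP response to HTTPS client","level":"info"}}',
]

for msg in find_error_messages(sample):
    print(msg)
```

The first message returned is usually the most useful one, since later failures (such as the final "Process exited with status 1") tend to be consequences of it.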

@rugginic
Author

I'm able to pull the image from the EC2 instance. This problem happens randomly. Are you implying that the FATAL error is related to the pull error? The correlation is not obvious to me. Can you elaborate?

@bkneis
Contributor

bkneis commented Oct 30, 2024

@rugginic the error I posted was a build error; it was at the top of the stack trace in your logs, where the issue seems to start. I cannot resolve github.qualcomm.com via DNS, so how is it resolved for you? How long does the build process take? I wonder if a timeout is occurring. Does the pipeline fail if you remove the https://github.qualcomm.com/Cloud/nscert-feature/releases/download/0.0.2/devcontainer-feature-nscerts.tgz feature?

@rugginic
Author

Hi, just to add more context: I don't get any error when I first define the workspace. It deploys successfully.
The issue appears when I stop and start the workspace, or if I try to rebuild or reset it.
The only way to recover is to delete and re-create the provider.

@rugginic
Author

[08:53:45] info #2 resolve image config for docker-image://docker.io/docker/dockerfile:1.4
[08:53:45] info #2 ERROR: failed to do request: Head "https://registry-1.docker.io/v2/docker/dockerfile/manifests/1.4": http: server gave HTTP response to HTTPS client
[08:53:46] info devcontainer up: build image: buildx build: build image: exit status 1
[08:53:46] info ------
[08:53:46] info error parsing workspace info: rerun as root: exit status 1
[08:53:46] error Try enabling Debug mode under Settings to see a more verbose output
[08:53:46] fatal run agent command: Process exited with status 1

We don't have access to registry-1.docker.io. How can I configure a different registry so that I don't hit this error?
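
For context on the `http: server gave HTTP response to HTTPS client` line in the log above: Go's HTTP client reports this when it starts a TLS handshake and the peer answers in plain HTTP, which can point to a proxy or a non-TLS listener somewhere on the path. A minimal sketch, assuming Python is available on the VM, of checking whether an endpoint completes a TLS handshake (`endpoint_speaks_tls` is a hypothetical helper, not part of DevPod or Docker):

```python
import socket
import ssl


def endpoint_speaks_tls(host, port=443, timeout=5.0):
    """Return True if a TLS handshake with host:port completes.

    Return False if the peer replies with something that is not TLS
    (for example a plain-HTTP response). Connection failures such as
    refused connections or timeouts propagate as OSError.
    """
    context = ssl.create_default_context()
    # Only the protocol is being probed here, not certificate trust,
    # so verification is deliberately disabled for this check.
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with socket.create_connection((host, port), timeout=timeout) as raw:
        try:
            with context.wrap_socket(raw, server_hostname=host):
                return True
        except ssl.SSLError:
            return False


if __name__ == "__main__":
    # Host taken from the log excerpt; run this from the affected VM.
    try:
        print(endpoint_speaks_tls("registry-1.docker.io"))
    except OSError as exc:
        print(f"could not connect: {exc}")
```

A False result from the VM would be consistent with the log message; a True result would suggest looking at proxy settings of the build process itself rather than the VM's direct network path.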

@bkneis
Contributor

bkneis commented Oct 31, 2024

@rugginic registry-1 is a public Docker registry used by most images; here it is fetching the docker/dockerfile image, version 1.4. You don't need a mirror for this. Did you try without the feature, or check how long it takes to download? Hmm, it is odd that the issue only occurs during rebuilds. So when you run devpod up, it's always happy, and then when you try devpod up --reset it fails 50% of the time?

@rugginic
Author

@bkneis I can't remove the feature, otherwise SSL communication will fail. And, as I said before, we don't have access to the public Docker registry because of company policies.

  1. Is there a way to configure where the failing image is pulled from, so that I can specify an internal (non-public) registry?
  2. Why is it pulling it only on rebuild/restart of the same workspace?

I narrowed down the issue. When I create the provider it works 100% of the time. If I stop, reset, or rebuild, it fails 100% of the time with the same error.
So basically my workspaces only work once. Any time I reconnect to the same repo I have to delete and re-create the workspace. And just to give more context, this happens to everyone on my team.

@bkneis
Contributor

bkneis commented Nov 1, 2024

I believe this is the docker image you are looking for https://hub.docker.com/r/docker/dockerfile/tags/
