
Unable to get interactive prompt when running rails c #1032

Open
1 of 3 tasks
nikokon opened this issue Aug 12, 2021 · 5 comments
Comments


nikokon commented Aug 12, 2021

Description

I’m trying to run a Rails console, but the command seems to exit and delete the pod without ever prompting for input. Running against an already-running pod using -e works as usual.

When debugging the issue further I noticed that it worked in the "default" namespace, but in neither "development" nor "nightly", where the same app was running.

Location

  • Browser
  • CLI
  • API

Steps to reproduce

  1. porter run --namespace <some_namespace_other_than_default> <rails_app_name> -- bin/rails c
  2. wait for it to start the pod, print the logs, and exit

Additional Details

porter logs:

Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
Could not open a shell to this container. Container logs:

<the usual logs printed when starting rails console>

Sucessfully deleted ephemeral pod


nikokon commented Aug 12, 2021

I was thinking it might be a memory constraint but I just noticed that the nightly namespace actually had more memory defined than the default.

@abelanger5 (Contributor)

Hey @nikokon, thanks for the bug report! I've just added the --verbose option to porter run in version v0.7.2 of the CLI -- could you give it a try and see if that helps debug? This will print the events associated with the ephemeral pod, which should give us some insight.

There could be multiple issues here; if the logs are getting printed, you're most likely running into a memory constraint.

In the case that the logs aren't getting printed, one potential limitation is that when the cluster doesn't have enough resources to create the ephemeral pod with the assigned resources, it may trigger a scale-up of nodes. However, the scale-up doesn't complete within the 10-second window that the ephemeral pod is allocated, so the pod gets deleted before the new node is ready.


nikokon commented Aug 17, 2021

Sorry for the delay, I've been off the computer for a few days. I've now run the command a few times on CLI version 0.7.4: it succeeded a couple of times but mostly failed. I've been running only in the default namespace, so that's probably not the issue; it does feel like a resource/timing problem. Here's the output from a couple of runs (I just blanked out some URLs for good measure 🤷)

=> porter run --verbose plick-api -- bin/rails c
Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
Could not open a shell to this container. Container logs:

Could not get logs. Pod events:

0/21 nodes are available: 13 Insufficient cpu, 8 Insufficient memory.
0/21 nodes are available: 13 Insufficient cpu, 8 Insufficient memory.
pod triggered scale-up: [{***autoscaling-group*** 16->17 (max: 50)}]
Sucessfully deleted ephemeral pod
=> porter run --verbose plick-api -- bin/rails c
Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
Could not open a shell to this container. Container logs:

Could not get logs. Pod events:

0/21 nodes are available: 13 Insufficient cpu, 8 Insufficient memory.
0/22 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 8 Insufficient memory.
0/22 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 8 Insufficient memory.
Sucessfully deleted ephemeral pod
=> porter run --verbose plick-api -- bin/rails c
Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
Could not open a shell to this container. Container logs:

Could not get logs. Pod events:

0/22 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 8 Insufficient memory.
0/22 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 8 Insufficient memory.
Successfully assigned ***pod-name*** to ***eks-node***
Pulling image ***application-docker-image***
Sucessfully deleted ephemeral pod

On the next try it worked, as did a retry a couple of seconds later:

=> porter run --verbose plick-api -- bin/rails c
Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
/* a bunch of application initializer logs */
Loading production environment (Rails 6.1.3.2)
Pod events:

Successfully assigned ***pod-name*** to ***eks-node***
Pulling image ***application-docker-image***
Successfully pulled image ***application-docker-image***
Created container web
Started container web
Sucessfully deleted ephemeral pod

@abelanger5 (Contributor)

Interesting, this is really useful -- thanks! Here's my best guess at what's happening:

  1. => porter run --verbose plick-api -- bin/rails c: for the first run, the message pod triggered scale-up: [{***autoscaling-group*** 16->17 (max: 50)}] means that the ephemeral pod cannot fit on any of your current nodes. Since we only grant 10 seconds for the pod to start running, the pod gets deleted before the new node is spun up.
  2. => porter run --verbose plick-api -- bin/rails c: the message 1 node(s) had taints that the pod didn't tolerate is because the node that was spun up in the last step is not ready yet (it has a "taint" marking it as not ready)
  3. => porter run --verbose plick-api -- bin/rails c: in this case, the node becomes ready, but the overall process still takes too long (pulling the image exceeds the window).
  4. => porter run --verbose plick-api -- bin/rails c: the new node is ready, and the image is pulled in time! Everything then works as expected.

The pod that the config is being copied from must have a really large memory request if it doesn't fit on an existing node. There are a few options to resolve this:

  • Provide a flag to change the amount of memory being requested by the pod, so it could fit on existing nodes.
  • Provide a flag to increase the amount of time that we wait for the pod to be ready. It defaults to 10 seconds, but in this case a timeout of 5 minutes or so might make more sense. However, it seems annoying to wait around that long just to get an ephemeral shell.

What do you think? Any other options you can think of?


nikokon commented Aug 26, 2021

@abelanger5 sorry for the late reply. I think both flags could be useful combined with some logs explaining the issue.
