
Unable to get interactive prompt when running rails c #1032

Open
1 of 3 tasks
nikokon opened this issue Aug 12, 2021 · 5 comments
Comments


nikokon commented Aug 12, 2021

Description

I’m trying to run a Rails console, but the command seems to exit and delete the pod without ever prompting for input. Running against an already-running pod using -e works as usual.

When debugging the issue further I noticed that it worked in the "default" namespace, but in neither "development" nor "nightly", where the same app was running.

Location

  • Browser
  • CLI
  • API

Steps to reproduce

  1. porter run --namespace <some_namespace_other_than_default> <rails_app_name> -- bin/rails c
  2. wait for it to start the pod, print the logs, and exit

Additional Details

porter logs:

Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
Could not open a shell to this container. Container logs:

<the usual logs printed when starting rails console>

Sucessfully deleted ephemeral pod


nikokon commented Aug 12, 2021

I was thinking it might be a memory constraint but I just noticed that the nightly namespace actually had more memory defined than the default.

@abelanger5 (Contributor)

Hey @nikokon, thanks for the bug report! I've just added the --verbose option to porter run in version v0.7.2 of the CLI -- could you give it a try and see if that helps debug? This will print the events associated with the ephemeral pod, which should give us some insight.

There could be multiple issues here; if the logs are getting printed, you're most likely running into a memory constraint.

In the case that the logs aren't getting printed, one potential limitation is that when the cluster doesn't have enough resources to create the ephemeral pod with the assigned resources, it may trigger a scale-up of nodes. However, the scale-up doesn't complete within the 10-second window that the ephemeral pod is allocated, so the pod gets deleted before the new node is ready.


nikokon commented Aug 17, 2021

Sorry for the delay, I've been off the computer for a few days. I've now run the command a few times on CLI version 0.7.4: it succeeded a couple of times but mostly failed. I've been running only in the default namespace, so that's probably not the issue; it does feel like a resource/timing problem. Here's the output from a couple of runs (I just blanked out some URLs for good measure 🤷)

=> porter run --verbose plick-api -- bin/rails c
Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
Could not open a shell to this container. Container logs:

Could not get logs. Pod events:

0/21 nodes are available: 13 Insufficient cpu, 8 Insufficient memory.
0/21 nodes are available: 13 Insufficient cpu, 8 Insufficient memory.
pod triggered scale-up: [{***autoscaling-group*** 16->17 (max: 50)}]
Sucessfully deleted ephemeral pod
=> porter run --verbose plick-api -- bin/rails c
Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
Could not open a shell to this container. Container logs:

Could not get logs. Pod events:

0/21 nodes are available: 13 Insufficient cpu, 8 Insufficient memory.
0/22 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 8 Insufficient memory.
0/22 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 8 Insufficient memory.
Sucessfully deleted ephemeral pod
=> porter run --verbose plick-api -- bin/rails c
Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
Could not open a shell to this container. Container logs:

Could not get logs. Pod events:

0/22 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 8 Insufficient memory.
0/22 nodes are available: 1 node(s) had taints that the pod didn't tolerate, 13 Insufficient cpu, 8 Insufficient memory.
Successfully assigned ***pod-name*** to ***eks-node***
Pulling image ***application-docker-image***
Sucessfully deleted ephemeral pod

On the next try it worked, as did a retry a couple of seconds later:

=> porter run --verbose plick-api -- bin/rails c
Running bin/rails c for release plick-api
Attempting connection to the container, this may take up to 10 seconds. If you don't see a command prompt, try pressing enter.
/* a bunch of application initializer logs */
Loading production environment (Rails 6.1.3.2)
Pod events:

Successfully assigned ***pod-name*** to ***eks-node***
Pulling image ***application-docker-image***
Successfully pulled image ***application-docker-image***
Created container web
Started container web
Sucessfully deleted ephemeral pod

@abelanger5 (Contributor)

Interesting, this is really useful -- thanks! Here's my best guess at what's happening:

  1. => porter run --verbose plick-api -- bin/rails c: for the first run, the message pod triggered scale-up: [{***autoscaling-group*** 16->17 (max: 50)}] means that the ephemeral pod cannot fit on any of your current nodes. Since we only grant 10 seconds for the pod to start running, the pod gets deleted before the new node is spun up.
  2. => porter run --verbose plick-api -- bin/rails c: the message 1 node(s) had taints that the pod didn't tolerate is because the node that was spun up in the last step is not ready yet (it has a "taint" marking it as not ready)
  3. => porter run --verbose plick-api -- bin/rails c: in this case, the node becomes ready, but the overall process still takes too long (pulling the image exceeds the window).
  4. => porter run --verbose plick-api -- bin/rails c: the new node is ready, and the image is pulled in time! Everything then works as expected.

The pod that the config is being copied from must have a really large memory request if it doesn't fit on an existing node. There are a few options to resolve this:

  • Provide a flag to change the amount of memory being requested by the pod, so it could fit on existing nodes.
  • Provide a flag to increase the amount of time that we wait for the pod to be ready. It defaults to 10 seconds, but in this case a timeout of 5 minutes or so might make more sense. However, it seems annoying to wait around that long just to get an ephemeral shell.

What do you think? Any other options you can think of?


nikokon commented Aug 26, 2021

@abelanger5 sorry for the late reply. I think both flags could be useful combined with some logs explaining the issue.
