Feature: Install GCP Ops Agent Automatically #1296
Comments
👋 @evamaxfield thanks for your feedback! Runners also allow you to include a setup script, passed as base64. For the Ops Agent, the script would be:
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install
The cml command could be:
Disclaimer: I have not tried it yet, but it's the counterpart of the already known AWS scenario.
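A minimal sketch of how the two commands above might be handed to a runner, assuming CML's `--cloud-startup-script` option takes a base64-encoded shell script. The `cml runner launch --cloud=gcp` invocation is an assumption for illustration and is only printed here, not executed; region, machine type, and labels are left out on purpose.

```shell
set -eu

# Startup script built from the two commands quoted in this thread.
STARTUP_SCRIPT='#!/bin/bash
curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install'

# Encode to a single base64 line; tr removes any line wrapping
# that some base64 implementations add.
STARTUP_B64="$(printf '%s\n' "$STARTUP_SCRIPT" | base64 | tr -d '\n')"

# Assumed invocation: printed for inspection rather than run.
echo "cml runner launch --cloud=gcp --cloud-startup-script=${STARTUP_B64}"
```

Decoding the value with `base64 --decode` before launching is a cheap way to confirm the script survived the encoding step intact.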
Hey! I actually gave that a try a couple of days ago but it didn't work. Unfortunately, the only way for me to try the setup script and get the logs is not to run it as the startup script but to open a cloud SSH connection and run it there. Is that okay?
You might find this helpful to view the instance's startup.
Update: I am trying to look at the logs but I'm not entirely sure what I should be looking for. I also copied much of one of the cml-playground YAML examples, but instead of sleeping for 30 seconds I sleep for 10 minutes so I can watch the logs and explore the machine in a separate SSH session. The machine crashes regardless. Taken together with all of my other failed runs, it looks like the machine can't last longer than ~10 minutes? Example that just sleeps / cycles for 10 minutes and crashes: https://github.com/evamaxfield/gcloud-whisper-testing/actions/runs/3868218689
Correction: across all the runs, none succeed if the actual usage of the created runner lasts longer than 5 minutes. The first job, which creates the GCP runner, works just fine. The second job, which uses that runner, has never lasted more than 5 minutes and typically crashes at 4:59 (or 5:00 ± 10 seconds). Lots of examples: https://github.com/evamaxfield/gcloud-whisper-testing/actions
This seems more related to #1291 than to this feature. If you want me to reopen that issue or move the discussion there, let me know.
After digging into that, it seems most of my issues stem from #1255. I increased the idle-timeout and my action is now working. I will run a few more tests to make sure.
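For anyone hitting the same ~5 minute cutoff, a small sketch of raising the timeout. It assumes the `--idle-timeout` option of `cml runner launch` is given in seconds; the 30 minute value is a hypothetical choice, and the command is only printed, not executed.

```shell
set -eu

# Hypothetical value: 30 minutes, expressed in seconds.
IDLE_TIMEOUT_SECONDS=$((30 * 60))

# Assumed invocation; other flags (region, machine type, labels) are omitted.
CML_CMD="cml runner launch --cloud=gcp --idle-timeout=${IDLE_TIMEOUT_SECONDS}"

echo "$CML_CMD"
```

The value should comfortably exceed the longest gap expected between the runner coming up and a job being picked up.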
When creating a GCP runner with CML, it would be great to have memory and disk utilization metrics, and generally better logging, available. GCP Ops Agent seems to be the way to do that.
It would be great to install Ops Agent on GCP runners automatically during the startup script.