Improvements to calendar based auto scaler + Short term options to handle the support requests #4289
Comments
What I wrote in Slack was imagining a short-term fix. Long term, I think it'd be better to use some k8s primitives, like a CronJob or some such.
If one downloads the calendar (link via the calendar settings page), it has the events, but according to the logs the scaler is only seeing one event, the evening cool off.
As requested, for this incident it did not "see" an event that was happening now and on another occasion it thought the previous day's evening cool off event was happening now even though it wasn't.
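Both symptoms above (missing a current event, and treating the previous day's event as current) come down to the "is this event happening now?" test. The thread does not establish the actual cause, so the following is only a minimal sketch of that test, assuming events carry timezone-aware start and end times. The `Event` type, the `active_events` helper, and the sample dates are hypothetical and are not taken from the scaler's code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Event:
    """Hypothetical stand-in for a calendar entry; not the scaler's real type."""
    summary: str
    start: datetime  # expected to be timezone-aware
    end: datetime    # expected to be timezone-aware


def active_events(events, now):
    """Return the events whose half-open interval [start, end) contains `now`."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    return [e for e in events if e.start <= now < e.end]


# Fixed UTC-8 offset and made-up dates, chosen only for illustration.
tz = timezone(timedelta(hours=-8))
events = [
    Event("evening cool off (previous day)",
          datetime(2023, 3, 1, 20, 0, tzinfo=tz),
          datetime(2023, 3, 1, 23, 0, tzinfo=tz)),
    Event("class",
          datetime(2023, 3, 2, 16, 30, tzinfo=tz),
          datetime(2023, 3, 2, 18, 30, tzinfo=tz)),
]
now = datetime(2023, 3, 2, 17, 0, tzinfo=tz)
current = active_events(events, now)
print([e.summary for e in current])
```

Comparing full timezone-aware datetimes, rather than times of day alone, is what keeps an event from the previous day from matching.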
Alternative library: https://pypi.org/project/ical/
The calendar-based autoscaler did not work today (at least from what I observed in Grafana). From 4.30 PM today, I manually updated the number of nodes in the R nodepool so that there weren't any pending pods. I went to the Google Cloud console UI and updated the number of nodes to 4 (based on @ryanlovett's input). Even with 4 nodes allocated (this class previously needed 3 nodes based on Grafana data), there were still 50+ pending pods for a certain duration. I will share the metrics observed during the last 30 minutes. @ryanlovett Is it possible to update the placeholders directly instead of the actual nodes? Whenever I edited the actual number of nodes, the autoscaler brought it back to the number required by current demand within 2 minutes. Updating placeholders rather than nodes would make it easier for me to support these requests manually in the upcoming days.
@balajialg Sounds like the script was using bad values again. There is a way to update the placeholders, but it is in the hub config, https://github.com/berkeley-dsep-infra/datahub/blob/staging/node-placeholder/values.yaml#L186. Rather than commit and run through CI, it might be best to make changes and manually Another option would be to disable the scaler entirely and then just use the cloud console.
@ryanlovett Yes, I waited till 4.45 PM to see if the calendar-based autoscaler had any impact on the node count. Unfortunately, I couldn't see any change. I haven't set up hubploy on my local device as I ran into issues with sops. I need help fixing that but don't want to spend dev cycles on it when they are already scarce. I would like to explore your other option of disabling the scaler entirely and using the cloud console. Can you expand on what that would look like?
@balajialg Some ways to temporarily disable the scaler:
The first one is probably way easier.
@ryanlovett This is great. Let's explore option 1 during our 4 PM meeting today. Thanks
calendar scaler is fixed and deployed... i believe that we can close this issue!
Closing this issue. Fantastic work @shaneknapp @ryanlovett ! |
Summary
The calendar-based scaler, which was scheduled for the Pol Sci 3 class between 4.30 and 6.30 PM today, did not work as expected, causing a lot of grief for the instructor and the students with respect to server startup times (Ref #4009). @rylo reported that the logs had no information about the scheduled event, which suggests that the calendar scaler never picked it up.
Seems like there was a rush around 5 - 5.30 PM when 190+ students accessed R hub.

We provisioned 4 additional placeholders for this class using the calendar scaler. Node count went from 1 to 3 during the scheduled time (which means we over-allocated for this class).
We should explore alternatives for servicing these time-bound resource increase requests in the short term, until we figure out a way to make the autoscaler's behavior consistent. @rylo had the following suggestion:
I'd favor storing scale up events in a simple yaml file in the datahub repo. I think we have to think about what data we should put in the file and how to represent it then alter the current scaler to parse the data and just emit logs about what it would try to do if it was going to do it. If it seemed like it was working properly, switch the scaler from using the google calendar as the primary source of record to the yaml file.
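A minimal sketch of the dry-run step described above, assuming a hypothetical schema in which each scale-up event has a `pool`, ISO 8601 `start` and `end` timestamps with explicit UTC offsets, and a `placeholders` count. The schema, the `desired_placeholders` and `dry_run` names, and the sample values are all illustrative assumptions. The list of dicts stands in for what a YAML loader would return from the file.

```python
import logging
from datetime import datetime

log = logging.getLogger("scaler-dry-run")

# Stand-in for the parsed YAML file. Hypothetical schema, e.g.:
#   - pool: r
#     start: "2023-03-02T16:30:00-08:00"
#     end: "2023-03-02T18:30:00-08:00"
#     placeholders: 4
EVENTS = [
    {"pool": "r", "start": "2023-03-02T16:30:00-08:00",
     "end": "2023-03-02T18:30:00-08:00", "placeholders": 4},
    {"pool": "r", "start": "2023-03-02T17:00:00-08:00",
     "end": "2023-03-02T19:00:00-08:00", "placeholders": 2},
]


def desired_placeholders(events, now):
    """Map each pool to the largest placeholder count among events active at `now`."""
    desired = {}
    for ev in events:
        start = datetime.fromisoformat(ev["start"])
        end = datetime.fromisoformat(ev["end"])
        if start <= now < end:  # half-open interval, timezone-aware comparison
            desired[ev["pool"]] = max(desired.get(ev["pool"], 0), ev["placeholders"])
    return desired


def dry_run(events, now):
    """Log what the scaler would do, without changing anything."""
    desired = desired_placeholders(events, now)
    for pool, count in sorted(desired.items()):
        log.info("would set %s placeholders to %d", pool, count)
    return desired


logging.basicConfig(level=logging.INFO)
now = datetime.fromisoformat("2023-03-02T17:15:00-08:00")
result = dry_run(EVENTS, now)
```

Taking the maximum across overlapping events (rather than the sum) is a design assumption here; whichever rule is chosen, the dry-run logs make it visible before the scaler is switched over to the file.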
Proposed Solutions
Long Term
Medium Term
Short Term
Task to be performed
ical
library.