[Incident] OpenScapes unauthenticated users and CPU usage spike #908
Comments
Update: new user image
@betolink just set the hub's user image to |
I've deleted the ingress objects to prevent external access to the hubs.
k -n prod delete ingress jupyterhub
k -n staging delete ingress jupyterhub
Ref 2i2c-org#908
I've made the hub inaccessible (#911). |
With 2449ab1, I've fixed the hole here letting anyone unauthenticated through. I also took a look at our config to see if we had other hubs missing this config, and there weren't any. |
I've made a backup of the jupyterhub.sqlite file on the openscapes hub, and deleted the existing database. Upon next deploy, this will remove all existing users on the hub including the cryptominers. |
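For anyone repeating this step, the back-up-then-delete order matters: the original should only go away once a verified copy exists. Below is a minimal Python sketch of that pattern, not the commands actually run on the hub. The function name and paths are hypothetical, and it assumes the hub process is stopped so the file is not being written to while it is copied.

```python
import shutil
from pathlib import Path


def backup_then_delete(db_path: Path, backup_dir: Path) -> Path:
    """Copy db_path into backup_dir, check the copy, then delete the original.

    Hypothetical helper for illustration; paths are supplied by the caller.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    backup_path = backup_dir / (db_path.name + ".bak")

    # copy2 also preserves file metadata such as modification time.
    shutil.copy2(db_path, backup_path)

    # Only remove the original once the copy exists and has the same size.
    if backup_path.stat().st_size != db_path.stat().st_size:
        raise RuntimeError("backup size mismatch; original left in place")

    db_path.unlink()
    return backup_path
```

A size comparison is a cheap sanity check only; comparing checksums would be a stricter alternative.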
I've updated the top comment with some follow-up issues. @yuvipanda do you agree with the main things to follow up on? Feel free to add or edit as you wish. |
Update: report sent to OpenScapes
I've sent an email report to OpenScapes with the following text:
Email text
Hey all - here is a brief after action report to describe what happened, and the current state of things. I'm also cc'ing the Code for Science and Society team so that they have visibility.
Summary of what happened
The JupyterHub configuration for the OpenScapes hub was missing an option that was not critical for the hub to function properly, but was critical for authorization to function properly. Because of this, unauthorized users were able to access the hub. This was 2i2c's responsibility and we missed this mis-configuration.
An anonymous user found the link to the OpenScapes hub, was able to log-in without authorization, and around Dec 21st they started creating fake user accounts and spinning up crypto mining sessions on the hub. This resulted in the large spike in cloud costs.
Erin noticed this spike on the morning of January 2nd (US/Pacific time) and alerted 2i2c support. By that evening we had patched this bug and deleted the non-admin user accounts. There were around 2 weeks of heavy use related to this user's crypto-mining scripts.
Current situation
Next steps
You can find a full incident report and ongoing conversation here. Erin and others - I want to extend my apologies for this problem, and the stress that it has caused. We'll take necessary precautions to avoid this in the future, and will follow up if we need any clarifications. We'll also do what is necessary to make sure that OpenScapes isn't the one to bear the extra cloud costs. |
I've put together a short report for the OpenScapes team to use in their appeal to reduce their cloud bill. Here's a link: https://docs.google.com/document/d/106VbSeHDOGbsu-oLENmVJIWkK3MZu-EQN22Qgia4ybo/edit# |
Thanks so much for this doc, @choldgraf. I have added additional details for what I was doing on the AWS side to manage and monitor. I submitted and will stay in touch about resolution. |
Thanks for checking in @choldgraf - a note today said sorry they are slow, still reviewing |
We've just heard back from @erinmr that AWS has forgiven their cloud bill for this incident. I think that we can close this one. Phew! Thanks so much everybody for being a combination of helpful, patient, and generally awesome :-) |
Summary
Over the past 2 weeks there has been a spike in usage of the OpenScapes hub. The number of user pods has been increasing steadily over time. Because nodes scale with users at a 1:1 ratio, this has resulted in a large spike in nodes as well.
Hub information
Timeline (if relevant)
All times in US/Pacific
2021-01-02 11:00AM
OpenScapes emails 2i2c support saying that they've gotten an extremely large AWS bill for December.
They reported several users who they did not add to the hub, and who had GitHub accounts created in the last 2 weeks.
The hub was shut down as a precautionary measure.
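The signal OpenScapes spotted here (hub users whose GitHub accounts were only about two weeks old) is easy to turn into a routine check. The sketch below is a hypothetical illustration, not something used during the incident: it assumes the account creation dates have already been collected by some other means and are passed in as a plain mapping, and the 14-day default window simply mirrors the "last 2 weeks" mentioned above.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def recently_created_accounts(
    created_at: Dict[str, datetime],
    as_of: datetime,
    window: timedelta = timedelta(days=14),
) -> List[str]:
    """Return usernames whose account was created within `window` before `as_of`.

    `created_at` maps username -> account creation time (hypothetical input).
    """
    cutoff = as_of - window
    # Keep accounts created between the cutoff and as_of, both inclusive.
    return sorted(
        user for user, created in created_at.items() if cutoff <= created <= as_of
    )
```

A new account is not proof of abuse on its own, so the result is best treated as a list to review by hand.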
12:01pm
A look at the Grafana logs showed that the number of hub users started rising steadily around the 21st:
It seems like many of these users were maxing out the CPU:
Though the plot for users with high CPU usage is broken so this is hard to confirm:
EDIT: the users have mining software in their filesystems:
12:40pm
Found the reason that unauthorized users can access:
Any GitHub user who logs in to this hub is authorized. A new GitHub account was created from scratch and was able to log in successfully despite not being added to the hub's user list.
Noticed that the openscapes hub does not explicitly have an allowed_users config set:
here's the 2i2c gke hub
infrastructure/config/hubs/2i2c.cluster.yaml
Lines 90 to 96 in b6ab94a
here's the openscapes hub
infrastructure/config/hubs/openscapes.cluster.yaml
Lines 149 to 154 in b6ab94a
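The comparison above (one hub sets allowed_users, the other does not) can be automated across every hub config. The following is a rough sketch of such a check, not the script 2i2c used. The `hubs` mapping and its nested `{"auth": {"allowed_users": [...]}}` layout are hypothetical simplifications of the real config files; only the `allowed_users` name comes from the text above.

```python
from typing import Any, Dict, List


def hubs_missing_allowlist(hubs: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return the names of hubs with no non-empty allowed_users entry.

    `hubs` maps a hub name to an already-parsed config dict. The layout
    used here is a hypothetical stand-in, not the real 2i2c schema.
    """
    missing = []
    for name, config in hubs.items():
        auth = config.get("auth") or {}
        allowed = auth.get("allowed_users")
        # Both a missing key and an empty list are flagged for review.
        if not allowed:
            missing.append(name)
    return sorted(missing)


# Example with made-up data: the second hub has no allowed_users key.
example = {
    "hub-a": {"auth": {"allowed_users": ["user-a", "user-b"]}},
    "hub-b": {"auth": {}},
}
flagged = hubs_missing_allowlist(example)
```

Running a check like this in CI would turn a silent omission into a failing build, which is one possible way to address the review gap this incident exposed.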
21:00
The hub is made inaccessible via this PR: #911
21:25
Corrected the configuration to properly authorize users: 2449ab1
Also deleted all of the users on the hub, so they will need to be manually added back in.
After-action report
There are two major takeaways from this incident:
What went wrong
Where we got lucky
The OpenScapes team checked their bill relatively quickly, otherwise we would have incurred significant extra cloud cost. They were on track to spend $20,000 in January alone.
Action items
Process improvements
infrastructure/
PRs? #913
Documentation improvements
Technical improvements
Actions