Recover our un-billed cloud costs from FY 2022 #555

Closed
Tracked by #355
choldgraf opened this issue Nov 7, 2022 · 7 comments
Assignees
Labels
Partnerships Creating and fostering new collaborations with external groups Task Actions that don't involve changing our code or docs.

Comments

@choldgraf
Member

Context

We have been paying the cloud costs for several communities over the past year, with the understanding that we'd pass these costs through to them. However, we have not invoiced them for any of these costs yet.

The costs are becoming a significant burden on 2i2c: at this point we are paying the equivalent of a 0.5 FTE engineer every month. Moreover, the longer we wait, the harder it will be to ask communities to pay for ever-larger cloud bills. We should square up these balances as quickly as we can.

Proposal

We should do the following:

  • For each community that we work with:
    • Calculate the cloud costs they have incurred in FY 2022.
    • For the shared cluster, follow the process that we define in "Calculate cloud costs for shared hubs on a cluster on Google Cloud" (#499) and do this one time for all of these communities.
    • Send each community a one-time invoice for this amount, along with a GCP/AWS print-out of their charges.

Below is a list of the cloud costs we've incurred so far, broken down by project.

Costs we have incurred so far

Google Cloud: 2i2c billing

Through October 2022:

image

| Project ID | Project number | Cost |
| --- | --- | --- |
| two-eye-two-see | 350668521154 | $8,906.98 |
| awi-ciroh | 316329527429 | $2,576.05 |
| m2lines-hub | 255943735824 | $1,968.52 |
| two-eye-two-see-uk | 258122304099 | $1,138.93 |

Google Cloud: two-eye-two-see

image

| Project | Project ID | Project number | Cost |
| --- | --- | --- | --- |
| m2lines-hub | m2lines-hub | 255943735824 | $4,159.94 |
| two-eye-two-see | two-eye-two-see | 350668521154 | $3,505.00 |

AWS: shared billing account

image

| Project ID | Cost |
| --- | --- |
| uw-hackweeks | $5,543.39 |
| uw-gridSST | $856.24 |
| nasa-cryo-hubs | $227.22 |
| 2i2c-sandbox | $0.08 |

Updates and actions

No response

@choldgraf choldgraf added Task Actions that don't involve changing our code or docs. Partnerships Creating and fostering new collaborations with external groups labels Nov 7, 2022
@damianavila
Contributor

Moreover, as more time passes it is going to be increasingly difficult to ask communities to pay for increasingly large cloud costs.

I could not agree more about this one. We should aim to pass through the cost before EOY. The plan outlined feels OK to me.

@damianavila
Contributor

damianavila commented Nov 15, 2022

Notes from the meeting we had with @jmunroe and @yuvipanda.

We need to figure out the backlog and clear it!
No Columbia, no Toronto. We need to figure out the ones with billing accounts.
We have visibility and access to them.
We need to calculate up to the end of Nov, by a manual process in @jmunroe's hands.

RAM will be used as a proxy (via Grafana).
A spreadsheet will be produced by @jmunroe.
AWS projects should also be covered.

STEPS:

  1. billing on google projects
  2. billing in AWS projects
  3. billing in shared GCP
    3.1. estimated based on RAM

Starting Dec we need to have an automated system in place, see: 2i2c-org/infrastructure#1853.

Assigned this issue to @jmunroe because he is pushing it forward as described above.

@yuvipanda
Member

I've provided @jmunroe access to the GCP and AWS accounts. LMK if those don't work, @jmunroe!

@jmunroe
Contributor

jmunroe commented Nov 22, 2022

I've been making progress on clearing out the cloud costs backlog. My "workings" are available in this shared Google Drive folder: https://drive.google.com/drive/folders/11KqzudsaqRjtHo0k4AYId5gZgyrfT0uq?usp=share_link
Because this GitHub repository is public-facing, I'll refer the 2i2c team to that private folder for specifics about individual communities.

All data is from 2i2c inception to the end of October 2022. Once November 2022 has closed out, I will extend this work to include that month as well. This process is highly manual and our intention is to have an "automated" way of creating these reports and cloud costs recovery from December 1, 2022 onward.

I am handling GCP and AWS separately. Each folder contains a Google Doc with notes and a Google Sheet with workings. Any invoices from the cloud providers have also been copied to the respective cloud folder.

GCP
We have been using GCP since July 2020. It is not clear to me that any of these cloud costs have been reimbursed yet. The total costs incurred to date are $28,167.82.

Of that amount, it is clear to me how we can allocate $14,222.44 among community partners: m2lines, awi-ciroh, ohw, neurohackademy, lis, temple. This is either because they are the only community partner on a cluster, or because they only started after we began keeping more detailed records on usage. Invoices for AWI and OHW have already been requested from CS&S.

The remaining $13,945.38 is more difficult to allocate. Over the last 28 months, some of these costs will be for our own internal "development", and some will need to be written off because they were for community hubs we deployed without ever getting an agreement in place.

The allocation of costs on the shared 2i2c cluster is calculated as proportional to the "memory requested" through Kubernetes. The memory requested determines the size of the nodes that need to be provisioned and is thus the most significant factor in 2i2c cloud costs.

The Grafana query used was

Expr: sum(
  kube_pod_container_resource_requests{resource="memory"}
) by (namespace)
Step: 24h0m0s

This allowed computation of the relative fraction of costs incurred per month. Costs associated with the "core infrastructure" namespaces (cnrm-system, configconnector-operator-system, kube-system, staging, support) were distributed across the hubs in proportion to their memory requested.
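To make the allocation rule concrete, here is a minimal Python sketch of how I understand it, not the actual spreadsheet logic. The function name `allocate_shared_cost`, the hub names `hub-a` and `hub-b`, and all numbers are hypothetical; only the list of core namespaces comes from the description above.

```python
# Core infrastructure namespaces named above; their cost is spread across
# the hubs rather than billed to any community directly.
CORE_NAMESPACES = frozenset({
    "cnrm-system",
    "configconnector-operator-system",
    "kube-system",
    "staging",
    "support",
})


def allocate_shared_cost(total_cost, memory_requested, core_namespaces=CORE_NAMESPACES):
    """Split one month's cluster cost across hubs by memory requested.

    memory_requested maps namespace -> memory requested (any consistent unit).
    """
    hubs = {ns: mem for ns, mem in memory_requested.items() if ns not in core_namespaces}
    hub_total = sum(hubs.values())
    if hub_total <= 0:
        raise ValueError("no hub memory requests to allocate against")
    # Sharing core costs in proportion to hub memory is equivalent to dividing
    # the whole bill by each hub's share of hub-only memory.
    return {ns: total_cost * mem / hub_total for ns, mem in hubs.items()}


# Hypothetical numbers for illustration only (not taken from the billing data).
example = allocate_shared_cost(
    total_cost=100.0,
    memory_requested={"hub-a": 30.0, "hub-b": 10.0, "kube-system": 10.0},
)
print(example)  # {'hub-a': 75.0, 'hub-b': 25.0}
```

In this sketch the core namespaces drop out of the denominator, so the hub shares always sum to the full monthly bill.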

Looking back through the GitHub commits, here are the communities for which we deployed hubs: wageningen, callysto, grenoble, justiceinnovationlab, anu, jackeddy, aup, binder-staging (pangeo?), earthlab, paleohack2021, peddie, pfw, utexas. I have collected information on which months each of these hubs was deployed, but I only have detailed usage data since July 2022. Going through this "historical" information will require a bit of a brain meld from the team. I propose we tackle that in the Community and Partnership meeting on Thursday (2022-11-24).

AWS
AWS has billing information since November 2021. Our total costs incurred to the end of October 2022 were $1,857.82. Note: this is different from what is shown in the plots at the top of the issue, since AWS reports "credits" slightly differently than GCP does. Our total "usage" of AWS to date is $5,744.11, but there were also "credits" of $3,886.29 (all related to uw-hackweeks). Since each hub lived in its own linked account, we can directly allocate these costs among our community partners uw-hackweeks, nasa-cryo-hubs, and uw-gridSST. There is only a tiny $0.08 "2i2c-sandbox" charge we need to absorb internally.
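As a quick sanity check on the AWS figures above, net cost is usage minus credits. The sketch below uses `decimal.Decimal` to avoid floating-point rounding on currency; the helper name `net_cost` is hypothetical and only the two dollar amounts come from this comment.

```python
from decimal import Decimal


def net_cost(usage, credits):
    """Return what was actually charged: total usage minus credits applied."""
    return usage - credits


aws_usage = Decimal("5744.11")    # total AWS "usage" to date
aws_credits = Decimal("3886.29")  # credits, all related to uw-hackweeks
aws_net = net_cost(aws_usage, aws_credits)
print(aws_net)  # 1857.82
```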

@sgibson91
Member

binder-staging should be considered the same as staging and dask-staging on that cluster. It was development work for us to ensure we could do a Binder deployment for Pangeo once the situation regarding contracts/invoices/migrating infra to a 2i2c-managed project has settled down. I think the Pythia group were using it for a while, but that was not under 2i2c advisement.

@choldgraf
Member Author

choldgraf commented Nov 22, 2022

This is great - thanks for breaking all of these down!

My feeling is that we should shoot for an "80/20" approach here. We should take the low-hanging fruit for invoicing, and then use the "harder to figure out" invoicing cases to figure out how to automate this, rather than to recover every last penny. I think that's OK even if we leave several thousand on the table - in the long run we'll save much more by saving ourselves the time of doing this semi-automatically. Does that make sense?

@pnasrat
Contributor

pnasrat commented Apr 7, 2023

Closing this as duplicate of 2i2c-org/meta#529

@pnasrat pnasrat closed this as completed Apr 7, 2023

6 participants