Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cost Monitoring #3542

Open
evansiroky opened this issue Nov 12, 2024 · 6 comments
Open

Cost Monitoring #3542

evansiroky opened this issue Nov 12, 2024 · 6 comments
Assignees
Labels
infrastructure Maintenance of infrastructure supporting the data warehouse. Product owner is @evansiroky

Comments

@evansiroky
Copy link
Member

User story / feature request

As a Cal-ITP cloud account owner,
I want to have detailed insights into cloud costs,
So that I can know what is causing our bill to get larger each month.

Acceptance Criteria

Perhaps building on top of #3541, have a way to more easily monitor the cost of various things in the pipeline at a finer-grained analysis than what is currently available in the cloud costs dashboard.

Notes

Based on this, hopefully we can reduce our bill or minimize it's growth.

@evansiroky evansiroky added the infrastructure Maintenance of infrastructure supporting the data warehouse. Product owner is @evansiroky label Nov 13, 2024
@erikamov
Copy link
Contributor

erikamov commented Dec 2, 2024

I just saw in BigQuery two recommendations that may help reducing some costs.

1 - Open BigQuery and click the Recommendation button (top right side):

Image

2 - Click in View details:

Image

3 - See two recommendations:

Image

  • Details of the first recommendation:

Image

  • Details of the second recommendation:

Image

@ohrite
Copy link
Contributor

ohrite commented Dec 16, 2024

Paired with @erikamov today to discuss how to control increasing costs over time. Erika will write up a brief summary for review with the group and @evansiroky.

Image

@erikamov erikamov self-assigned this Dec 17, 2024
@erikamov
Copy link
Contributor

The brief summary is here.

@ohrite
Copy link
Contributor

ohrite commented Dec 17, 2024

As discussed at planning, @erikamov will meet with @evansiroky and analysts to figure out the lowest-hanging fruit. After discussion, we want to discuss GTFS-RT validation outputs.

@erikamov
Copy link
Contributor

erikamov commented Dec 18, 2024

Here a list of all buckets and BQ tables with sizes.

We can use this list to define our next steps.

@erikamov
Copy link
Contributor

Image

To start we could change transform_warehouse to not run on Saturdays reducing few days of processing per month.

For the other DAGs that runs every day, is there a need to run on weekends (except the full refresh on Sunday)?

We could also reduce how many times sync_littlepay, unzip_and_validate_gtfs_schedule_hourly, parse_littlepay, parse_and_validate_rt_v2, and airflow_monitoring run per day, to reduce costs.

Change sync_ntd_data_api and sync_ntd_data_xlsx to run before create_external_tables, would not reduce costs, but would be an improvement to have the new scraped data available at the same day.

I don't want to tag anyone during holidays, so we can discuss when everybody is back, but if you see this comment please feel free to add your input.

Complete list and notes to discuss are available here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
infrastructure Maintenance of infrastructure supporting the data warehouse. Product owner is @evansiroky
Projects
None yet
Development

No branches or pull requests

3 participants