This is my final project for the DataTalksClub Data Engineering Zoomcamp, 2023 Cohort. My goal is to analyze Brazilian weather data managed by the Instituto Nacional de Meteorologia (INMET).
INMET is a Brazilian government agency that monitors weather data in almost 500 cities across the country.
In my analysis, I show weather and temperature data for different states in the country. I also show how rainfall is distributed across years and seasons.
Some insights I will show:
- How rainfall was distributed in summer over the last 10 years.
- Rainfall distribution by region of the country.
- Total rainfall by state and year.
- Which station has the highest and which has the lowest temperature in each month.
My project is a batch pipeline. It downloads one Zip file per year from INMET's website.
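
The download step could look roughly like the sketch below. It is not the project's actual code: the URL template is an assumption about where INMET publishes the yearly archives, and `zip_url` and `download_year` are hypothetical helper names.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumption: INMET publishes one archive per year at a URL of this shape.
# Check the INMET website for the real location before relying on it.
URL_TEMPLATE = "https://portal.inmet.gov.br/uploads/dadoshistoricos/{year}.zip"


def zip_url(year: int, template: str = URL_TEMPLATE) -> str:
    """Build the download URL for one year's archive."""
    return template.format(year=year)


def download_year(year: int, dest_dir: Path, template: str = URL_TEMPLATE) -> Path:
    """Download one year's Zip into dest_dir, skipping files already present."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / f"{year}.zip"
    if not target.exists():
        urlretrieve(zip_url(year, template), str(target))
    return target
```

Calling `download_year(year, Path("data/raw"))` in a loop over the years of interest would fetch one archive per year; the `data/raw` folder is only an example. Skipping files that already exist keeps reruns of the batch cheap.
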
Each Zip file is then extracted, and all the CSVs inside it are converted into a single Parquet file, so there is one Parquet file per year.
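
As a minimal sketch of this conversion with Pandas (the function names and the `source_file` column are hypothetical, and the real project may structure this differently), the CSVs can be read straight from the archive and concatenated before writing Parquet:

```python
import zipfile
from pathlib import Path

import pandas as pd


def read_zip_csvs(zip_path, **read_csv_kwargs) -> pd.DataFrame:
    """Read every CSV inside a Zip archive into one DataFrame."""
    frames = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".csv"):
                continue  # skip folders and non-CSV members
            with archive.open(name) as handle:
                frame = pd.read_csv(handle, **read_csv_kwargs)
            # Hypothetical lineage column: remember which CSV each row came from.
            frame["source_file"] = Path(name).name
            frames.append(frame)
    if not frames:
        raise ValueError(f"no CSV files found in {zip_path}")
    return pd.concat(frames, ignore_index=True)


def zip_to_parquet(zip_path, parquet_path, **read_csv_kwargs) -> Path:
    """Convert all CSVs of one yearly Zip into a single Parquet file."""
    frame = read_zip_csvs(zip_path, **read_csv_kwargs)
    # to_parquet needs a Parquet engine such as pyarrow to be installed.
    frame.to_parquet(parquet_path, index=False)
    return Path(parquet_path)
```

Extra keyword arguments are passed through to `pd.read_csv`, so options such as `sep`, `encoding`, or `skiprows` can be set to match the actual layout of the INMET files, which this sketch does not assume.
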
Next, all Parquet files are uploaded to Google Cloud Storage, and an External Table in BigQuery points to their folder.
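
One possible way to script this step is sketched below. It assumes the `google-cloud-storage` and `google-cloud-bigquery` client libraries and default GCP credentials; the bucket, folder, and table names are placeholders, and the project itself may create the External Table another way (for example through Terraform or the console).

```python
from pathlib import Path


def gcs_uri(bucket: str, prefix: str) -> str:
    """Wildcard URI matching every Parquet file under a bucket folder."""
    return f"gs://{bucket}/{prefix.strip('/')}/*.parquet"


def external_table_ddl(table_id: str, bucket: str, prefix: str) -> str:
    """BigQuery DDL for an External Table over the Parquet folder."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table_id}`\n"
        f"OPTIONS (format = 'PARQUET', uris = ['{gcs_uri(bucket, prefix)}'])"
    )


def upload_parquet(local_path: str, bucket: str, prefix: str) -> str:
    """Upload one local Parquet file into the bucket folder."""
    # Imported lazily so the pure helpers above work without the GCP libraries.
    from google.cloud import storage

    blob_name = f"{prefix.strip('/')}/{Path(local_path).name}"
    storage.Client().bucket(bucket).blob(blob_name).upload_from_filename(local_path)
    return f"gs://{bucket}/{blob_name}"


def create_external_table(table_id: str, bucket: str, prefix: str) -> None:
    """Run the DDL so BigQuery reads the Parquet files in place."""
    from google.cloud import bigquery

    bigquery.Client().query(external_table_ddl(table_id, bucket, prefix)).result()
```

Because the External Table points at a wildcard URI, Parquet files for new years become queryable as soon as they land in the folder, without reloading anything into BigQuery.
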
After that, dbt processes the External Table to generate the final data.
Finally, Looker Studio provides a dashboard view of all the aggregated data.
- Google Cloud Platform (GCP): provides the infrastructure for cloud computing, data lake storage, and the data warehouse.
- Prefect: orchestrates the workflow and runs the Python code on a defined schedule.
- Python: custom code using well-known data engineering libraries, such as Pandas, to extract and load the data.
- Terraform: creates and manages GCP resources from the command line.
- dbt: transforms data inside BigQuery (it also supports other warehouses).
- Looker Studio: my dashboard solution and the final step for all the data.
Please follow this page for instructions.