Skip to content

srozsnyai/wienerluft

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

29 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Wiener Luft

This project is for educational purposes. The aim is to demonstrate how to build an end-to-end data integration pipeline incl. a public web facing page with close to zero costs. As a side effect it allows to monitor & analyze Vienna's air quality metrics.

The data source of this project is provided by Stadt Wien – data.wien.gv.at

See deployed example: www.wienerluft.at

Disclaimer

This is a private project for educational purposes. Szabolcs Rozsnyai is an employee Google. The opinions stated here are my own, not those of Google.

Architecture

Architecture

Data Pipeline

The Cloud Scheduler emits every 15min a a pub/sub message to trigger the execution of a Cloud Function. The Cloud Function runs a Java-based application which calls the Stadt Wien API. This API returns the current readings from all the AQ stations. The cloud function parses the returned message and writes them into a BQ table in their raw format (i.e. tabular but untyped).

Storage & Analytics

A BQ view is used to type the individual attributes that have been previously stored as simple string values.

SELECT
       stationId,
       PARSE_DATETIME('%d.%m.%Y, %H:%M',  REPLACE(RTRIM(LTRIM(dateTime)),'24:', '00:')) as dateTime,
       FORMAT_DATE('%A', PARSE_DATE('%d.%m.%Y', SUBSTR(dateTime,0,10))) AS dayOfWeek,
       FORMAT_DATE('%u', PARSE_DATE('%d.%m.%Y', SUBSTR(dateTime,0,10))) AS dayOfWeekAsNumber,
       EXTRACT(HOUR FROM PARSE_DATETIME('%d.%m.%Y, %H:%M',  REPLACE(RTRIM(LTRIM(dateTime)),'24:', '00:'))) as hourOfDay,
       EXTRACT(YEAR FROM PARSE_DATETIME('%d.%m.%Y, %H:%M',  REPLACE(RTRIM(LTRIM(dateTime)),'24:', '00:'))) as year,
       EXTRACT(MONTH FROM PARSE_DATETIME('%d.%m.%Y, %H:%M',  REPLACE(RTRIM(LTRIM(dateTime)),'24:', '00:'))) as month,
       EXTRACT(DAY FROM PARSE_DATETIME('%d.%m.%Y, %H:%M',  REPLACE(RTRIM(LTRIM(dateTime)),'24:', '00:'))) as day,
       EXTRACT(WEEK FROM PARSE_DATETIME('%d.%m.%Y, %H:%M',  REPLACE(RTRIM(LTRIM(dateTime)),'24:', '00:'))) as week,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(windSpeed)),',','.') AS NUMERIC) as windSpeed,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(windDirection)),',','.') AS NUMERIC) as windDirection,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(rf)),',','.') AS NUMERIC) as rf,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(no2)),',','.') AS NUMERIC) as no2,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(nox)),',','.') AS NUMERIC) as nox,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(pm10)),',','.') AS NUMERIC) as pm10,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(pm10_24)),',','.') AS NUMERIC) as pm10_24,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(pm25)),',','.') AS NUMERIC) as pm25,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(pm25_24)),',','.') AS NUMERIC) as pm25_24,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(o3)),',','.') AS NUMERIC) as o3,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(o3_24)),',','.') AS NUMERIC) as o3_24,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(S02)),',','.') AS NUMERIC) as S02,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(CO)),',','.') AS NUMERIC) as CO,
       SAFE_CAST(REPLACE(RTRIM(LTRIM(CO_24)),',','.') AS NUMERIC) as CO_24

       FROM PROJECTID.wienerluft.raw
       ORDER BY dateTime DESC 

The view is the source table for the DataStudio report which is embedded in a static webpage. This static webpage is served from a GCP bucket.

Installation

Resources which will be used

  • Cloud Scheduler
  • Cloud Storage
  • Pub/Sub
  • BigQuery
  • Cloud Functions
  • Cloud Build

Preparation

Set PROJECTID in config.sh to desired GCP project-id.

Run initialization (only required once per project)

This will enable all required services

./init.sh

Run installation scripts to provision required resources

cd resources
./create.sh

Deprovision resources

./clean.sh

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published