This repo includes the raw data and the scripts used to condition the raw data and EPA data, pull data from a weather API, format and combine the datasets, train logistic regression models on the result, and create figures.
See Chapter 7 of LearnAir, a master's thesis, for plots and interpretation of the results.
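The actual pipeline lives in the repo's Jupyter notebooks. As a rough, hypothetical sketch of the same idea (align the sources on a shared timestamp, then fit a logistic regression), the pure-Python example below assumes hourly records keyed by timestamp and made-up field names `sensor_pm25`, `humidity`, and `epa_high` (a 0/1 label for whether the EPA reference reading counts as "high"). It uses plain batch gradient descent rather than whatever library the notebooks rely on, and the toy data is invented, not taken from the thesis.

```python
import math
from typing import Dict, List

# Hypothetical record type: one hour of readings, keyed by field name.
Record = Dict[str, float]


def combine(sensor: Dict[str, Record], weather: Dict[str, Record],
            epa: Dict[str, Record]) -> List[Record]:
    """Inner-join three sources on a shared timestamp key.

    Hours missing from any source are dropped; this is one simple way to
    align a sensor feed, a weather API, and an EPA reference monitor.
    """
    rows = []
    for ts in sorted(sensor.keys() & weather.keys() & epa.keys()):
        rows.append({**sensor[ts], **weather[ts], **epa[ts]})
    return rows


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 2000) -> List[float]:
    """Batch gradient descent on the mean log-loss; w[0] is the bias term."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def predict(w: List[float], x: List[float]) -> int:
    """Return 1 if the predicted probability is at least 0.5, else 0."""
    return int(sigmoid(w[0] + sum(a * b for a, b in zip(w[1:], x))) >= 0.5)


# Toy, made-up data already scaled to roughly [0, 1]; not from the thesis.
hours = [f"2016-01-01T{h:02d}" for h in range(7)]
pm = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9, 0.5]
hum = [0.4, 0.6, 0.5, 0.4, 0.6, 0.5]  # weather API has no reading for the last hour
labels = [0, 0, 0, 1, 1, 1, 0]        # hypothetical EPA "high" flags
sensor = {ts: {"sensor_pm25": v} for ts, v in zip(hours, pm)}
weather = {ts: {"humidity": v} for ts, v in zip(hours, hum)}
epa = {ts: {"epa_high": float(v)} for ts, v in zip(hours, labels)}

rows = combine(sensor, weather, epa)  # the hour without weather data is dropped
X = [[r["sensor_pm25"], r["humidity"]] for r in rows]
y = [int(r["epa_high"]) for r in rows]
weights = fit_logistic(X, y)
```

Features are kept on a similar scale here because unscaled inputs make plain gradient descent converge slowly; with real sensor and weather data, standardizing each column before fitting would be the usual first step.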
Code written for LearnAir includes:
- chainCrawler and chainSearcher - a web crawler and a breadth-first search tool for the semantic web data architecture ChainAPI
- chainTraverser and chainDataPush - a stateful web spider to traverse, upload, modify, and interact with ChainAPI nodes and data, including pushing data from Excel files
- chainProcessor - a scalable machine learning crawler framework, which automatically crawls and downloads data from a list of 'known' device types in ChainAPI, processes their data using a device-specific model (that automatically updates when new data is found), and uploads that processed data back into ChainAPI
- an Air Quality Ontology adaptation of ChainAPI (original tool written by Spencer Russel et al.) - an air quality data ontology written with ChainAPI, a semantic web, RESTful sensor API
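The crawling and search tools above work by following links between resources. A minimal, hypothetical sketch of breadth-first search over such a graph, assuming each resource is a decoded JSON document with a HAL-style `_links` map (link relation mapped to `{"href": ...}` or a list of them). The `fetch` callable, the relation names, and the `type` field are made-up stand-ins; in practice `fetch` would issue an HTTP GET and decode the response. This is not the chainSearcher code itself.

```python
from collections import deque
from typing import Callable, Iterator, Optional

Resource = dict  # a decoded JSON document


def iter_links(doc: Resource) -> Iterator[str]:
    """Yield every href in a HAL-style `_links` map, skipping `self` and `curies`."""
    for rel, value in doc.get("_links", {}).items():
        if rel in ("self", "curies"):
            continue
        targets = value if isinstance(value, list) else [value]
        for target in targets:
            if "href" in target:
                yield target["href"]


def bfs_search(start: str, fetch: Callable[[str], Resource],
               match: Callable[[Resource], bool],
               max_nodes: int = 1000) -> Optional[str]:
    """Return the href of the shallowest resource satisfying `match`, or None."""
    queue = deque([start])
    seen = {start}  # prevents revisiting nodes when the graph has cycles
    visited = 0
    while queue and visited < max_nodes:
        href = queue.popleft()  # FIFO order gives breadth-first traversal
        doc = fetch(href)
        visited += 1
        if match(doc):
            return href
        for link in iter_links(doc):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return None


# Hypothetical in-memory stand-in for an API; hrefs and fields are invented.
FAKE_API = {
    "/sites/1": {"_links": {"self": {"href": "/sites/1"},
                            "devices": {"href": "/devices?site=1"}}},
    "/devices?site=1": {"_links": {"items": [{"href": "/devices/1"},
                                             {"href": "/devices/2"}]}},
    "/devices/1": {"type": "weather", "_links": {
        "site": {"href": "/sites/1"},  # cycle back to the site
        "sensors": {"href": "/sensors?device=1"}}},
    "/devices/2": {"type": "pm", "_links": {"sensors": {"href": "/sensors?device=2"}}},
    "/sensors?device=1": {"_links": {"items": [{"href": "/sensors/9"}]}},
    "/sensors?device=2": {"_links": {}},
    "/sensors/9": {"type": "pm", "_links": {}},
}

found = bfs_search("/sites/1", FAKE_API.__getitem__, lambda d: d.get("type") == "pm")
```

Here `found` is `"/devices/2"`: the deeper `"/sensors/9"` also matches, but the FIFO queue reaches the shallower match first, and the `seen` set keeps the `site` back-link from sending the search in circles.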
Additional resources include:
- the thesis document (full documentation and motivation; see especially Chapter 6, ChainAPI for Air Quality)
- the repo for the thesis document
- Jupyter notebooks used for data pre-processing, machine learning, and plot generation (with raw data)
- a quick video introducing the LearnAir concept
- the original ChainAPI project