This repository demonstrates the advantages of RNNs and CNNs over traditional ML models on time-series data with outliers.
We use the transport mode detection task as our running example. Given a time series of sensor data, the goal is to assign each time step one of the predefined transport modes: walk, car, bike, etc.
For demonstration purposes, we will use only two modes, "walk" and "train", so that our task becomes a binary classification task.
The data is generated synthetically based on common-sense assumptions. Outliers in the data represent faulty sensor readings (wrong geo-positions, acceleration), which often occur in real life. For simplicity, a single feature representing the speed of a device is used. The Data generation notebook describes the data generation methodology in detail.
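
To make the setup concrete, here is a minimal sketch of how such a series could be produced. It is not the code from the Data generation notebook: the function name `generate_series`, the per-mode speed parameters, and the outlier probability are all assumptions chosen for illustration, and only the standard library is used so the snippet runs without extra dependencies.

```python
import random

# Hypothetical (mean, std) speed in m/s per mode; not taken from the notebooks.
MODE_SPEED = {"walk": (1.4, 0.3), "train": (25.0, 5.0)}


def generate_series(segments, outlier_prob=0.05, outlier_speed=(0.0, 60.0), seed=0):
    """Return (speeds, labels) for a list of (mode, length) segments."""
    rng = random.Random(seed)  # seeded so the series is reproducible
    speeds, labels = [], []
    for mode, length in segments:
        mean, std = MODE_SPEED[mode]
        for _ in range(length):
            if rng.random() < outlier_prob:
                # Faulty reading: a speed unrelated to the true mode.
                speed = rng.uniform(*outlier_speed)
            else:
                # Normal reading: noisy speed around the mode's mean, never negative.
                speed = max(0.0, rng.gauss(mean, std))
            speeds.append(speed)
            labels.append(mode)
    return speeds, labels


# A walk, followed by a train ride, followed by another walk.
speeds, labels = generate_series([("walk", 50), ("train", 100), ("walk", 30)])
```

Note that the label of an outlier step stays the true mode; only the observed speed is corrupted, which is what makes a single-time-step classifier struggle.
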
- Data generation: Describes the data and outlier generation methodology with examples.
- Basic Tree model: Modelling the task using decision trees.
- Tree model with multiple time steps: Decision tree that has access to past time steps.
- CNN models: Modelling the task using convolutional neural networks.
  - Basic CNN model: Modelling the task using a windowed CNN model.
- RNN models: Modelling the task using several different RNN models.
  - Per-sample RNN model: Classify each element of a sequence, processing the whole sequence at once.
  - Split-window RNN model: Classify each element of a sequence, processing the sequence in parts (for long sequences).
  - Split-window stateful RNN model: Keep state between batches to continue training the RNN on long sequences.
  - Overlapping-window RNN model: Predict the last element of each overlapping window.
- RNN advanced topics:
  - RNN padding and masking: Generating data samples of different sizes. Padding samples in the RNN model.
  - RNN class weights: TODO: Generating data samples with different class proportions. Class weights in the RNN model.
  - RNN truncated back-propagation: TODO: TODO
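
Several of the notebooks above differ mainly in how the series is cut into model inputs. The sketch below illustrates one plausible reading of the terms "overlapping window", "split window", and "padding and masking". The helper names (`overlapping_windows`, `split_windows`, `pad_with_mask`) are hypothetical and are not taken from the notebooks, which may implement these steps differently.

```python
def overlapping_windows(series, labels, window):
    """One sample per time step from index window-1 on:
    the last `window` values, labelled by the final step."""
    X = [series[i - window + 1 : i + 1] for i in range(window - 1, len(series))]
    y = [labels[i] for i in range(window - 1, len(series))]
    return X, y


def split_windows(series, window):
    """Consecutive non-overlapping chunks; the last chunk may be shorter."""
    return [series[i : i + window] for i in range(0, len(series), window)]


def pad_with_mask(chunks, pad_value=0.0):
    """Right-pad chunks to a common length and return a 0/1 mask of real steps."""
    max_len = max(len(c) for c in chunks)
    padded = [list(c) + [pad_value] * (max_len - len(c)) for c in chunks]
    mask = [[1] * len(c) + [0] * (max_len - len(c)) for c in chunks]
    return padded, mask


speeds = [1.2, 1.5, 1.3, 24.0, 26.5, 25.1, 27.0]
labels = ["walk", "walk", "walk", "train", "train", "train", "train"]

X, y = overlapping_windows(speeds, labels, window=3)  # 5 samples of length 3
chunks = split_windows(speeds, window=3)              # chunk lengths 3, 3, 1
padded, mask = pad_with_mask(chunks)                  # all rows padded to length 3
```

Because a padded value of 0.0 is also a valid speed, the mask is what lets a model or loss function ignore the padded steps rather than treat them as real readings.
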