Repo directory:
- Projects are split by folders
- Causal regression - notebook
- Causal regression with DoWhy - notebook
- Double machine learning and marginal effects - notebook
- Using Google's mediapipe to try to simulate a 3D screen - folder
- Using Google's mediapipe, measure the distance of a face to the screen from a webcam feed - folder
- FashionCNN - Convolutional neural network for predicting the Fashion MNIST dataset - notebook
- FashionCNN - Batch normalisation layer applied to the above CNN model - notebook
- Autoencoders - Using PCA to compress MNIST images - notebook
- Autoencoders - Using a dense autoencoder to compress MNIST images - notebook
- Implementing an elastic net model in PyTorch - notebook
- Fitting distributions with variational inference - Simple example fitting a Gaussian distribution to data with Pyro - notebook
- Fitting distributions with variational inference - Simple example fitting a beta distribution to data with Pyro - notebook
- Fitting a multimodal beta distribution with PyTorch - notebook
- Fitting a zero inflated Poisson distribution with PyTorch - notebook
- PyTorch: Linear regression to non linear probabilistic neural network - notebook
- TensorflowProbability: Linear regression to non linear probabilistic neural network - notebook
- Trying out PyTorch Lightning - notebook
- Tensorflow - Do Neural Networks overfit? - notebook
- Fitting a normal distribution with tensorflow probability - notebook
- Binary loss functions - Is there a material difference between using BCEWithLogitsLoss and CrossEntropyLoss for binary classification tasks? - No - notebook
- Does initialising the output of a neural net to match your target distribution help? - Yes - notebook
- Exploring multi-armed bandit benchmarks - notebook
- Bootstrapping regression coefficients - Confirming theoretical regression coefficient distributions with bootstrapped samples - notebook
- Interaction coefficients regularisation - notebook
- Sequential Bayesian linear regression model - notebook
- Bayesian regression adapting to non-stationary data - notebook
- Binomial regression vs logistic regression - notebook
- Investigating double descent with linear regression - notebook
- Speed of fitting and predicting with neuralprophet vs fbprophet - notebook
- Can we fit long AR models with neuralprophet? - notebook
- Dask vs multiprocessing - Comparing the API of dask to multiprocessing for general functions - python
- Parquet datasets - Writing dataframes to partitioned parquet files - notebook
- Data generating functions from drawing data - notebook
- Analysis into European installed energy capacity - notebook
- The Game of Life computed with convolution - folder
- NBA - Analysis into LeBron James playing minutes - notebook
- TFL - Analysis into the number of bike trips taken per day in London - notebook
- NBA Score Trajectories - Flask app to show scores of a basketball match against time - repo
- NBA Shooting - Kedro data pipelines to plot player scoring probability distributions - repo
- The classic birthday problem - notebook
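
As a quick illustration of the last entry, the birthday problem has a short closed-form solution. This is a minimal sketch, not necessarily the approach taken in the notebook; it assumes 365 equally likely birthdays, and the function name `birthday_collision_probability` is made up for this example.

```python
from math import prod


def birthday_collision_probability(n_people: int, n_days: int = 365) -> float:
    """Probability that at least two of n_people share a birthday.

    Assumes birthdays are independent and uniform over n_days.
    """
    if n_people > n_days:
        # Pigeonhole principle: a shared birthday is guaranteed
        return 1.0
    # Probability that every person has a distinct birthday
    p_all_distinct = prod((n_days - k) / n_days for k in range(n_people))
    return 1.0 - p_all_distinct


# Smallest group size where a shared birthday is more likely than not
smallest_group = next(
    n for n in range(1, 366) if birthday_collision_probability(n) > 0.5
)
print(smallest_group, round(birthday_collision_probability(smallest_group), 4))
```

Under these assumptions the crossover happens at 23 people.
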
The analyses were built in Python 3.
Some projects have their own requirements/environment. The general setup is installed by:
python3 -m venv dataAnalysisEnv
source dataAnalysisEnv/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
jupyter nbconvert notebook.ipynb --to markdown
This is automated via GitHub Actions.
Custom library installed as a dev library for continued development
Use the settings.json file in the repo
- Deep learning
- Pytorch
- Embeddings
- Tensorflow/pytorch - 1D functions
- FashionMNIST VAE
- Causal inference
- Data validations - great expectations
- Computer vision
- NLP
- Keyword extraction from reviews etc.
- Sentiment analysis
- Gaussian processes
- Bayesian regression
- Recommender systems
- Automatic playlist continuation
- Thompson sampling example
- Quantile regression in pytorch
- Lasso regression
- Dropout better than regularisation?
- Docker
- Build project template repo
- Publish interpret-ml piece
- NBA
- Player position classification model
- Bayesian sequential team rating
- Player VAE - how are players related
- College stats to NBA VAE
- M5/M4 forecasting
- Walmart demand forecasting
- with LightGBM
- Greykite
- https://arxiv.org/abs/2105.01098
- https://towardsdatascience.com/linkedins-response-to-prophet-silverkite-and-greykite-4fd0131f64cb
- Imputation of missing regressors
- Change points in seasonalities
- Quantiles loss
- Utilities for diagnosing
- faster inference
- Autoregressive
- Orbit
- PCA via embedding layer
- NN to predict tempo from song, generate dummy dataset
- NN to predict tab from music sections
- Word embeddings plot with hiplot
- Plot with PCA first and compare with hiplot
- Compare linear regression MC dropout to theoretical results
- Optimal car charging schedule based on energy prices or carbon output
- Media pipe - 3d audio
- Face distance javascript web app with react
- Covid UK plot against time on a map
- Autoencoder using transfer learning?
- what do we use for the decoder?
- MNIST auto-encoder to digit classifier
- Fit a sinusoid to noisy data
- Fourier
- Gradient descent
- MCMC
- Variational inference
- Double dip loss trajectories
- Fitting NNs to common functions (exp etc.), deep vs wide, number of parameters for given error
- Fit a NN to seasonal data with fourier series components
- Causal inference
- DoubleML on heart data to find CATE
- DoubleML on dummy data vs other causal models. How robust are they to model mis-specification and missing confounders?
- Inverse propensity scoring - comparing different methods - manual Inverse Probability of Treatment Weighting, as variance in regression, sample weights, econML based. Do they match?
- Hierarchical models
- Mixed effects model - is it the same as a fixed effects model (lin/log regression) with one hot encoding for the categorical variables + a fixed effect?
- Hierarchical Bayesian models - for when we have categorical features with shared effects over other features
- Fit with MCMC
- Similarities to ridge regression - only some coefficients are regularised
- Generate data and fit each model
- Ref
- Linear regression = logistic regression, relationship to Linear Thompson Sampling
- Blurred images classifier
- ImageNet based, data augment to blur images.
- Country embeddings - create country embeddings by predicting various macro level metrics (GDP, population etc. in a multi task model), from OHE through a NN. Does the embedding space make sense?
- MovieLens dataset to get title embeddings, find nearest neighbour titles
- Using word2vec to predict similar titles. Train on movies watched. Similar given as titles streamed by the same customer
- Train embedding for movies based on sequential ordering. Predict the next/middle movie.
- Finding similar images in a photo library - given a few examples find similar photos
- Use an image net model. Find new example images, positive and negative. Fine tune the model via a classification task. Predict prob of positive result for unseen images. Use the latent space embeddings to find cosine similarity between images.
- Build small image dataset from cifar 10. Compare models - PCA/logistic regression, CNN, efficientNet, transfer learnt weights
- Build lookup table of image and its compact embedding. Given a new image find the inner product with the other images
- Fourier transform via linear regression on sinusoids. Similar approach with Lasso regression to find compressed sensing approaches, with non-uniform sampling.
- Multi task neural network training
- train a single model to predict multiple ready fields from a single dataset
- A/B test distribution comparison
- We often compare just the means. Is a Q-Q plot more informative? Bootstrapping could be used to construct confidence intervals.
- Non-stationarity with ADAM
- Can Adam optimisers adapt to non-stationary datasets? If so, does batch ordering make a difference to the model coefficients?
- Compare against batch mode linear/logistic bayesian regression and show that data ordering is irrelevant.
- Beta-Bernoulli bandit vs logistic regression with no features
- NN multi-row vs multi-column - do they perform similarly?
- Multi horizon forecasting direct method - with shared NN architecture - compare separate models for each horizon with a NN that shares layers. Compare with sequence to sequence models.
- Gaussian process from scratch
- Probabilistic neural networks
- Normalizing flows - model complex distributions with transformations of gaussians
- Can we train an output layer as a Gaussian mixture to model complex distributions via gradient descent?
- Common data science tasks
- Why do we need a model to find relationships? Conditional relationships are easier to define
- Association and causality
- Associations fast - cluster propensity model predictions and find the average features in each group.
- Causality - doubleML etc. to remove confounding features and find conditional average treatment effects. How this relates to groupby and average.
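
One of the ideas above, the Fourier transform via linear regression on sinusoids, can be sketched in a few lines. This is only a rough illustration under assumed settings (a uniformly sampled, noise-free toy signal of length 64 with made-up amplitudes); it does not cover the Lasso or non-uniform sampling variants. Regressing the signal on cosine and sine columns at the integer frequencies should recover the same coefficients as `np.fft.rfft`, up to scaling and sign.

```python
import numpy as np

n = 64
t = np.arange(n)
# Toy signal (assumed for illustration): two sinusoids at frequencies 3 and 5
y = 1.5 * np.cos(2 * np.pi * 3 * t / n) + 0.5 * np.sin(2 * np.pi * 5 * t / n)

# Design matrix: intercept plus cos/sin columns for frequencies 1..n/2-1
freqs = np.arange(1, n // 2)
angles = 2 * np.pi * np.outer(t, freqs) / n
X = np.hstack([np.ones((n, 1)), np.cos(angles), np.sin(angles)])

# Ordinary least squares fit
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
cos_coef = coef[1 : 1 + len(freqs)]
sin_coef = coef[1 + len(freqs) :]

# The same coefficients from the FFT: X_k = (n / 2) * (a_k - i * b_k)
spectrum = np.fft.rfft(y)
fft_cos = 2 * spectrum.real[1 : n // 2] / n
fft_sin = -2 * spectrum.imag[1 : n // 2] / n

print(np.allclose(cos_coef, fft_cos), np.allclose(sin_coef, fft_sin))
```

Because the sinusoid columns are orthogonal on a uniform grid, least squares and the FFT agree here. With non-uniform sampling that orthogonality is lost, which is where a sparse (Lasso) fit becomes the more interesting comparison.
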
- rename environment/requirements files to match the notebook
- update blog articles for markdown images
- TFL, Pyro notebooks re-run
- Add year to each analysis link