This repo contains the code for my talk at Databeers London in March 2017.
Databeers (Twitter @databeers or @databeersldn) is an international series of short talks about (big) data stories, held in more than 20 cities worldwide. For my talk I streamed around 10M tweets containing the word "beer" in different languages and analysed them. The work covers streaming and cleaning the data, a preliminary analysis, and some machine learning models: logistic regression and random forest for sentiment analysis, and a word2vec model to find which connotations words carry in a beer context. The code is written in Python and uses common data science libraries such as pandas, sklearn and gensim (for word models).
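To give an idea of what the cleaning step might look like, here is a minimal sketch using only the Python standard library. It is not the code from the talk: the keyword set `BEER_WORDS`, the helper names `clean_tweet` and `mentions_beer`, and the sample texts are all made up for illustration, and the real list of languages is an assumption.

```python
import re

# Hypothetical keyword set; the languages actually covered in the talk may differ.
BEER_WORDS = {"beer", "bier", "cerveza", "bière", "birra", "cerveja"}

URL_RE = re.compile(r"https?://\S+")   # links
MENTION_RE = re.compile(r"@\w+")       # @user mentions
TOKEN_RE = re.compile(r"\w+")          # word tokens (Unicode-aware in Python 3)


def clean_tweet(text):
    """Lowercase the text, drop URLs and @mentions, and return word tokens."""
    text = URL_RE.sub(" ", text.lower())
    text = MENTION_RE.sub(" ", text)
    return TOKEN_RE.findall(text)


def mentions_beer(tokens):
    """Return True if any token is one of the (assumed) beer keywords."""
    return any(tok in BEER_WORDS for tok in tokens)


# Made-up sample texts, not real data from the stream.
sample = [
    "Enjoying a cold BEER with @databeers https://example.com",
    "Una cerveza, por favor!",
    "Nothing to see here",
]

cleaned = [clean_tweet(t) for t in sample]
beer_tweets = [tokens for tokens in cleaned if mentions_beer(tokens)]
```

Token lists of this shape could then be fed into a vectoriser for the sentiment models or into a word-embedding model, but how the talk's code does this is not shown here.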
If you're interested in (a sample of) the data, please contact me.