We have always been fascinated by sports and stats. Arguments about the outcome never ended until the game was actually over. There was always one stat that could prove that the underdog still had a chance. FiveThirtyEight (http://fivethirtyeight.com/) is a great example of sports journalism done right. See:
- http://fivethirtyeight.com/datalab/roberta-vincis-upset-of-serena-williams-is-the-biggest-in-modern-womens-tennis-history/
- http://fivethirtyeight.com/features/serena-williams-and-the-difference-between-all-time-great-and-greatest-of-all-time/
The UCI Machine Learning Repository has a great collection of datasets. We are sure you have heard of it, and may well have used some of its datasets. Here's one on the 2013 major tennis tournaments: https://archive.ics.uci.edu/ml/datasets/Tennis+Major+Tournament+Match+Statistics
Your task is to create 3 hypotheses based on the above dataset and validate or invalidate them. In a short Jupyter notebook, detail your hypotheses, analysis process, experiments, modelling, and final conclusions. Include relevant graphs and code.
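To give a rough idea of what validating a single hypothesis might look like, here is a minimal sketch in Python. It is only one possible approach, not the expected solution. The hypothesis ("match winners have a higher first-serve percentage than losers"), the helper name `permutation_test`, and the numbers are all illustrative assumptions; in particular, the values below are made-up placeholders and are not taken from the UCI dataset.

```python
import random


def permutation_test(group_a, group_b, n_resamples=2000, seed=0):
    """One-sided permutation test for mean(group_a) > mean(group_b).

    Hypothetical helper for illustration. Returns the observed difference
    in means and an approximate p-value.
    """
    rng = random.Random(seed)  # fixed seed so the notebook is reproducible
    n_a = len(group_a)
    n_b = len(group_b)
    observed = sum(group_a) / n_a - sum(group_b) / n_b

    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_resamples):
        # Shuffle the labels: under the null hypothesis, group membership
        # carries no information about the value.
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b
        if diff >= observed:
            hits += 1

    # Add-one correction keeps the estimated p-value strictly positive.
    p_value = (hits + 1) / (n_resamples + 1)
    return observed, p_value


# Placeholder values (NOT real data): first-serve percentage per match.
winners_first_serve_pct = [68, 71, 64, 70, 66, 73, 69, 65]
losers_first_serve_pct = [61, 66, 59, 63, 67, 60, 62, 64]

observed_diff, p_value = permutation_test(
    winners_first_serve_pct, losers_first_serve_pct
)
print(f"difference in means: {observed_diff:.2f}, p-value: {p_value:.4f}")
```

In a real notebook you would load the match statistics from the dataset in place of the placeholder lists, state the null hypothesis explicitly, and explain why the chosen test suits the data.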
Remember, we will be looking for the following in your work:
- How you think and how creative you are
- How familiar you are with the science and art of data analysis
- How you code - structure, comments, and efficiency
- How well you can communicate your process and conclusions
- How quickly you can learn
Feel free to use R/Python/JavaScript or any language supported by the Jupyter notebook. If you are going for the kill and you really want to wow us:
- Use Julia http://julialang.org/
- Add additional datasets and build another hypothesis that makes your previous ones look trivial or boring.
Fork this repository. When you are done, send us a link to your Jupyter notebook. We would like you to come in and present your code to the whole team.
May the force be with you!