This was my first Udacity Data Analyst project, in which I investigated the TMDb Movie Dataset and communicated my findings.
The CSV dataset contains information on more than 10,000 movies released between 1960 and 2015.
I used the following libraries to analyse the dataset:
- Pandas
- NumPy
- Matplotlib
The questions I set out to answer were:
- Which genres were produced most often from year to year?
- Which film studios generated the most revenue (adjusted)?
- What properties are associated with movies that have high revenues?
- Is revenue (adjusted) related to movie ratings and to budget (adjusted)?
- How has annual revenue (adjusted) changed from year to year? Is it trending upwards?
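The first two questions come down to splitting multi-valued fields, grouping and counting. The following is a minimal pandas sketch of how they could be answered, not the code from the original analysis. It assumes the CSV has `release_year`, `genres`, `production_companies` and `revenue_adj` columns and that multi-valued fields are pipe-separated (e.g. `Comedy|Drama`); the function names are hypothetical.

```python
import pandas as pd


def top_genre_per_year(movies: pd.DataFrame) -> pd.Series:
    """Return the most frequently produced genre for each release year.

    Assumes a pipe-separated `genres` column and a `release_year` column.
    """
    exploded = (
        movies[["release_year", "genres"]]
        .dropna(subset=["genres"])
        .assign(genre=lambda df: df["genres"].str.split("|"))
        .explode("genre")  # one row per (film, genre) pair
    )
    # Count films per (year, genre); rows come out sorted by year, then genre.
    counts = exploded.groupby(["release_year", "genre"]).size().reset_index(name="n")
    # idxmax keeps the first maximum, so ties go to the alphabetically first genre.
    top = counts.loc[counts.groupby("release_year")["n"].idxmax()]
    return top.set_index("release_year")["genre"]


def revenue_by_studio(movies: pd.DataFrame) -> pd.Series:
    """Total adjusted revenue per production company, largest first.

    Assumes pipe-separated `production_companies` and numeric `revenue_adj`.
    A co-produced film is credited in full to every listed studio.
    """
    exploded = (
        movies[["production_companies", "revenue_adj"]]
        .dropna(subset=["production_companies"])
        .assign(studio=lambda df: df["production_companies"].str.split("|"))
        .explode("studio")
    )
    return exploded.groupby("studio")["revenue_adj"].sum().sort_values(ascending=False)
```

Because co-produced films are credited in full to each listed studio, the studio totals overlap; splitting a film's revenue evenly among its co-producers would be an alternative design choice.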
The main findings were:
- Comedy and drama topped the yearly production counts in more than 70% of the years from 1960 to 2015.
- Universal was the leading film studio by revenue (adjusted) from 1965 to 2015, ahead of its competitors Disney, Paramount and Columbia.
- High-grossing movies were more likely to have a higher budget, a runtime of between 100 and 200 minutes, a higher vote average and a higher popularity score.
- Revenue (adjusted) appears to be positively correlated with both vote average and budget (adjusted).
- There is a clear upward trend in annual revenue (adjusted) across the movie industry over the period.
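The findings above rest on three kinds of calculation: comparing high grossers with other films, pairwise correlations, and a trend in yearly totals. A hedged sketch of each follows. The function names are hypothetical, the column names (`revenue_adj`, `budget_adj`, `vote_average`, `runtime`, `popularity`, `release_year`) are assumed to match the CSV, and "high grossing" is defined here, as an assumption, as the top quartile of adjusted revenue. If missing budgets or revenues are recorded as 0, you may want to filter them out first.

```python
import numpy as np
import pandas as pd


def compare_high_grossers(movies: pd.DataFrame, quantile: float = 0.75) -> pd.DataFrame:
    """Median properties of high-grossing films versus all other films.

    Assumption: "high grossing" means revenue_adj at or above the given quantile.
    """
    threshold = movies["revenue_adj"].quantile(quantile)
    labels = np.where(movies["revenue_adj"] >= threshold, "high_grossing", "other")
    cols = ["budget_adj", "runtime", "vote_average", "popularity"]
    return movies.groupby(labels)[cols].median()


def revenue_correlations(movies: pd.DataFrame) -> pd.Series:
    """Pearson correlation of adjusted revenue with vote average and adjusted budget."""
    corr = movies[["revenue_adj", "vote_average", "budget_adj"]].corr()
    return corr.loc["revenue_adj", ["vote_average", "budget_adj"]]


def annual_revenue_trend(movies: pd.DataFrame) -> tuple[pd.Series, float]:
    """Sum adjusted revenue per year and fit a least-squares straight line.

    A positive slope means revenue trends upward overall; it does not mean
    revenue grew in every single year.
    """
    yearly = movies.groupby("release_year")["revenue_adj"].sum()
    slope, _intercept = np.polyfit(
        yearly.index.to_numpy(dtype=float), yearly.to_numpy(dtype=float), deg=1
    )
    return yearly, float(slope)
```

Pearson correlation only measures linear association, so a positive value supports "appears to be correlated" but says nothing about whether budget or ratings drive revenue.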