
This repository is part of Udacity Data Analysis Nanodegree Program. In this project, I analyse a TMDB data set to answer various questions. See README for more details.


agpt8/TMDB-Analysis


TMDB-Analysis

In this project, we compute summary statistics for the TMDB dataset and explore questions about key traits of the movies in it. The script tries to answer the following questions.

1. We first answer some general questions about the dataset, such as:

a. Which movies earned the most and the least profit?

b. Which movies had the longest and the shortest runtime?

c. Which movies had the largest and the smallest budget?

d. Which movies had the highest and the lowest revenue?

e. What is the average runtime of all movies?

f. In which year did the most movies make a profit? (profits of movies in each year)
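The general questions above can be sketched with pandas. This is a minimal illustration on made-up data, not the code from data_analysis.py: the column names (original_title, budget, revenue, runtime, release_year) and the helper extremes are assumptions chosen for the example.

```python
import pandas as pd

# Tiny made-up sample; the column names are assumptions, not taken from the script.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "budget": [100, 50, 200, 80],
    "revenue": [300, 40, 250, 500],
    "runtime": [120, 90, 150, 100],
    "release_year": [2010, 2010, 2011, 2011],
})

# Profit is taken here as revenue minus budget.
movies["profit"] = movies["revenue"] - movies["budget"]


def extremes(df, column):
    """Return the titles with the highest and the lowest value in `column`."""
    highest = df.loc[df[column].idxmax(), "original_title"]
    lowest = df.loc[df[column].idxmin(), "original_title"]
    return highest, lowest


most_profit, least_profit = extremes(movies, "profit")
longest, shortest = extremes(movies, "runtime")
average_runtime = movies["runtime"].mean()

# Count the profitable movies in each year, then pick the year with the most.
profitable_per_year = movies[movies["profit"] > 0].groupby("release_year").size()
best_year = profitable_per_year.idxmax()
```

The same extremes helper covers the budget and revenue questions by passing a different column name. If question f is read as total profit per year instead, movies.groupby("release_year")["profit"].sum() would be the quantity to inspect.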

2. We then look at characteristics shared by the most profitable movies, such as:

a. Average duration of movies.

b. Average budget.

c. Average revenue.

d. Average profits.

e. Which director directed the most films?

f. Which cast members appeared the most?

g. Which genres were the most successful?

h. In which month were the most movies released, across all years?

i. And which month made the most profit?
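The counting questions in this group could be approached along the following lines. This is a sketch under assumptions: it supposes that cast and genres are stored as pipe-separated strings, that a release_date column exists, and that profit has already been computed. The helper count_values is hypothetical, and the data is made up.

```python
import pandas as pd

# Made-up sample; column names and the "|" separator are assumptions.
movies = pd.DataFrame({
    "director": ["X", "Y", "X"],
    "cast": ["Ann|Bob", "Bob|Cy", "Bob|Ann"],
    "genres": ["Action|Drama", "Drama", "Comedy|Drama"],
    "release_date": ["2010-06-15", "2011-06-01", "2012-12-25"],
    "profit": [200, 50, 400],
})


def count_values(df, column, sep="|"):
    """Count how often each item appears in a separator-delimited column."""
    return df[column].str.split(sep).explode().value_counts()


top_director = movies["director"].value_counts().idxmax()
top_actor = count_values(movies, "cast").idxmax()
top_genre = count_values(movies, "genres").idxmax()

# Extract the release month (1-12) and aggregate by it.
months = pd.to_datetime(movies["release_date"]).dt.month
busiest_month = months.value_counts().idxmax()
most_profitable_month = movies.groupby(months)["profit"].sum().idxmax()
```

To restrict the analysis to the most profitable movies, the DataFrame could first be filtered on profit; the threshold to use is a choice this sketch does not make.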

3. We also analyse trends and relationships among traits, such as:

a. How have movie production trends varied over the years?

b. What are the top 20 highest-grossing movies?

c. What are the top 20 most expensive movies?

d. How do budgets correlate with revenues? Do higher-budget movies have higher revenue?

e. What runtimes are associated with each genre?
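The trend and relationship questions above map onto a handful of pandas operations. The sketch below uses made-up data and assumed column names (original_title, budget, revenue, runtime, genres, release_year); it takes the top 2 instead of the top 20 only because the sample is tiny.

```python
import pandas as pd

# Made-up sample; column names and the "|" separator are assumptions.
movies = pd.DataFrame({
    "original_title": ["A", "B", "C", "D"],
    "budget": [10, 20, 30, 40],
    "revenue": [25, 35, 70, 90],
    "runtime": [100, 120, 90, 110],
    "genres": ["Action|Drama", "Drama", "Action", "Comedy"],
    "release_year": [2010, 2010, 2011, 2011],
})

# Production trend: number of movies released per year.
movies_per_year = movies.groupby("release_year").size()

# Top-N lists; use 20 instead of 2 on the full dataset.
top_grossing = movies.nlargest(2, "revenue")["original_title"].tolist()
most_expensive = movies.nlargest(2, "budget")["original_title"].tolist()

# Pearson correlation between budget and revenue.
budget_revenue_corr = movies["budget"].corr(movies["revenue"])

# One row per (movie, genre) pair, then the mean runtime for each genre.
by_genre = movies.assign(genre=movies["genres"].str.split("|")).explode("genre")
runtime_by_genre = by_genre.groupby("genre")["runtime"].mean()
```

A correlation close to 1 would suggest that higher budgets go along with higher revenues, although it says nothing about causation. Plotting movies_per_year or a budget-versus-revenue scatter with matplotlib or seaborn would be a natural next step.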

How to run

Python must be installed to run this script. You can download it from python.org or use Anaconda. Python 3.x is recommended.

After installing Python, you need to install the required packages. Open a command prompt in the project directory and run pip install -r requirements.txt to install them. If you prefer, you can create a virtual environment first and install the packages into it. There are several ways to do this, and instructions are easy to find online.

To run the script, type python data_analysis.py in the command prompt. However, it is highly recommended that you run the script from a text editor or an IDE. Personally, I use PyCharm from JetBrains, which is a very powerful IDE. You can also use Visual Studio, Visual Studio Code, Spyder (installed along with Anaconda), or any other editor or IDE you like.

Sources referred

I referred to many sources while working on this project. Some of them include:

  • Stack Overflow
  • Documentation of packages used (pandas, matplotlib, seaborn)
  • Udacity course lessons

Issues

If you find any problems with the script, or have any general issue, kindly file them under Issues.

Contributions

If you would like to contribute or want to make any changes, you can submit a Pull Request and I will make sure to follow up.

Licence

This repo is shared under the Apache License 2.0.
