In this project, we look at various statistics from the dataset and ask some interesting questions about key traits of the movies. The script answers the following questions.
1. We start with some general statistics:
a. Which movie earned the most profit, and which the least?
b. Which movie had the longest runtime, and which the shortest?
c. Which movie had the largest budget, and which the smallest?
d. Which movie had the highest revenue, and which the lowest?
e. What is the average runtime of all movies?
f. In which year did the most movies make a profit? (profits of movies in each year)
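The first group of questions comes down to a handful of pandas operations. The sketch below is a minimal illustration, not the code in data_analysis.py: it uses a tiny made-up DataFrame, assumes column names (original_title, budget, revenue, runtime, release_year) modelled on TMDb-style movie data, and defines profit as revenue minus budget. The helper extremes is hypothetical. Because question f can be read either as total profit per year or as the number of profitable movies per year, both readings are computed.

```python
import pandas as pd

# Hypothetical toy data for illustration only; the real script loads the
# dataset from disk. Column names are assumptions (modelled on TMDb data).
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "budget": [100, 50, 200, 10],
    "revenue": [300, 20, 250, 90],
    "runtime": [120, 95, 150, 88],
    "release_year": [2010, 2010, 2012, 2012],
})

# Assumption: profit is revenue minus budget.
movies["profit"] = movies["revenue"] - movies["budget"]


def extremes(df, column):
    """Hypothetical helper: titles with the highest and lowest `column` value."""
    highest = df.loc[df[column].idxmax(), "original_title"]
    lowest = df.loc[df[column].idxmin(), "original_title"]
    return highest, lowest


# Questions a-d: most/least profit, runtime, budget and revenue.
for column in ["profit", "runtime", "budget", "revenue"]:
    print(column, extremes(movies, column))

# Question e: average runtime of all movies.
average_runtime = movies["runtime"].mean()

# Question f, read two ways: total profit per year, and the number of
# movies per year that made a profit.
profit_by_year = movies.groupby("release_year")["profit"].sum()
profitable_count_by_year = (movies["profit"] > 0).groupby(movies["release_year"]).sum()

print("average runtime:", average_runtime)
print("year with the highest total profit:", profit_by_year.idxmax())
print("year with the most profitable movies:", profitable_count_by_year.idxmax())
```

In this toy data the two readings of question f point to different years (2010 for total profit, 2012 for the count of profitable movies), which is why it helps to decide which one the analysis means.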
2. We then look at characteristics shared by the most profitable movies, such as:
a. Average duration of movies.
b. Average budget.
c. Average revenue.
d. Average profits.
e. Which director directed the most films?
f. Which cast member appeared most often?
g. Which genres were the most successful?
h. In which month were the most movies released, across all years?
i. Which month generated the most profit?
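One possible way to answer the second group is to filter the profitable movies first and then aggregate over that subset. This is a sketch under stated assumptions, not the project's implementation: the data is made up, PROFIT_THRESHOLD is a hypothetical cut-off for "profitable", the director, cast and genres fields are assumed to hold "|"-separated values, release dates are assumed to be ISO-formatted, and the helper most_common is hypothetical.

```python
import pandas as pd

# Hypothetical toy data. Column names, "|"-separated multi-value fields and
# ISO-formatted release dates are assumptions modelled on TMDb data.
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C", "Movie D"],
    "budget": [100, 50, 200, 30],
    "revenue": [300, 20, 280, 110],
    "runtime": [120, 95, 150, 90],
    "director": ["Jane Roe", "John Doe", "Jane Roe", "Ann Lee"],
    "cast": ["Actor X|Actor Y", "Actor Z", "Actor X|Actor Z", "Actor Y|Actor X"],
    "genres": ["Action|Drama", "Comedy", "Action", "Action|Thriller"],
    "release_date": ["2010-06-15", "2010-01-20", "2012-06-01", "2012-12-24"],
})
movies["profit"] = movies["revenue"] - movies["budget"]

# Hypothetical cut-off for what counts as a "profitable" movie.
PROFIT_THRESHOLD = 50
profitable = movies[movies["profit"] >= PROFIT_THRESHOLD].copy()


def most_common(series):
    """Hypothetical helper: most frequent value after splitting on "|"."""
    return series.str.split("|").explode().value_counts().idxmax()


# Questions a-d: average runtime, budget, revenue and profit.
averages = profitable[["runtime", "budget", "revenue", "profit"]].mean()

# Questions e-g: most frequent director, cast member and genre.
top_director = most_common(profitable["director"])
top_cast = most_common(profitable["cast"])
top_genre = most_common(profitable["genres"])

# Questions h-i: busiest release month and most profitable month.
profitable["month"] = pd.to_datetime(profitable["release_date"]).dt.month
busiest_month = profitable["month"].value_counts().idxmax()
most_profitable_month = profitable.groupby("month")["profit"].sum().idxmax()

print(averages)
print(top_director, top_cast, top_genre, busiest_month, most_profitable_month)
```

Splitting on "|" and exploding gives one row per (movie, value) pair, so a movie with several cast members or genres counts once towards each of them.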
3. Finally, we explore trends across the whole dataset:
a. How has movie production varied over the years?
b. What are the top 20 highest-grossing movies?
c. What are the top 20 most expensive movies?
d. How do budgets correlate with revenues? Do higher-budget movies earn more revenue?
e. What runtimes are associated with each genre?
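The trend questions map onto a few standard pandas calls: a groupby count per year, DataFrame.nlargest for the top-N lists, Series.corr (Pearson correlation by default) for budget versus revenue, and DataFrame.explode to handle movies with several genres. As before, this is a minimal sketch with made-up data and assumed column names; TOP_N is a hypothetical constant, and with the real dataset it would return the top 20 rows, while the toy frame simply has fewer.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions modelled on TMDb data.
movies = pd.DataFrame({
    "original_title": ["Movie A", "Movie B", "Movie C", "Movie D", "Movie E"],
    "budget": [100, 50, 200, 10, 150],
    "revenue": [300, 20, 400, 90, 250],
    "runtime": [120, 90, 150, 100, 120],
    "release_year": [2010, 2010, 2011, 2012, 2012],
    "genres": ["Action|Drama", "Comedy", "Action", "Drama", "Comedy|Action"],
})

TOP_N = 20  # the README asks for the top 20; the toy frame has only 5 rows

# Question a: number of movies released each year.
movies_per_year = movies.groupby("release_year").size()

# Questions b-c: highest-grossing and most expensive movies.
top_grossing = movies.nlargest(TOP_N, "revenue")[["original_title", "revenue"]]
most_expensive = movies.nlargest(TOP_N, "budget")[["original_title", "budget"]]

# Question d: Pearson correlation between budget and revenue.
budget_revenue_corr = movies["budget"].corr(movies["revenue"])

# Question e: mean runtime per genre, one row per (movie, genre) pair.
runtime_by_genre = (
    movies.assign(genre=movies["genres"].str.split("|"))
    .explode("genre")
    .groupby("genre")["runtime"]
    .mean()
)

print(movies_per_year, top_grossing, most_expensive, sep="\n\n")
print("budget/revenue correlation:", round(budget_revenue_corr, 3))
print(runtime_by_genre)
```

A correlation close to 1 indicates that higher budgets tend to go with higher revenues in the data, but on its own it does not show that a bigger budget causes higher revenue.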
Python must be installed for this script to run. You can download it from python.org or use the Anaconda distribution. Python 3.x is recommended.
After installing Python, install the required packages. Navigate to the project directory, open a command prompt there, and run:
    pip install -r requirements.txt
This installs all the necessary packages. Optionally, you can first create a virtual environment (for example with Python's built-in venv module or with conda) and install the packages into it.
To run the script, type the following in the command prompt:
    python data_analysis.py
However, I highly recommend running the script from a text editor or an IDE. Personally, I use PyCharm from JetBrains, which is a very powerful IDE. You can also use Visual Studio, Visual Studio Code, Spyder (installed along with Anaconda), or any other editor or IDE you like.
I referred to many sources while working on this project, including:
- Stack Overflow
- Documentation of packages used (pandas, matplotlib, seaborn)
- Udacity course lessons
If you find a bug in the script or have any other issue, please file it under Issues.
If you would like to contribute or make changes, submit a Pull Request and I will follow up.
This repository is licensed under the Apache License 2.0.