TjanMichela/Movie-Data-Analysis

Movie-Data-Analysis

Project: Explanatory Data Analysis & Data Presentation (Movies Dataset)

Project Brief

Conduct data analysis and data visualization on a dataset about movies.

Data Import and first Inspection

  1. Import the movies dataset from the CSV file "movies_complete.csv". Inspect the data.

Some additional information on Features/Columns:

  • id: The ID of the movie (clear/unique identifier).
  • title: The Official Title of the movie.
  • tagline: The tagline of the movie.
  • release_date: Theatrical Release Date of the movie.
  • genres: Genres associated with the movie.
  • belongs_to_collection: Gives information on the movie series/franchise the particular film belongs to.
  • original_language: The language in which the movie was originally shot.
  • budget_musd: The budget of the movie in million dollars.
  • revenue_musd: The total revenue of the movie in million dollars.
  • production_companies: Production companies involved with the making of the movie.
  • production_countries: Countries where the movie was shot/produced.
  • vote_count: The number of votes by users, as counted by TMDB.
  • vote_average: The average rating of the movie.
  • popularity: The Popularity Score assigned by TMDB.
  • runtime: The runtime of the movie in minutes.
  • overview: A brief blurb of the movie.
  • spoken_languages: Spoken languages in the film.
  • poster_path: The URL of the poster image.
  • cast: (Main) Actors appearing in the movie.
  • cast_size: Number of actors appearing in the movie.
  • director: Director of the movie.
  • crew_size: Size of the film crew (incl. director, excl. actors).
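The import and first inspection could look roughly like the sketch below. It assumes pandas is the tool of choice (the brief later hints at groupby()) and that release_date can be parsed as a date. The helper name load_movies and the two sample rows are made up for illustration; in the real project the file "movies_complete.csv" would be passed instead of the in-memory stand-in.

```python
import io
import pandas as pd

def load_movies(source):
    """Read the movies CSV and parse release_date as a datetime column."""
    return pd.read_csv(source, parse_dates=["release_date"])

# Tiny in-memory stand-in for "movies_complete.csv" (made-up rows, not real data)
sample_csv = io.StringIO(
    "id,title,release_date,budget_musd,revenue_musd\n"
    "1,Movie A,1999-03-31,60.0,460.0\n"
    "2,Movie B,2010-07-16,,\n"
)

movies = load_movies(sample_csv)
movies.info()          # column dtypes and non-null counts
print(movies.head())   # first rows for a visual check
```

Calling load_movies("movies_complete.csv") should work the same way, provided the file has a release_date column in a parseable format.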

The best and the worst movies...

  1. Filter the dataset and find the best/worst n movies by:
  • Highest Revenue
  • Highest Budget
  • Highest Profit (=Revenue - Budget)
  • Lowest Profit (=Revenue - Budget)
  • Highest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10 million USD)
  • Lowest Return on Investment (=Revenue / Budget) (only movies with Budget >= 10 million USD)
  • Highest number of Votes
  • Highest Rating (only movies with 10 or more Ratings)
  • Lowest Rating (only movies with 10 or more Ratings)
  • Highest Popularity
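One possible way to cover all of these rankings with a single helper is sketched below. The function name best_worst, the derived columns profit_musd and roi, and the sample rows are assumptions for illustration; only the column names budget_musd, revenue_musd, vote_count and vote_average come from the feature list.

```python
import pandas as pd

# Made-up sample rows, not real data
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget_musd": [200.0, 15.0, 5.0, 50.0],
    "revenue_musd": [900.0, 150.0, 100.0, 20.0],
    "vote_count": [5000, 800, 8, 120],
    "vote_average": [7.5, 8.1, 9.9, 4.2],
})

# Derived metrics as defined in the brief
movies["profit_musd"] = movies["revenue_musd"] - movies["budget_musd"]
movies["roi"] = movies["revenue_musd"] / movies["budget_musd"]

def best_worst(df, n, by, ascending=False, min_budget=None, min_votes=None):
    """Return the top (or bottom) n rows by a column, after optional filters."""
    if min_budget is not None:
        df = df[df["budget_musd"] >= min_budget]
    if min_votes is not None:
        df = df[df["vote_count"] >= min_votes]
    # sort_values places missing values last in either direction
    return df.sort_values(by, ascending=ascending).head(n)

top_roi = best_worst(movies, 2, "roi", min_budget=10)
worst_rated = best_worst(movies, 1, "vote_average", ascending=True, min_votes=10)
top_profit = best_worst(movies, 1, "profit_musd")
```

The filters matter: without min_budget, a very cheap movie such as "C" would dominate the ROI ranking, and without min_votes a movie with a handful of ratings would top the rating list.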

What are the most common Words in Movie Titles and Taglines?

3.b. Find the most common words in Movie Titles and Taglines and present the data
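A minimal word-count sketch using only the standard library is shown below. The function name common_words, the small stop-word set, and the sample titles are illustrative assumptions; a real analysis would likely use a fuller stop-word list and pass the title or tagline column instead.

```python
import re
from collections import Counter

def common_words(texts, n=3, stopwords=frozenset({"the", "a", "of", "and", "in"})):
    """Count lowercase words across texts, ignoring stop words and missing values."""
    counts = Counter()
    for text in texts:
        if not isinstance(text, str):  # skip missing values such as None or NaN
            continue
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Made-up titles, not taken from the dataset
titles = ["The Love Story", "Love and War", "War of the Worlds", None]
top = common_words(titles, n=2)
print(top)
```

The resulting (word, count) pairs can then be presented as a bar chart or a word cloud, depending on how the data is meant to be shown.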

Are Franchises more successful?

  1. Analyze the Dataset and find out whether Franchises (Movies that belong to a collection) are more successful than stand-alone movies in terms of:
  • mean revenue
  • median Return on Investment
  • mean budget raised
  • mean popularity
  • mean rating

hint: use groupby()

Create a feature called Franchise that identifies whether or not a movie belongs to a franchise.
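The comparison could be sketched as follows, assuming that a missing belongs_to_collection value marks a stand-alone movie. The roi column and the sample rows are made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up sample rows, not real data
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "belongs_to_collection": ["Saga X", "Saga X", np.nan, np.nan],
    "budget_musd": [100.0, 120.0, 20.0, 40.0],
    "revenue_musd": [500.0, 600.0, 30.0, 100.0],
    "popularity": [20.0, 30.0, 5.0, 7.0],
    "vote_average": [7.0, 6.5, 6.0, 7.2],
})

# Franchise flag: True when the movie belongs to a collection
movies["Franchise"] = movies["belongs_to_collection"].notna()
movies["roi"] = movies["revenue_musd"] / movies["budget_musd"]

# One row per group (stand-alone vs. franchise), one column per metric
comparison = movies.groupby("Franchise").agg(
    mean_revenue=("revenue_musd", "mean"),
    median_roi=("roi", "median"),
    mean_budget=("budget_musd", "mean"),
    mean_popularity=("popularity", "mean"),
    mean_rating=("vote_average", "mean"),
)
print(comparison)
```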

Most Successful Franchises

  1. Find the most successful Franchises in terms of
  • total number of movies
  • total & mean budget
  • total & mean revenue
  • mean rating
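A per-franchise summary could be built along these lines. The output column names (n_movies, total_budget and so on) and the sample rows are illustrative assumptions; groupby() skips rows with a missing collection by default, so stand-alone movies drop out on their own.

```python
import pandas as pd

# Made-up sample rows, not real data
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "belongs_to_collection": ["Saga X", "Saga X", "Saga Y", None],
    "budget_musd": [100.0, 120.0, 50.0, 40.0],
    "revenue_musd": [500.0, 600.0, 80.0, 100.0],
    "vote_average": [7.0, 6.0, 8.0, 7.2],
})

franchises = movies.groupby("belongs_to_collection").agg(
    n_movies=("title", "count"),
    total_budget=("budget_musd", "sum"),
    mean_budget=("budget_musd", "mean"),
    total_revenue=("revenue_musd", "sum"),
    mean_revenue=("revenue_musd", "mean"),
    mean_rating=("vote_average", "mean"),
)

# Rank by any of the metrics, here by total revenue
by_revenue = franchises.sort_values("total_revenue", ascending=False)
print(by_revenue)
```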

Most Successful Directors

  1. Find the most successful Directors in terms of
  • total number of movies
  • total revenue
  • mean rating
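For directors, a mean rating based on one or two films can be misleading, so the sketch below applies a minimum number of movies before ranking by rating. The threshold MIN_MOVIES is a hypothetical choice, not part of the brief, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows, not real data
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D", "E", "F"],
    "director": ["Ann", "Ann", "Ann", "Bob", "Cy", "Cy"],
    "revenue_musd": [100.0, 200.0, 300.0, 50.0, 400.0, 500.0],
    "vote_average": [7.0, 8.0, 9.0, 10.0, 6.0, 7.0],
})

directors = movies.groupby("director").agg(
    n_movies=("title", "count"),
    total_revenue=("revenue_musd", "sum"),
    mean_rating=("vote_average", "mean"),
)

MIN_MOVIES = 2  # hypothetical cut-off to avoid ranking one-film directors
best_rated = (
    directors[directors["n_movies"] >= MIN_MOVIES]
    .sort_values("mean_rating", ascending=False)
)
print(best_rated)
```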

Most Successful Actors

  1. Find the most successful Actors
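Because a movie has several actors, the cast column has to be unpacked into one row per actor before grouping. The sketch below assumes cast holds names separated by "|"; that delimiter is an assumption and should be checked against the actual file. The sample rows and the success metrics chosen here are illustrative.

```python
import pandas as pd

# Made-up sample rows, not real data; "|" as separator is an assumption
movies = pd.DataFrame({
    "title": ["A", "B", "C"],
    "cast": ["Ann Lee|Bo Kim", "Ann Lee", None],
    "revenue_musd": [100.0, 300.0, 50.0],
    "vote_average": [7.0, 8.0, 6.0],
})

# One row per (movie, actor) pair; movies without cast information are dropped
per_actor = (
    movies.dropna(subset=["cast"])
    .assign(actor=lambda d: d["cast"].str.split("|"))
    .explode("actor")
)

actors = per_actor.groupby("actor").agg(
    n_movies=("title", "count"),
    total_revenue=("revenue_musd", "sum"),
    mean_rating=("vote_average", "mean"),
).sort_values("total_revenue", ascending=False)
print(actors)
```

Note that each movie's full revenue is credited to every listed actor, so the totals are not additive across actors.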

Most Popular Genres

  • What are the most popular genres?
  • Which genres have been produced most often over the years?
  • Has this changed over time?
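These questions could be approached as sketched below. The sketch assumes genres holds "|"-separated names and that release_date is already a datetime column; both should be verified against the data. Grouping by decade is one illustrative choice of time bucket, and the sample rows are made up.

```python
import pandas as pd

# Made-up sample rows, not real data; "|" as separator is an assumption
movies = pd.DataFrame({
    "release_date": pd.to_datetime(
        ["1995-01-01", "1995-06-01", "2015-03-01", "2015-09-01"]
    ),
    "genres": ["Drama|Romance", "Drama", "Action|Drama", "Action"],
    "popularity": [10.0, 20.0, 30.0, 40.0],
})

# One row per (movie, genre) pair, tagged with the release decade
per_genre = movies.assign(
    decade=movies["release_date"].dt.year // 10 * 10,
    genre=movies["genres"].str.split("|"),
).explode("genre")

# Mean popularity score per genre, highest first
popularity_by_genre = (
    per_genre.groupby("genre")["popularity"].mean().sort_values(ascending=False)
)

# Number of movies per genre and decade (rows: decade, columns: genre)
genre_counts = per_genre.groupby(["decade", "genre"]).size().unstack(fill_value=0)
print(genre_counts)
```

Plotting genre_counts as a line or stacked area chart is one way to see whether the mix of genres has shifted.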
