Conduct data analysis and data visualization on a dataset about movies.
- Import the movies dataset from the CSV file "movies_complete.csv". Inspect the data.
Additional information on the features (columns):
- id: The ID of the movie (clear/unique identifier).
- title: The official title of the movie.
- tagline: The tagline of the movie.
- release_date: The theatrical release date of the movie.
- genres: Genres associated with the movie.
- belongs_to_collection: The movie series/franchise (collection) the film belongs to, if any.
- original_language: The language in which the movie was originally shot.
- budget_musd: The budget of the movie in million dollars.
- revenue_musd: The total revenue of the movie in million dollars.
- production_companies: Production companies involved with the making of the movie.
- production_countries: Countries where the movie was shot/produced.
- vote_count: The number of votes by users, as counted by TMDB.
- vote_average: The average rating of the movie.
- popularity: The popularity score assigned by TMDB.
- runtime: The runtime of the movie in minutes.
- overview: A brief blurb of the movie.
- spoken_languages: Languages spoken in the film.
- poster_path: The URL of the poster image.
- cast: The main actors appearing in the movie.
- cast_size: The number of actors appearing in the movie.
- director: Director of the movie.
- crew_size: Size of the film crew (incl. director, excl. actors).
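A minimal sketch of the import-and-inspect step, assuming pandas is used (the groupby() hint suggests it). To keep the example self-contained, an in-memory CSV with two made-up rows and a subset of the documented columns stands in for "movies_complete.csv"; replace `sample_csv` with the real file path.

```python
import io

import pandas as pd

# Hypothetical stand-in for "movies_complete.csv": two made-up rows and a
# subset of the documented columns. Pass the real file path instead.
sample_csv = io.StringIO(
    "id,title,release_date,budget_musd,revenue_musd,vote_count,vote_average\n"
    "1,Movie A,1995-10-30,30.0,300.0,5000,7.5\n"
    "2,Movie B,1999-12-15,60.0,,20,6.0\n"
)

# parse_dates converts release_date to datetime64 for later time-based analysis
movies = pd.read_csv(sample_csv, parse_dates=["release_date"])

movies.info()               # column dtypes and non-null counts
print(movies.head())        # first rows
print(movies.isna().sum())  # missing values per column
print(movies.describe())    # summary statistics of the numeric columns
```

Checking missing values early matters here: budget and revenue gaps directly affect the profit and ROI rankings below.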
- Filter the dataset and find the best/worst n movies by:
  - highest revenue
  - highest budget
  - highest profit (= revenue - budget)
  - lowest profit (= revenue - budget)
  - highest return on investment (ROI = revenue / budget), only movies with a budget of at least 10 million USD (budget_musd >= 10)
  - lowest return on investment (ROI = revenue / budget), only movies with a budget of at least 10 million USD (budget_musd >= 10)
  - highest number of votes
  - highest rating, only movies with 10 or more votes (vote_count >= 10)
  - lowest rating, only movies with 10 or more votes (vote_count >= 10)
  - highest popularity
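All ten rankings follow one pattern: optionally keep rows that meet a minimum threshold, then sort by one column. The sketch below captures that in a small helper; `best_worst` is a hypothetical name, and the four-row DataFrame is made-up toy data using the documented column names.

```python
import pandas as pd

# Hypothetical toy data using the documented column names
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "budget_musd": [5.0, 20.0, 100.0, 50.0],
    "revenue_musd": [50.0, 10.0, 400.0, 150.0],
    "vote_count": [3, 120, 5000, 800],
    "vote_average": [9.0, 5.5, 7.8, 6.9],
})

# Derived metrics as defined in the task
movies["profit_musd"] = movies["revenue_musd"] - movies["budget_musd"]
movies["roi"] = movies["revenue_musd"] / movies["budget_musd"]


def best_worst(df, n, by, ascending=False, min_col=None, min_value=None):
    """Return the top (or, with ascending=True, bottom) n rows by `by`,
    optionally keeping only rows where df[min_col] >= min_value."""
    if min_col is not None:
        df = df[df[min_col] >= min_value]
    return df.sort_values(by, ascending=ascending).head(n)


top_revenue = best_worst(movies, 2, "revenue_musd")
worst_profit = best_worst(movies, 2, "profit_musd", ascending=True)
top_roi = best_worst(movies, 2, "roi", min_col="budget_musd", min_value=10)
top_rated = best_worst(movies, 2, "vote_average",
                       min_col="vote_count", min_value=10)
print(top_rated[["title", "vote_average", "vote_count"]])
```

Movie "A" has the highest average rating but only 3 votes, so the vote threshold removes it from `top_rated`; the budget threshold similarly keeps tiny budgets from inflating ROI. `DataFrame.nlargest`/`nsmallest` are an alternative to `sort_values(...).head(n)`.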
- Find the most common words in movie titles and taglines and present the results.
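One way to count words is to join each text column, lowercase it, split it into tokens and tally them with `collections.Counter`. This is a sketch under assumptions: the titles and taglines are made up, the tokenizer is a simple regular expression, and `STOP_WORDS` is a hypothetical minimal list that you would extend or replace with a library list.

```python
import re
from collections import Counter

import pandas as pd

# Hypothetical toy data; the real columns are "title" and "tagline"
movies = pd.DataFrame({
    "title": ["The Dark Knight", "The Dark Knight Rises", "Love Actually"],
    "tagline": ["Why so serious?", "The legend ends", None],
})

# Assumed minimal stop-word list; extend it (or use a library list) as needed
STOP_WORDS = {"the", "a", "an", "of", "and", "so", "why"}


def word_counts(series):
    """Lowercase, tokenize on letters/apostrophes, drop stop words, count."""
    text = " ".join(series.dropna())          # missing taglines are skipped
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS)


title_words = word_counts(movies["title"])
tagline_words = word_counts(movies["tagline"])
print(title_words.most_common(3))
```

`Counter.most_common(k)` gives the top k words, which can then be presented as a table or a bar chart.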
- Analyze the dataset and find out whether franchises (movies that belong to a collection) are more successful than stand-alone movies in terms of:
  - mean revenue
  - median return on investment
  - mean budget
  - mean popularity
  - mean rating
Hint: create a feature called Franchise that indicates whether or not a movie belongs to a franchise, then compare both groups with groupby().
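A sketch of the franchise comparison, assuming a missing belongs_to_collection value means the movie is stand-alone. The data is made up, and the aggregate names (`mean_revenue`, `median_roi`, ...) are illustrative choices using pandas named aggregation.

```python
import pandas as pd

# Hypothetical toy data; belongs_to_collection is missing for stand-alone movies
movies = pd.DataFrame({
    "belongs_to_collection": ["Collection A", None, "Collection A", None],
    "budget_musd": [30.0, 20.0, 90.0, 10.0],
    "revenue_musd": [370.0, 40.0, 500.0, 5.0],
    "popularity": [21.9, 5.0, 17.5, 2.0],
    "vote_average": [7.7, 6.0, 7.3, 5.0],
})

# Franchise feature: True if the movie belongs to a collection
movies["Franchise"] = movies["belongs_to_collection"].notna()
movies["roi"] = movies["revenue_musd"] / movies["budget_musd"]

comparison = movies.groupby("Franchise").agg(
    mean_revenue=("revenue_musd", "mean"),
    median_roi=("roi", "median"),
    mean_budget=("budget_musd", "mean"),
    mean_popularity=("popularity", "mean"),
    mean_rating=("vote_average", "mean"),
)
# Readable row labels instead of False/True
comparison.index = comparison.index.map({True: "Franchise", False: "Stand-alone"})
print(comparison)
```

The median is used for ROI because ratios are heavily skewed by a few extreme hits.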
- Find the most successful franchises in terms of:
  - total number of movies
  - total and mean budget
  - total and mean revenue
  - mean rating
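A sketch of the per-franchise summary: group by belongs_to_collection and aggregate each metric once, then sort by whichever criterion you want to rank by. The toy data is hypothetical; stand-alone movies drop out automatically because `groupby` excludes missing keys by default.

```python
import pandas as pd

# Hypothetical toy data with the documented column names
movies = pd.DataFrame({
    "belongs_to_collection": ["Saga X", "Saga X", "Saga Y", None],
    "title": ["X1", "X2", "Y1", "Solo"],
    "budget_musd": [100.0, 150.0, 40.0, 10.0],
    "revenue_musd": [800.0, 900.0, 60.0, 30.0],
    "vote_average": [7.0, 8.0, 6.0, 7.5],
})

# Rows with a missing collection (stand-alone movies) are dropped by groupby
franchises = movies.groupby("belongs_to_collection").agg(
    movies=("title", "count"),
    total_budget=("budget_musd", "sum"),
    mean_budget=("budget_musd", "mean"),
    total_revenue=("revenue_musd", "sum"),
    mean_revenue=("revenue_musd", "mean"),
    mean_rating=("vote_average", "mean"),
)

# Rank by any criterion, e.g. total revenue
ranked = franchises.sort_values("total_revenue", ascending=False)
print(ranked)
```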
- Find the most successful directors in terms of:
  - total number of movies
  - total revenue
  - mean rating
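The director analysis uses the same groupby pattern. In this sketch the mean rating only counts movies with at least 10 votes, reusing the threshold from the rating ranking above; that is an assumption, not part of the task. The data is made up.

```python
import pandas as pd

# Hypothetical toy data with the documented column names
movies = pd.DataFrame({
    "director": ["Dir A", "Dir A", "Dir B", "Dir C"],
    "title": ["A1", "A2", "B1", "C1"],
    "revenue_musd": [500.0, 300.0, 900.0, None],
    "vote_average": [7.0, 8.0, 6.5, 9.0],
    "vote_count": [2000, 1500, 3000, 4],
})

directors = movies.groupby("director").agg(
    movies=("title", "count"),
    total_revenue=("revenue_musd", "sum"),  # missing revenues are skipped
)

# Assumption: reuse the ">= 10 votes" rule; directors without any
# qualifying movie get NaN after index alignment
rated = movies[movies["vote_count"] >= 10]
directors["mean_rating"] = rated.groupby("director")["vote_average"].mean()

most_movies = directors.sort_values("movies", ascending=False)
top_revenue = directors.sort_values("total_revenue", ascending=False)
top_rated = directors.sort_values("mean_rating", ascending=False)
print(top_rated)
```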
- Find the most successful actors.
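Because a movie lists several actors, the cast column first has to be split into one row per (movie, actor) pair. This sketch assumes cast names are stored as a single "|"-separated string (check this against the real file), uses made-up data, and picks number of movies, revenue and mean rating as illustrative success metrics, since the task leaves them open.

```python
import pandas as pd

# Hypothetical toy data; assumes cast is a "|"-separated string of names
movies = pd.DataFrame({
    "title": ["M1", "M2", "M3"],
    "cast": ["Actor X|Actor Y", "Actor X", "Actor Y|Actor Z"],
    "revenue_musd": [100.0, 300.0, 50.0],
    "vote_average": [7.0, 8.0, 6.0],
})

# One row per (movie, actor) pair
actor_rows = (movies.assign(actor=movies["cast"].str.split("|"))
              .explode("actor")
              .reset_index(drop=True))

actors = actor_rows.groupby("actor").agg(
    movies=("title", "count"),
    total_revenue=("revenue_musd", "sum"),
    mean_revenue=("revenue_musd", "mean"),
    mean_rating=("vote_average", "mean"),
)
print(actors.sort_values("total_revenue", ascending=False))
```

Note that each movie's full revenue is credited to every actor in it, so totals overlap across actors.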
- What are the most popular genres?
- What are the most frequently produced genres over the years?
  - Has this changed over time?
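A sketch for the genre questions, assuming genres are stored as "|"-separated strings. "Most popular" is read here as mean popularity per genre, and "over time" is measured per decade; both are assumptions (use `.dt.year` alone for yearly counts). The data is made up.

```python
import pandas as pd

# Hypothetical toy data; assumes genres is a "|"-separated string
movies = pd.DataFrame({
    "genres": ["Drama|Comedy", "Drama", "Action|Drama", "Comedy"],
    "release_date": pd.to_datetime(
        ["1995-05-01", "1995-08-12", "2005-03-03", "2005-11-20"]),
    "popularity": [10.0, 4.0, 20.0, 6.0],
})

# Decade of release (assumed granularity)
movies["decade"] = movies["release_date"].dt.year // 10 * 10

# One row per (movie, genre)
genre_rows = (movies.assign(genre=movies["genres"].str.split("|"))
              .explode("genre")
              .reset_index(drop=True))

# Most popular genres: mean popularity per genre
popular = genre_rows.groupby("genre")["popularity"].mean().sort_values(ascending=False)

# Most frequently produced genres overall
made = genre_rows["genre"].value_counts()

# Movies per genre and decade (rows = decade, columns = genre)
by_decade = genre_rows.groupby(["decade", "genre"]).size().unstack(fill_value=0)
print(by_decade)
```

If matplotlib is installed, `by_decade.plot(kind="bar", stacked=True)` is one way to visualize whether the genre mix shifts between decades.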