There are two CSV files and one subfolder:
• db500.csv
• itemGenres.csv
• treemap (subfolder)
Let's describe each of these elements.
db500.csv is a database of the 500 Hollywood movies with the biggest production budgets.
The database was initially built with data from The Numbers; I then added data from Wikipedia and IMDb.
Let's describe the 30 variables:
Release.Date: the release date of the movie, in mm/dd/yy format, from The Numbers.
Movie: the title of the movie, from The Numbers.
Production.Budget: the production budget in dollars, from The Numbers.
Domestic.Gross: the domestic gross revenue in dollars, from The Numbers.
Worldwide.Gross: the worldwide gross revenue in dollars, from The Numbers.
Rate: the rating the movie received on IMDb.
Raters: the number of people who rated the movie on IMDb.
Genres: the genre(s) of the movie, from IMDb.
imdbUrl: the IMDb URL of the movie.
Directed.by: the name of the person who directed the movie.
Produced.by: the name of the person who produced the movie.
Written.by: the name of the person who wrote the story of the movie.
Starring: the list of actors who appear in the movie.
Music.by: the name of the composer of the music.
Cinematography: the name of the cinematographer.
Edited.by: the name of the editor.
Production.company: the name of the production company.
Distributed.by: the name of the distribution company.
Release.date: the release date, from Wikipedia (note the lowercase "d", which distinguishes it from Release.Date).
Running.time: the length of the movie in minutes, from Wikipedia.
Country: the country where the movie was produced.
Language: the original language of the movie, from Wikipedia.
Budget: the production budget in dollars, from Wikipedia.
Box.office: the box office (worldwide gross revenue) in dollars, from Wikipedia.
wikiUrl: the Wikipedia URL of the movie.
image: the URL of the movie poster, from Wikipedia.
Screenplay.by: the name of the person who wrote the screenplay.
Story.by: the name of the person who wrote the original story of the movie. If the story comes from a book, the book's author is given.
Based.on: the name of the original work the movie is based on.
Production.companies: the names of the production companies, when there is more than one.
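As a starting point, here is a minimal Python sketch for loading db500.csv with the standard csv module. It assumes the column headers match the variable names above, that the money columns from The Numbers are whole-dollar amounts possibly written with "$" and thousands separators, and that Release.Date really is mm/dd/yy. One pitfall with two-digit years: Python's %y directive maps 00-68 to 2000-2068, so an older film would land in the wrong century; the sketch shifts such dates back using a heuristic cutoff. The helper names (parse_release_date, parse_dollars, load_movies), the max_year cutoff and the derived gross_to_budget ratio are hypothetical illustrations, not part of the dataset.

```python
import csv
import io
from datetime import date, datetime


def parse_release_date(text, max_year=None):
    """Parse an mm/dd/yy date such as the Release.Date column (assumed format)."""
    parsed = datetime.strptime(text.strip(), "%m/%d/%y").date()
    # %y maps 00-68 to 2000-2068 and 69-99 to 1969-1999, so a film dated
    # "06/12/63" would come out as 2063. Heuristic (an assumption): any year
    # later than max_year is shifted back one century.
    if max_year is None:
        max_year = date.today().year + 5
    if parsed.year > max_year:
        parsed = parsed.replace(year=parsed.year - 100)
    return parsed


def parse_dollars(text):
    """Turn '$200,000,000' or '200000000' into an int; empty cells become None."""
    cleaned = text.replace("$", "").replace(",", "").strip()
    return int(float(cleaned)) if cleaned else None


def load_movies(fileobj):
    """Read db500.csv rows and keep a few parsed columns from The Numbers."""
    movies = []
    for row in csv.DictReader(fileobj):
        budget = parse_dollars(row["Production.Budget"])
        worldwide = parse_dollars(row["Worldwide.Gross"])
        movies.append({
            "movie": row["Movie"],
            "release_date": parse_release_date(row["Release.Date"]),
            "budget": budget,
            "worldwide_gross": worldwide,
            # Hypothetical derived metric: worldwide gross per budget dollar.
            "gross_to_budget": worldwide / budget if budget and worldwide is not None else None,
        })
    return movies


# Demo on a made-up row (not taken from db500.csv).
demo_csv = io.StringIO(
    "Release.Date,Movie,Production.Budget,Worldwide.Gross\n"
    '12/18/09,Example Film,"$200,000,000","$500,000,000"\n'
)
demo = load_movies(demo_csv)
```

To read the real file, something like `with open("db500.csv", newline="", encoding="utf-8") as f: movies = load_movies(f)` should work; the encoding is a guess. The Wikipedia columns (Budget, Box.office) are left out on purpose, since they may contain free text such as ranges rather than plain numbers.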
The file itemGenres.csv contains the different combinations of genres found in db500.csv.
It was made for building association rules (you can use the arules package in R).
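In R, the arules package does the real work (for example `read.transactions()` to load the baskets and `apriori()` to mine the rules). To show what the resulting rules measure, here is a small Python sketch, a swapped-in illustration rather than arules itself, that computes support and confidence for "one genre implies another genre" rules. It assumes each row of itemGenres.csv lists the genres of one movie as comma-separated cells, with no header by default; load_transactions, pair_rules and the threshold values are hypothetical.

```python
import csv
from collections import Counter
from itertools import combinations


def load_transactions(lines, has_header=False):
    """Read one transaction (set of genres) per CSV row; the layout is assumed."""
    reader = csv.reader(lines)
    if has_header:
        next(reader, None)
    transactions = []
    for row in reader:
        genres = {cell.strip() for cell in row if cell.strip()}
        if genres:
            transactions.append(genres)
    return transactions


def pair_rules(transactions, min_support=0.1, min_confidence=0.5):
    """Return rules {lhs} -> {rhs} as (lhs, rhs, support, confidence) tuples.

    support    = share of transactions containing both genres
    confidence = support(lhs and rhs) / support(lhs)
    """
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for genres in transactions:
        items = sorted(set(genres))
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # A co-occurring pair gives two candidate rules: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    # Highest confidence first, ties broken alphabetically.
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))
```

Usage would look like `with open("itemGenres.csv", newline="") as f: rules = pair_rules(load_transactions(f), min_support=0.05)`. Counting pairs directly is fine for the small number of genres involved; for rules with larger itemsets, Apriori's candidate pruning (as implemented in arules) is the better tool.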
The treemap folder contains the components needed to display an interactive treemap like this one.
Enjoy this script! If you have any trouble getting it to work, you can always contact me via my blog or by email.