Project Overview: Overview

In this project, I will analyze a dataset and then communicate my findings about it. I will use the Python libraries NumPy, pandas, and Matplotlib to make my analysis easier.

What did I need to install?

I needed an installation of Python, plus the following libraries:

pandas
NumPy
Matplotlib
seaborn
csv (part of the Python standard library, so it needs no separate installation)
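A minimal sketch, not part of the original project, for checking that the third-party packages listed above can be imported. It tries each one with `importlib` and reports its version, or `None` if the import fails. The helper name `check_packages` is hypothetical.

```python
import importlib

# Third-party packages from the list above; csv ships with Python itself.
REQUIRED = ["pandas", "numpy", "matplotlib", "seaborn"]


def check_packages(names):
    """Hypothetical helper: map each package name to its version, or None if it is missing."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
        else:
            # Most packages expose __version__; fall back to "unknown" if one does not.
            report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

If a package shows as NOT INSTALLED, installing it through Anaconda (or pip) and rerunning the check should resolve it.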

I recommend installing Anaconda, which comes with all of the necessary packages, as well as Jupyter Notebook (formerly called IPython Notebook).

Why this Project?

In this project, I will go through the data analysis process and see how everything fits together.
I'll use the Python libraries NumPy, pandas, and Matplotlib, which make writing data analysis code in Python a lot easier! They are also skills that employers seek out.

What did I Learn?

Know the steps involved in a typical data analysis process
Comfortable posing questions that can be answered with a given dataset and then answering those questions
Know how to investigate problems in a dataset and wrangle the data into a format I can use
Have experience communicating the results of my analysis
Able to use vectorized operations in NumPy and pandas to speed up my data analysis code
Familiar with pandas' Series and DataFrame objects, which let me access my data more conveniently
Know how to use Matplotlib to produce plots showing my findings
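To illustrate the vectorized operations and Series/DataFrame access mentioned above, here is a minimal sketch on a toy DataFrame. The values are invented, and the column names `original_title`, `budget_adj`, and `revenue_adj` are assumptions about how the TMDb CSV names its title and inflation-adjusted columns. It contrasts a Python loop with the equivalent column-wide operation.

```python
import numpy as np
import pandas as pd

# Toy values for illustration only (not taken from the TMDb dataset);
# column names are assumed, not confirmed by this README.
movies = pd.DataFrame(
    {
        "original_title": ["Film A", "Film B", "Film C", "Film D"],
        "budget_adj": [100.0, 50.0, 0.0, 20.0],
        "revenue_adj": [300.0, 40.0, 10.0, 0.0],
    }
)

# Loop version: processes one row at a time in Python.
profit_loop = [rev - bud for rev, bud in zip(movies["revenue_adj"], movies["budget_adj"])]

# Vectorized version: a single column-wide subtraction executed by NumPy.
movies["profit_adj"] = movies["revenue_adj"] - movies["budget_adj"]

# Boolean masks are vectorized too: keep only rows with a non-zero budget,
# since a zero budget would make the revenue/budget ratio undefined.
has_budget = movies["budget_adj"] > 0
roi = movies.loc[has_budget, "revenue_adj"] / movies.loc[has_budget, "budget_adj"]

# np.where applies an element-wise if/else without an explicit loop.
movies["outcome"] = np.where(movies["profit_adj"] > 0, "profit", "loss")

print(movies)
print(roi)
```

Both versions give the same numbers; the vectorized form is shorter and, on a dataset of around 10,000 rows, typically much faster than looping in Python.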

Project Details: How did I Complete this Project?

Introduction

For the final project, I conducted my own data analysis and shared my findings in a Jupyter notebook (Investigate_a_dataset-TMDb_movie_database.ipynb) and an HTML export of it (Investigate_a_dataset-TMDb_movie_database.html). I started by looking at the dataset and brainstorming questions it could answer. Then I used pandas and NumPy to answer the questions I was most interested in, and created a report sharing the answers. I was not required to use inferential statistics or machine learning to complete this project, but I made it clear in my report that my findings are tentative. This project is open-ended.

Dataset

TMDb movie data (cleaned from original data on Kaggle)

Overview and Notes

This data set contains information about 10,000 movies collected from The Movie Database (TMDb), including user ratings and revenue.

Certain columns, like ‘cast’ and ‘genres’, contain multiple values separated by pipe (|) characters.
There are some odd characters in the ‘cast’ column. Don’t worry about cleaning them. You can leave them as is.
The final two columns ending with “adj” show the budget and revenue of the associated movie in terms of 2010 dollars, accounting for inflation over time.
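Because columns like 'genres' pack several values into one pipe-separated string, they usually need to be split before counting or grouping. The sketch below is one way to do that with pandas, not necessarily the method used in the notebook. The rows are hypothetical, and the column name `original_title` is an assumption; only the pipe-separated format comes from the notes above.

```python
import pandas as pd

# Hypothetical rows; only the pipe-separated format of 'genres' comes from the notes.
movies = pd.DataFrame(
    {
        "original_title": ["Film A", "Film B", "Film C", "Film D"],
        "genres": ["Action|Adventure", "Drama", "Action|Drama|Thriller", None],
    }
)

# Drop rows with no genre information before splitting.
with_genres = movies.dropna(subset=["genres"])

# str.split turns "Action|Adventure" into ["Action", "Adventure"];
# pandas treats a single-character pattern such as "|" as a literal string, not a regex.
genre_lists = with_genres["genres"].str.split("|")

# explode gives each list element its own row, repeating the other columns.
genre_rows = with_genres.assign(genre=genre_lists).explode("genre")

# Count how many movies are tagged with each genre.
genre_counts = genre_rows["genre"].value_counts()
print(genre_counts)
```

The same split-and-explode pattern applies to the 'cast' column.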

Example Questions

Which genres are most popular from year to year?
What kinds of properties are associated with movies that have high revenues?
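The example questions above map onto a few pandas operations. This is a minimal sketch on invented toy data, not the notebook's actual analysis. It assumes columns named `release_year`, `popularity`, `vote_average`, `budget_adj`, and `revenue_adj`. It uses mean `popularity` per genre per year for the first question and simple Pearson correlations with `revenue_adj` for the second. Correlations only show association, which fits the tentative framing of the findings.

```python
import matplotlib

# Agg renders off-screen so the sketch runs without a display;
# drop this line inside Jupyter to see plots inline.
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import pandas as pd

# Toy rows for illustration only; column names are assumptions.
movies = pd.DataFrame(
    {
        "release_year": [2010, 2010, 2010, 2011, 2011],
        "genres": ["Action|Adventure", "Drama", "Action", "Drama", "Comedy|Drama"],
        "popularity": [4.0, 1.0, 6.0, 5.0, 3.0],
        "vote_average": [6.5, 7.0, 5.5, 7.5, 6.0],
        "budget_adj": [100.0, 10.0, 60.0, 30.0, 20.0],
        "revenue_adj": [400.0, 15.0, 150.0, 90.0, 50.0],
    }
)

# Question 1: one row per (movie, genre), then mean popularity per genre within each year.
by_genre = (
    movies.dropna(subset=["genres"])
    .assign(genre=lambda df: df["genres"].str.split("|"))
    .explode("genre")
)
mean_pop = by_genre.groupby(["release_year", "genre"])["popularity"].mean()

# idxmax within each year returns the (year, genre) label with the highest mean;
# keep only the genre part of that label.
top_genre_per_year = (
    mean_pop.groupby(level="release_year").idxmax().map(lambda label: label[1])
)
print(top_genre_per_year)

# Question 2: Pearson correlation of each candidate property with adjusted revenue.
revenue_corr = movies[["budget_adj", "popularity", "vote_average"]].corrwith(
    movies["revenue_adj"]
)
print(revenue_corr.sort_values(ascending=False))

# A scatter plot makes the budget/revenue relationship visible.
fig, ax = plt.subplots()
ax.scatter(movies["budget_adj"], movies["revenue_adj"])
ax.set_xlabel("Budget (2010 dollars)")
ax.set_ylabel("Revenue (2010 dollars)")
ax.set_title("Budget vs. revenue (toy data)")
fig.savefig("budget_vs_revenue.png")  # hypothetical output filename
plt.close(fig)
```

On the full dataset, sparsely populated year/genre combinations can dominate a mean-based ranking, so counting movies per group alongside the mean is worth considering.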

Review

Use the Project Rubric to review your project. If you are happy with your submission, then you're ready to submit your project. If you see room for improvement, keep working to improve your project!

Supporting Materials

Investigate a Dataset - Template Notebook

Repository: NgweBecky/Investigate_a_dataset