Our Motivation:
- Anime is an outlet for relaxation and escape for people of all ages. However, although viewers love watching anime, studios struggle to turn a profit on many of the titles they produce because of high production costs.
- According to Eric (2015), an average 13-episode anime season costs around $2 million USD, and many anime cannot recoup this expense. To make a title sell, advertisements, events, and merchandise are essential to a studio’s profit margin, and all of these depend on the anime's popularity with viewers.
- Hence, it is important to know whether an anime that a studio is producing will be profitable, so that studios can maximise their profits and ensure their survival in the industry.
- This project aims to maximise studios’ profits on the anime they produce by estimating the 'mean' rating of an anime and predicting its 'success' probability before production, giving studios the ability to fine-tune an anime before production begins.
We used the MyAnimeList API to scrape anime data from 2000 to 2021, then cleaned and processed it for Exploratory Data Analysis and Machine Learning.
Note: Some datasets were scraped but are not included in the final project (e.g. the various ranking datasets).
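
The paginated collection step could look roughly like the sketch below. It is not the project's scraper: the page shape (`data` plus `paging.next`), the `iter_records` helper, and the `fake_fetch` stand-in are all assumptions made for illustration, and a plain loop is used in place of recursion. In real use, `fetch` would wrap an authenticated HTTP call to the MAL API.

```python
from typing import Callable, Dict, Iterator, Optional

# Assumed page shape (hypothetical, for illustration only):
# {"data": [...records...], "paging": {"next": "<url of the next page>"}}
Page = Dict[str, object]


def iter_records(fetch: Callable[[str], Page], first_url: str) -> Iterator[dict]:
    """Yield every record by following each page's `paging.next` link."""
    url: Optional[str] = first_url
    while url:
        page = fetch(url)
        yield from page.get("data", [])
        # Stop when the page has no 'next' link.
        url = (page.get("paging") or {}).get("next")


def fake_fetch(url: str) -> Page:
    """Stand-in for a real HTTP request so the sketch runs offline."""
    pages = {
        "page-1": {"data": [{"id": 1}, {"id": 2}], "paging": {"next": "page-2"}},
        "page-2": {"data": [{"id": 3}], "paging": {}},
    }
    return pages[url]


records = list(iter_records(fake_fetch, "page-1"))
print(len(records))
```

Passing the fetch function in as an argument keeps the paging logic testable without network access or API credentials.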
- Data Collection
- Data Cleaning and Preprocessing
- Exploratory Data Analysis & Visualization
- Linear Regression
- Classification
Note: Some Jupyter Notebooks were used but are not included in the final project (e.g. anomaly detection, helpers, scraper).
- Used the MAL API to recursively scrape data on thousands of anime from 2000 to 2021
- Removing useless features and handling missing values
- JSON conversion and manipulation
- Feature engineering and generation
- Creating 'genres' time series data
- Export data as csv
- One-hot Encoding
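
As a rough illustration of the JSON flattening, one-hot encoding, and 'genres' time series steps listed above, here is a minimal pandas sketch. The toy rows, the nested layout of 'genres' (a list of `{"id", "name"}` objects), and the `start_season_year` column name are assumptions, not taken from the project's notebooks.

```python
import pandas as pd

# Toy rows; the nested 'genres' layout and 'start_season_year' are assumed.
raw = pd.DataFrame(
    {
        "title": ["A", "B", "C"],
        "start_season_year": [2000, 2000, 2001],
        "genres": [
            [{"id": 1, "name": "Action"}, {"id": 2, "name": "Comedy"}],
            [{"id": 1, "name": "Action"}],
            [{"id": 2, "name": "Comedy"}],
        ],
    }
)

# Flatten each nested list into a plain list of genre names.
genre_names = raw["genres"].apply(lambda items: [g["name"] for g in items])

# One row per (anime, genre), one-hot encode, then collapse back to one row per anime.
exploded = genre_names.explode()
one_hot = pd.get_dummies(exploded, prefix="genres", dtype=int).groupby(level=0).max()

encoded = raw.drop(columns="genres").join(one_hot)

# Yearly genre counts: a simple 'genres' time series.
genre_by_year = encoded.groupby("start_season_year")[list(one_hot.columns)].sum()
print(genre_by_year)
```

Exploding before encoding handles anime with several genres, which a plain `get_dummies` call on the list column would not.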
Explored, visualized, and generated insights for the following:
- 'genres' + 'genres' time series
- 'studios'
- 'mean' rating vs 'source', 'media_type', 'nsfw', 'rating', 'genre', and 'studios'
- Relationship between 'mean', 'rank', 'popularity', 'positive_viewership_fraction', and 'negative_viewership_fraction'
- 'num_episodes' and 'average_episode_duration' overview trend
- 'start_season_season'
Models:
- Linear Regression
- Lasso Regression
- Ridge Regression (Best)
Metrics:
- Explained Variance (R^2)
- Mean Squared Error, Root Mean Squared Error
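
The comparison of the three regression models on the metrics above might be set up as in the sketch below. It runs on synthetic data standing in for the real features and 'mean' target, and the `alpha` values are illustrative rather than the tuned values from the project.

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and the 'mean' rating target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))
y = 7.0 + X @ np.array([0.5, -0.3, 0.0, 0.2, 0.0]) + rng.normal(scale=0.1, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "linear": LinearRegression(),
    "lasso": Lasso(alpha=0.01),  # illustrative alpha, not tuned
    "ridge": Ridge(alpha=1.0),   # illustrative alpha, not tuned
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    mse = mean_squared_error(y_test, pred)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),  # RMSE is the square root of MSE
    }

for name, scores in results.items():
    print(name, scores)
```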
Models:
- LinearSVC
- Decision Tree
- Random Forest (Best - 4th version)
Metrics:
- TPR, TNR, Confusion Matrix
- Precision, Recall (TPR), F-score
- Out-of-bag score
- ROC AUC score
- K-fold cross validation standard deviation
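
The classification metrics listed above can all be computed with scikit-learn. The sketch below uses a synthetic binary dataset in place of the real 'success' label, and the Random Forest hyperparameters are placeholders, not those of the "4th version" model.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (
    confusion_matrix,
    precision_recall_fscore_support,
    roc_auc_score,
)
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic binary data standing in for the 'success' classification task.
X, y = make_classification(
    n_samples=600, n_features=8, n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# oob_score=True scores each tree on the samples left out of its bootstrap.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X_train, y_train)
pred = forest.predict(X_test)

# Confusion matrix, then TPR (recall) and TNR derived from it.
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
tpr = tp / (tp + fn)
tnr = tn / (tn + fp)

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, pred, average="binary"
)
auc = roc_auc_score(y_test, forest.predict_proba(X_test)[:, 1])

# Spread of accuracy across 5 folds as a rough stability check.
cv_scores = cross_val_score(forest, X_train, y_train, cv=5)

print(f"TPR={tpr:.3f} TNR={tnr:.3f} precision={precision:.3f} F1={f1:.3f}")
print(f"OOB={forest.oob_score_:.3f} AUC={auc:.3f} CV std={np.std(cv_scores):.3f}")
```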
Studios should:
- Focus on quality over quantity of anime
- Broadcast anime regardless of season
- Not focus on producing anime that rely on fan-service to generate more positive views
- Produce anime movie franchises
Important features that determine the success of an anime:
- ‘average_episode_duration’
- ‘num_episodes’
- ‘source_manga’
- ‘media_type_movie’
- ‘rating_pg_13’
Data collection:
- Scraping data using API calls
Data cleaning and preprocessing:
- Feature Engineering & Feature generation
- JSON manipulation techniques
- Generating time-series data
EDA & Visualization:
- Visualizing plots with a large number of data points:
  - By reducing the data point size,
  - By reducing the opacity of data points, or
  - By introducing random sampling
- ‘genres’ time-series EDA
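
The three ways of handling plots with many data points mentioned above (smaller markers, lower opacity, random sampling) can be compared side by side. This is a minimal matplotlib sketch on synthetic data; the marker sizes, alpha value, and sample size are arbitrary choices for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic, heavily overlapping points.
rng = np.random.default_rng(0)
n = 50_000
x = rng.normal(size=n)
y = 0.5 * x + rng.normal(size=n)

# Random sampling: keep a subset of point indices, drawn without replacement.
idx = rng.choice(n, size=5_000, replace=False)

fig, axes = plt.subplots(1, 3, figsize=(12, 4), sharex=True, sharey=True)

axes[0].scatter(x, y, s=1)  # smaller markers
axes[0].set_title("Smaller markers")

axes[1].scatter(x, y, s=10, alpha=0.05)  # lower opacity
axes[1].set_title("Lower opacity")

axes[2].scatter(x[idx], y[idx], s=10)  # random sample of the points
axes[2].set_title("Random sample")

fig.tight_layout()
```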
Machine Learning:
- Machine Learning Models:
  - Ridge Regression, Lasso Regression, Random Forest, LinearSVC
- Classification Performance Metrics:
  - F-score (Precision & Recall), out-of-bag (OOB) score, ROC AUC score
Data Collection: Jing Qiang and Jing Hua
Data cleaning and preprocessing: Jing Qiang, Jing Hua, and YinFeng
EDA and visualization: Jing Qiang and Jing Hua
Regression: Jing Hua
Classification: Jing Qiang
Presentation Script: Jing Qiang
Presentation Voice Over + Editing: Jing Hua
Slides Deck: Jing Qiang, Jing Hua, and YinFeng
GitHub ReadMe: Jing Qiang
Completed but not included in the final product:
- Ranking dataset EDA: YinFeng
- Anomaly Detection: Jing Qiang and YinFeng
- https://myanimelist.net/apiconfig/references/api/v2
- https://www.animenewsnetwork.com/interest/2015-08-13/anime-insiders-share-how-much-producing-a-season-costs/.91536#:%7E:text=Like%20other%20entertainment%20ventures%2C%20any,yen%20(or%20%242%20million)
- https://medium.com/@cheahwen1997/data-analysis-and-visualization-on-anime-using-pandas-and-matplotlib-1150d6605f5a
- https://towardsdatascience.com/linear-regression-models-4a3d14b8d368
- https://medium.datadriveninvestor.com/choosing-the-best-algorithm-for-your-classification-model-7c632c78f38f
- https://builtin.com/data-science/random-forest-algorithm
- https://www.kaggle.com/code/niklasdonges/end-to-end-project-with-python/notebook
- https://quantdare.com/decision-trees-gini-vs-entropy/