This repository contains Python scripts and Jupyter notebooks for analyzing YouTube data related to Dragonboat paddling channels. It includes functions to interact with the YouTube Data API to retrieve channel information, video data, activities, and analytics. The analysis aims to gain insights into channel performance, viewer engagement, and content trends.
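For orientation, here is a minimal sketch of the kind of query the scripts build on, using the `google-api-python-client` package. The API key and channel ID are placeholders, and this is not the repository's exact code:

```python
from googleapiclient.discovery import build

# Placeholders: substitute your own API key and a real channel ID.
API_KEY = "YOUR_API_KEY"
CHANNEL_ID = "UCxxxxxxxxxxxxxxxxxxxxxx"

youtube = build("youtube", "v3", developerKey=API_KEY)

# Fetch basic channel information and statistics
# (subscriber, view and video counts).
response = youtube.channels().list(
    part="snippet,statistics", id=CHANNEL_ID
).execute()

for item in response.get("items", []):
    stats = item["statistics"]
    print(item["snippet"]["title"],
          stats["subscriberCount"], stats["viewCount"])
```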
You can read my analysis in my blog post.
The code is strongly inspired by Corey Schafer's great YouTube tutorials on accessing the YouTube API. Check them out here:
- Getting Started - Creating an API Key and Querying the API
- Calculating the Duration of a Playlist
- Sort a Playlist by Most Popular Videos
- Using OAuth to Access User Accounts
To see what else is possible, have a look at the respective API references for the YouTube Data API and the YouTube Analytics API.
And please check out and subscribe to my channels on YouTube, the old, non-performing one and the hopefully more successful topic dragonboat channel!
- Create a Google Cloud project, enable the YouTube Data and Analytics APIs for it, and create OAuth credentials (see the sketch after this list). If you run into issues while setting up the consent screen for the first time, try different app names: some names such as 'YouTube' or 'Google' seem to be disallowed, even though the error message I got was that the 'Request was abusive.'
- Python 3.6 or higher
- Jupyter Notebook
- Required Python libraries (specified in `environment.yml`)
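As a rough sketch of the OAuth step referenced above, this is how `client_secrets.json` is typically loaded to authenticate against both APIs. The scope list is an assumption; adjust it to the data you want to access:

```python
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

# Assumed scopes: read-only access to the Data and Analytics APIs.
SCOPES = [
    "https://www.googleapis.com/auth/youtube.readonly",
    "https://www.googleapis.com/auth/yt-analytics.readonly",
]

# client_secrets.json is the OAuth credentials file downloaded from the
# Google Cloud console (see the usage steps below).
flow = InstalledAppFlow.from_client_secrets_file("client_secrets.json", SCOPES)
credentials = flow.run_local_server(port=0)  # opens a browser for consent

youtube = build("youtube", "v3", credentials=credentials)
yt_analytics = build("youtubeAnalytics", "v2", credentials=credentials)
```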
- `api_functions.py`: Python script containing functions to interact with the YouTube Data API.
- `youtube_analysis.ipynb`: Jupyter notebook for analyzing YouTube data regarding Dragonboat paddling channels.
- Clone the repository to your local machine:
```bash
git clone <repository_url>
cd youtube-data-analysis
```
- Create a virtual environment in the current directory with the required dependencies using Conda:
```bash
conda env create --prefix ./venv -f environment.yml
```
Once the environment is created, activate it using its path (environments created with `--prefix` are activated by path rather than by name):

```bash
conda activate ./venv
```
- Install the remaining required dependencies using pip:
```bash
pip install -r requirements.txt
```
- In your Google Cloud project, download your credentials and store them as a file named `client_secrets.json` in the root folder of this repository.
- Run the `youtube_analysis.ipynb` notebook to perform data analysis and visualization.
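As an illustration of the kind of request the notebook makes (not its exact code), here is a hedged example of a YouTube Analytics API query for daily viewing metrics, assuming an authenticated `yt_analytics` client as built in the OAuth sketch above; the date range is a placeholder:

```python
# Assumes `yt_analytics` was built with OAuth credentials as sketched above.
report = yt_analytics.reports().query(
    ids="channel==MINE",    # the authenticated user's own channel
    startDate="2023-01-01",  # placeholder date range
    endDate="2023-12-31",
    metrics="views,estimatedMinutesWatched,averageViewDuration",
    dimensions="day",
    sort="day",
).execute()

for row in report.get("rows", []):
    print(row)
```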
The analysis showed that detailed data on annotations and other aspects, such as demographics, is only available in the daily reports that can be scheduled using the YouTube Reporting API. Since one does not want to download reports manually every day, I created an automated cloud-based solution on AWS, largely leveraging services that are part of the Free Tier: AWS Lambda, DynamoDB, Amazon EventBridge, and the Systems Manager Parameter Store for handling the Google API OAuth credentials. For my approach, please see this tutorial page. I have also created a video tutorial on how to set this up. See below.
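For reference, a minimal sketch of scheduling such a daily report with the YouTube Reporting API via the Python client. The report type shown is just one example (list the available `reportTypes()` for your channel), and `credentials` is assumed to be the OAuth credentials object built earlier:

```python
from googleapiclient.discovery import build

# `credentials` is assumed: an OAuth credentials object obtained as in the
# sketch above (the Reporting API requires OAuth, not just an API key).
reporting = build("youtubereporting", "v1", credentials=credentials)

# List the report types available for this channel.
for rt in reporting.reportTypes().list().execute().get("reportTypes", []):
    print(rt["id"], "-", rt["name"])

# Create a job; YouTube then generates a downloadable report every day.
job = reporting.jobs().create(
    body={"reportTypeId": "channel_basic_a2", "name": "daily-channel-stats"}
).execute()
print("Created job:", job["id"])
```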