olincollege/youtube-transcript-search

YouTube Transcript Search

Search the transcripts of any channel on YouTube for keywords or phrases in your terminal with python.

Built With

Getting Started

Clone the Repo

Run the following to clone this repo:

git clone https://github.com/olincollege/youtube-transcript-search

Install Dependencies

Install the packages used in this project by running the following:

pip install -r requirements.txt

API Key

  1. Create a free Google Cloud project at [https://console.cloud.google.com/projectcreate]
  2. Enable the YouTube Data API V3 for your project at [https://console.cloud.google.com/apis/library/youtube.googleapis.com]
  3. Create an API key for your project at [https://console.cloud.google.com/apis/credentials]
  4. Copy the API key to your clipboard
  5. In the root directory of the repo, create a file named .env containing the line YOUTUBE_API_KEY=<my-api-key>, replacing <my-api-key> with the key that you copied.
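
How the program itself loads the key is not shown in this README. As a minimal sketch using only the standard library, the hypothetical helpers below (read_env_value and get_api_key are illustrative names, not part of this repo) show one way a KEY=VALUE line in .env could be read, which can be handy for checking that the file is set up correctly:

```python
import os
from pathlib import Path


def read_env_value(env_path, name):
    """Return the value stored for `name` in a simple KEY=VALUE file, or None."""
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes.
            return value.strip().strip("'\"")
    return None


def get_api_key(env_path=".env"):
    """Prefer an exported environment variable, then fall back to the file."""
    return os.environ.get("YOUTUBE_API_KEY") or read_env_value(
        env_path, "YOUTUBE_API_KEY"
    )
```

If get_api_key() returns None when called from the repo root, the .env file is probably missing or the variable name is misspelled.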

Usage

Navigate to the repository directory in a terminal and run python run_transcript_search.py or python3 run_transcript_search.py.

The program requires transcript data to be downloaded locally before it can search. Follow the prompts to either search existing channels or download a new one (Note: make sure channel names are spelled exactly as they appear on YouTube). Channels with a lot of data may take some time to download for the initial search.

To search, enter comma-separated keywords or phrases. The program searches the transcript data for these strings, trying several capitalization variants of each.
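
As an illustration of the input format, here is a minimal sketch of how a comma-separated query could be split and matched. The function names are hypothetical, and lowercasing both sides is used here as a simple stand-in for trying capitalization variants; the program's own matching logic may differ.

```python
def parse_keywords(raw):
    """Split a comma-separated query into trimmed, non-empty keywords."""
    return [part.strip() for part in raw.split(",") if part.strip()]


def count_occurrences(text, keyword):
    """Count non-overlapping matches of `keyword` in `text`, ignoring case."""
    return text.lower().count(keyword.lower())


keywords = parse_keywords("python, Machine Learning ,")
# keywords is now ["python", "Machine Learning"]
```

Phrases keep their internal spaces, so "Machine Learning" is searched as one string rather than as two separate words.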

Results are scored by the total number of occurrences of all keywords within a video. Only the top 5 videos are displayed by default; this can be changed in the draw_results method of the ViewTerminal class.
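
The scoring rule can be sketched as follows. This is not the code from draw_results or ViewTerminal; score_videos and top_results are hypothetical names, and the transcripts are assumed to be available as a dict mapping a video identifier to its transcript text.

```python
def score_videos(transcripts, keywords):
    """Map each video to the total case-insensitive keyword occurrence count."""
    scores = {}
    for video_id, text in transcripts.items():
        lowered = text.lower()
        scores[video_id] = sum(lowered.count(k.lower()) for k in keywords)
    return scores


def top_results(scores, limit=5):
    """Return up to `limit` (video, score) pairs with a nonzero score, best first."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(video, score) for video, score in ranked if score > 0][:limit]
```

Keeping the cutoff as a limit parameter, rather than a hard-coded 5, is one way to make the number of displayed results easy to adjust.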

The transcript_data Directory

When a new channel's transcript data is downloaded, a new directory with the channel name is created under transcript_data. To remove a channel's data from local storage, delete that channel's directory. Do not delete transcript_data itself.
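
Given that layout, the channels already downloaded can be listed by looking at the subdirectories of transcript_data. The helper below is a hypothetical sketch, not a function from this repo, and it assumes it is run from the repository root:

```python
from pathlib import Path


def list_downloaded_channels(data_dir="transcript_data"):
    """Return the sorted names of channel directories found under `data_dir`."""
    root = Path(data_dir)
    if not root.is_dir():
        return []
    # Each downloaded channel is stored in its own subdirectory.
    return sorted(entry.name for entry in root.iterdir() if entry.is_dir())
```

An empty list means either that no channels have been downloaded yet or that transcript_data was not found in the current directory.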

Testing

The main branch is not configured to run tests with pytest. The testing branch contains the same core code, plus a few sets of pre-downloaded channel data and the pytest files themselves. To run the tests, run the following from the root of the testing branch:

pytest *.py

For the primary release version of the program, please use main.

License

GNU GPL v3