YouTube Comments Scraping and Analysis

This project scrapes and analyzes YouTube comments on machine learning videos in Bahasa Indonesia. The comments were collected through the YouTube API, then preprocessed and converted into several text representations to identify the most common terms and themes.

Project Overview

Data Source: YouTube comments on machine learning videos in Bahasa Indonesia.
Text Representation Techniques: TF-IDF, One-Hot Encoding, and CountVectorizer.
Analysis Methods: Word frequency analysis and word cloud visualization.

Project Workflow

Data Collection: Comments were scraped from YouTube using the YouTube API.
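
As a rough sketch of the collection step, the snippet below pages through a video's top-level comments. It assumes the google-api-python-client package and the commentThreads endpoint of the YouTube Data API v3; the notebook's actual code may differ. The function names, the API key, and the video ID are placeholders, not taken from the project.

```python
from typing import Any, Dict, List, Optional


def extract_comments(response: Dict[str, Any]) -> List[str]:
    """Pull the top-level comment text out of one commentThreads page."""
    texts = []
    for item in response.get("items", []):
        snippet = item["snippet"]["topLevelComment"]["snippet"]
        texts.append(snippet["textDisplay"])
    return texts


def fetch_all_comments(youtube: Any, video_id: str, page_size: int = 100) -> List[str]:
    """Follow nextPageToken until every page of top-level comments is read."""
    comments: List[str] = []
    page_token: Optional[str] = None
    while True:
        params = {
            "part": "snippet",
            "videoId": video_id,
            "maxResults": page_size,
            "textFormat": "plainText",
        }
        if page_token:
            params["pageToken"] = page_token
        response = youtube.commentThreads().list(**params).execute()
        comments.extend(extract_comments(response))
        page_token = response.get("nextPageToken")
        if not page_token:
            return comments


# Hypothetical usage (requires network access and a valid API key):
# from googleapiclient.discovery import build
# youtube = build("youtube", "v3", developerKey="YOUR_API_KEY")
# comments = fetch_all_comments(youtube, "VIDEO_ID")
```

Passing the client in as an argument keeps the paging logic separate from authentication, so it can be exercised with a stub object.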
Data Preprocessing:
- Cleansing: Removed irrelevant text, emojis, and special characters.
- Tokenization: Split comments into individual words.
- Stopword Removal: Removed common stopwords to focus on relevant words.
- Lemmatization: Reduced words to their base forms.
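
The four preprocessing steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's code: the stopword set and the lemma lookup table are tiny hypothetical stand-ins for a full Indonesian stopword list and a real lemmatizer or stemmer.

```python
import re
from typing import Dict, List

# Illustrative only: a real run would load a full Indonesian stopword list.
STOPWORDS = {"yang", "dan", "di", "ini", "itu", "saya", "untuk"}

# Toy lookup table standing in for a real lemmatizer or stemmer.
LEMMAS: Dict[str, str] = {"videonya": "video", "belajarnya": "belajar"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def cleanse(text: str) -> str:
    """Lowercase, then drop URLs, emojis, digits, and punctuation."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def preprocess(text: str) -> List[str]:
    """Cleanse, tokenize, remove stopwords, then map words to base forms."""
    tokens = cleanse(text).split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    return [LEMMAS.get(t, t) for t in tokens]


print(preprocess("Videonya bagus banget!! 😍 https://youtu.be/xyz untuk belajar ML"))
```

Stripping everything outside a-z is a blunt rule that also removes digits and accented letters; a production pipeline would likely be more selective.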
Text Representation:
- TF-IDF: Term Frequency-Inverse Document Frequency, which weights each word by how often it appears in a comment and how rare it is across all comments.
- One-Hot Encoding: Represents each comment as a binary vector with one column per unique word, marking whether that word is present.
- CountVectorizer: Counts how many times each word occurs in each comment.
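
To make the difference between the three representations concrete, here is a dependency-free sketch of what each one computes on tokenized comments. It is an assumption-laden illustration: the function names are invented for this example, and the TF-IDF variant is the plain textbook form (relative term frequency times the log of inverse document frequency). scikit-learn's CountVectorizer and TfidfVectorizer would be the usual tools in practice, and TfidfVectorizer applies a smoothed IDF and L2 normalization by default, so its numbers will differ from these.

```python
import math
from collections import Counter
from typing import List


def build_vocabulary(docs: List[List[str]]) -> List[str]:
    """Sorted list of every unique token; each entry becomes one column."""
    return sorted({tok for doc in docs for tok in doc})


def count_vectors(docs: List[List[str]], vocab: List[str]) -> List[List[int]]:
    """Bag-of-words counts: how often each vocabulary term occurs per document."""
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[term] for term in vocab])
    return rows


def one_hot_vectors(docs: List[List[str]], vocab: List[str]) -> List[List[int]]:
    """Binary presence/absence of each vocabulary term per document."""
    return [[1 if c > 0 else 0 for c in row] for row in count_vectors(docs, vocab)]


def tfidf_vectors(docs: List[List[str]], vocab: List[str]) -> List[List[float]]:
    """Textbook TF-IDF: (count / doc length) * log(n_docs / document frequency)."""
    counts = count_vectors(docs, vocab)
    n_docs = len(docs)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    rows = []
    for doc, row in zip(docs, counts):
        length = len(doc) or 1  # avoid dividing by zero for empty documents
        rows.append([(c / length) * w for c, w in zip(row, idf)])
    return rows


docs = [["data", "training", "data"], ["model", "training"]]
vocab = build_vocabulary(docs)
print(vocab)
print(count_vectors(docs, vocab))
print(one_hot_vectors(docs, vocab))
print(tfidf_vectors(docs, vocab))
```

In this toy corpus "training" appears in every document, so its TF-IDF weight is zero, while its count and one-hot entries are not. That is the practical distinction: TF-IDF discounts words that are common everywhere.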
Analysis and Visualization:
- Word Frequency Analysis: Identified the most common terms in the comments.
- Word Cloud: Visualized frequent words to reveal popular themes.
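
The word frequency step can be sketched with collections.Counter, as below. The helper name top_terms and the sample tokens are made up for illustration. The commented lines show one way the same counts could feed a word cloud, assuming the third-party wordcloud and matplotlib packages; the notebook may build its visualization differently.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_terms(token_lists: Iterable[List[str]], n: int = 10) -> List[Tuple[str, int]]:
    """Count tokens across all comments and return the n most frequent."""
    counts: Counter = Counter()
    for tokens in token_lists:
        counts.update(tokens)
    return counts.most_common(n)


sample = [["data", "model", "data"], ["training", "data", "model"]]
print(top_terms(sample, n=2))

# Hypothetical word cloud from the same frequencies:
# from wordcloud import WordCloud
# import matplotlib.pyplot as plt
# cloud = WordCloud(background_color="white").generate_from_frequencies(dict(top_terms(sample, n=100)))
# plt.imshow(cloud, interpolation="bilinear")
# plt.axis("off")
# plt.show()
```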

Conclusion

This project demonstrates the process of collecting, processing, and analyzing text data from YouTube comments, highlighting various text representation techniques. Through word frequency analysis and word cloud visualization, we uncovered common themes and key terms discussed in the comments, such as "machine learning," "data," and "training." These insights provide an overview of popular topics in Indonesian-language machine learning discussions on YouTube.

License

This project is licensed under the MIT License - see the LICENSE file for details.


steveee27/YouTube-Comments-Scraping-Analysis
