This project implements keyword clustering to analyze and group high-volume search keywords using TF-IDF vectorization and Agglomerative Clustering. The goal is to improve SEO strategies by identifying related keyword clusters, enhancing content relevance, and optimizing search rankings.
- Data Collection & Processing: Keywords with their respective search volumes are collected and preprocessed into a structured format.
- TF-IDF Vectorization: Transforms textual keywords into numerical features, capturing the importance of each keyword relative to the dataset.
- Agglomerative Clustering: Groups similar keywords into clusters based on their TF-IDF representations.
- Dendrogram Visualization: Hierarchical clustering results are visualized using a dendrogram, making it easier to understand the relationships between clusters.
-
Clone the repository:
git clone https://github.com/sayan112207/Keyword_Clustering.git cd Keyword_Clustering
-
Install dependencies:
pip install -r requirements.txt
-
Download or prepare your dataset of keywords and their search volumes in the format shown below:
data = { "Keyword": ["keyword1", "keyword2", "keyword3", ...], "Volume": [1000, 2000, 1500, ...] }
-
Run the clustering script:
python keyword_clustering.py
-
The script will output the clusters and display a dendrogram visualization to show the hierarchical relationships between keywords.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage
# Sample data
data = {
"Keyword": ["star schema", "snowflake schema", "relational schema", "schema markup", "database schema"],
"Volume": [1200, 1500, 1000, 1300, 900]
}
# Data preprocessing
df = pd.DataFrame(data)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['Keyword'])
# Clustering
clustering = AgglomerativeClustering(n_clusters=3, affinity='euclidean', linkage='ward')
df['Cluster'] = clustering.fit_predict(X.toarray())
# Visualize dendrogram
linked = linkage(X.toarray(), 'ward')
dendrogram(linked)
plt.show()
- The script will output the clusters assigned to each keyword.
- A dendrogram visualization will help to visualize the keyword groupings.
Feel free to fork the repository, make changes, and create pull requests. Contributions are welcome!
This project is licensed under the MIT License.