Skip to content

sayan112207/Keyword-Clustering

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Keyword Clustering for SEO Optimization

Overview

This project implements keyword clustering to analyze and group high-volume search keywords using TF-IDF vectorization and Agglomerative Clustering. The goal is to improve SEO strategies by identifying related keyword clusters, enhancing content relevance, and optimizing search rankings.

Key Features

  • Data Collection & Processing: Keywords with their respective search volumes are collected and preprocessed into a structured format.
  • TF-IDF Vectorization: Transforms textual keywords into numerical features, capturing the importance of each keyword relative to the dataset.
  • Agglomerative Clustering: Groups similar keywords into clusters based on their TF-IDF representations.
  • Dendrogram Visualization: Hierarchical clustering results are visualized using a dendrogram, making it easier to understand the relationships between clusters.

Installation

  1. Clone the repository:

    git clone https://github.com/sayan112207/Keyword_Clustering.git
    cd Keyword_Clustering
  2. Install dependencies:

    pip install -r requirements.txt

Usage

  1. Download or prepare your dataset of keywords and their search volumes in the format shown below:

    data = {
        "Keyword": ["keyword1", "keyword2", "keyword3", ...],
        "Volume": [1000, 2000, 1500, ...]
    }
  2. Run the clustering script:

    python keyword_clustering.py
  3. The script will output the clusters and display a dendrogram visualization to show the hierarchical relationships between keywords.

Example

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Sample data
data = {
    "Keyword": ["star schema", "snowflake schema", "relational schema", "schema markup", "database schema"],
    "Volume": [1200, 1500, 1000, 1300, 900]
}

# Data preprocessing
df = pd.DataFrame(data)
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(df['Keyword'])

# Clustering
clustering = AgglomerativeClustering(n_clusters=3, affinity='euclidean', linkage='ward')
df['Cluster'] = clustering.fit_predict(X.toarray())

# Visualize dendrogram
linked = linkage(X.toarray(), 'ward')
dendrogram(linked)
plt.show()

Results

  • The script will output the clusters assigned to each keyword.
  • A dendrogram visualization will help to visualize the keyword groupings.

Contributing

Feel free to fork the repository, make changes, and create pull requests. Contributions are welcome!

License

This project is licensed under the MIT License.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published