This is the project repo for my Data Clustering class. Phase 1 is an implementation of the K-Means algorithm. Phase 2 is normalization and initialization: attributes are normalized using min-max normalization, and the algorithm starts from random initial clusters (a random partition of the points) instead of random initial centers. Phase 3 is internal validation, using the Calinski-Harabasz index and the Silhouette Coefficient to find the optimal number of clusters. Phase 4 is external validation, using the Rand Statistic, the Jaccard Coefficient, and the Fowlkes-Mallows index to find the best partition.
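To make phases 1 and 2 concrete, here is a minimal sketch of min-max normalization followed by K-Means started from a random partition. It is not the code in this repo: the function names (`min_max_normalize`, `k_means`), the `seed` and `max_iter` parameters, and the handling of empty clusters are all assumptions chosen for illustration.

```python
import random


def min_max_normalize(points):
    """Rescale every attribute to [0, 1]; constant attributes map to 0.0."""
    columns = list(zip(*points))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(point, lows, highs)]
        for point in points
    ]


def centroid(members):
    """Mean of a non-empty list of points."""
    return [sum(col) / len(members) for col in zip(*members)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    """K-Means started from a random partition (assumes len(points) >= k)."""
    rng = random.Random(seed)
    # Random partition: shuffle a balanced label list so that every
    # cluster starts with at least one point.
    labels = [i % k for i in range(len(points))]
    rng.shuffle(labels)
    centers = []
    for _ in range(max_iter):
        centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Assumption: an empty cluster is reseeded with a random point.
            centers.append(centroid(members) if members
                           else list(rng.choice(points)))
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
    return labels, centers


if __name__ == "__main__":
    raw = [[1, 100], [2, 110], [1.5, 105], [10, 900], [11, 950], [10.5, 920]]
    labels, centers = k_means(min_max_normalize(raw), k=2)
    print(labels)
```

Normalizing first keeps the second attribute, which has a much larger range in this toy data, from dominating the distance computation.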
JoshSample/Data-Clustering-Project
About
Topics include: K-means algorithm, initialization methods, normalization methods, internal validation and external validation.
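As an illustration of the external validation topic, the three indices named in the project description (Rand Statistic, Jaccard Coefficient, Fowlkes-Mallows index) can all be computed from pair counts between a ground-truth labeling and a clustering. The sketch below uses the standard pair-counting definitions; the function names and the choice to return 0.0 when a denominator is zero are assumptions, not taken from the repo.

```python
from itertools import combinations
from math import sqrt


def pair_counts(truth, pred):
    """Count point pairs by agreement between two labelings.

    a: same class and same cluster
    b: same class, different clusters
    c: different classes, same cluster
    d: different classes and different clusters
    """
    a = b = c = d = 0
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        if same_truth and same_pred:
            a += 1
        elif same_truth:
            b += 1
        elif same_pred:
            c += 1
        else:
            d += 1
    return a, b, c, d


def external_indices(truth, pred):
    """Return (Rand, Jaccard, Fowlkes-Mallows) for two labelings."""
    a, b, c, d = pair_counts(truth, pred)
    total = a + b + c + d
    # Assumption: an undefined index (zero denominator) is reported as 0.0.
    rand = (a + d) / total if total else 0.0
    jaccard = a / (a + b + c) if (a + b + c) else 0.0
    fm = a / sqrt((a + b) * (a + c)) if (a + b) and (a + c) else 0.0
    return rand, jaccard, fm


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    pred = [0, 0, 1, 1, 1, 1]
    print(external_indices(truth, pred))
```

All three indices reach 1.0 when the clustering matches the ground truth exactly, so the partition with the highest values is the preferred one.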