Description
The current face clustering implementation performs a full global re-clustering of all face embeddings every 24 hours (or when the number of unassigned faces exceeds a threshold).
This process loads all face embeddings into memory, runs DBSCAN on the entire dataset, deletes all existing clusters, and reinserts new ones.
This design does not scale and will inevitably crash or freeze the application as the user’s photo library grows.
Location
- File: backend/app/utils/face_clusters.py
- Function: cluster_util_face_clusters_sync
- Trigger:
  - Automatic 24-hour re-clustering
  - Manual API call: /global-recluster
Current Logic
if time_since_last_reclustering > 86400 or unassigned_faces > 100:
    results = cluster_util_cluster_all_face_embeddings()  # Loads ALL embeddings into RAM
    db_delete_all_clusters(cursor)  # Deletes all clusters
    db_insert_clusters_batch(results)
Impact
1. Out of Memory (OOM) Risk
Loading all embeddings (e.g., 50,000+ faces) into memory causes large RAM spikes.
DBSCAN can require up to O(N²) distance computations, so the cost grows quadratically with library size and crashes become increasingly likely at scale.
2. Service Blocking / Self-DoS
The clustering runs synchronously, blocking background workers.
API requests will time out while the backend continues heavy computation.
3. Data & UX Instability
Daily full re-clustering causes clusters to shift, split, or merge unexpectedly.
User-assigned names or manual merges can be lost, damaging user trust.
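To put rough numbers on the memory risk described in item 1, here is a back-of-the-envelope estimate. It is only a sketch: the 512-dimensional float32 embedding size is an assumption (the actual model's dimension is not stated in this issue), and the dense N x N distance matrix is a worst case that a given DBSCAN implementation may or may not materialize. The function name is hypothetical.

```python
def estimate_clustering_memory(num_faces: int,
                               embedding_dim: int = 512,   # assumed, not from the codebase
                               bytes_per_value: int = 4):  # float32, assumed
    """Rough memory estimate, in bytes, for one full global re-clustering."""
    # All embeddings held in RAM at once: N * D values.
    embeddings_bytes = num_faces * embedding_dim * bytes_per_value
    # Worst case: a dense N x N matrix of float64 pairwise distances.
    pairwise_bytes = num_faces * num_faces * 8
    return {"embeddings_bytes": embeddings_bytes, "pairwise_bytes": pairwise_bytes}


est = estimate_clustering_memory(50_000)
print(f"embeddings: {est['embeddings_bytes'] / 1e6:.0f} MB")
print(f"dense pairwise distances: {est['pairwise_bytes'] / 1e9:.0f} GB")
```

Under these assumptions the embeddings alone are manageable (about 100 MB for 50,000 faces), while any step that scales with N² is what quickly exceeds the memory of a typical desktop machine.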
Proposed Improvements
- Incremental Clustering: Assign new faces to existing clusters instead of reprocessing the entire dataset.
- Background Execution: Move /global-recluster to an async/background task to avoid blocking the API.
- Batch / Chunk Processing: Process embeddings in chunks to avoid RAM spikes.
- Cluster Stability Guarantees: Preserve existing clusters and user labels wherever possible.
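One possible shape for the incremental and chunked approach is sketched below. It is not the project's implementation: `assign_incrementally`, `chunked`, the centroid dictionary, and the `max_distance` threshold are all hypothetical names and values chosen for illustration. The sketch uses only the standard library; a real version would more likely use NumPy for the distance computations and read batches from the database.

```python
import math
from itertools import islice
from typing import Dict, Iterable, Iterator, List, Sequence, Tuple

Embedding = Sequence[float]
Face = Tuple[str, Embedding]  # (face_id, embedding)


def chunked(items: Iterable[Face], size: int) -> Iterator[List[Face]]:
    """Yield lists of at most `size` faces so only one chunk is held at a time."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def euclidean(a: Embedding, b: Embedding) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_incrementally(batches: Iterable[List[Face]],
                         centroids: Dict[str, Embedding],
                         max_distance: float = 0.6):  # hypothetical threshold
    """Assign each new face to the nearest existing cluster centroid.

    Existing clusters are never deleted or renamed, so user labels survive.
    Faces farther than `max_distance` from every centroid are returned as
    unassigned and could be clustered among themselves in a later step.
    """
    assigned: Dict[str, str] = {}
    unassigned: List[str] = []
    for batch in batches:
        for face_id, embedding in batch:
            best_id, best_dist = None, math.inf
            for cluster_id, centroid in centroids.items():
                dist = euclidean(embedding, centroid)
                if dist < best_dist:
                    best_id, best_dist = cluster_id, dist
            if best_id is not None and best_dist <= max_distance:
                assigned[face_id] = best_id
            else:
                unassigned.append(face_id)
    return assigned, unassigned


# Toy 2-D example; real face embeddings have many more dimensions.
centroids = {"alice": [0.0, 0.0], "bob": [1.0, 1.0]}
faces = [("f1", [0.1, 0.0]), ("f2", [0.9, 1.0]), ("f3", [5.0, 5.0])]
assigned, unassigned = assign_incrementally(chunked(faces, 2), centroids, max_distance=0.5)
print(assigned)    # {'f1': 'alice', 'f2': 'bob'}
print(unassigned)  # ['f3']
```

The cost per run is proportional to (new faces) x (existing clusters) rather than N², and only the leftover unassigned faces would need a heavier clustering pass.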
Severity
Severity: 1 (Critical)
This is an architectural scalability issue that can:
- Crash the application (OOM)
- Block services for extended periods
- Corrupt user organization over time