Replies: 2 comments 1 reply
-
Just commenting on the HDBSCAN part (if I understood your concerns correctly): at present, clustering is done on a per-user basis, so in that sense #622 doesn't change anything compared to the existing DBSCAN implementation. And yes, IMO using the same clustering model for multiple users would be a privacy concern (even if it would help in getting more robust clusters).
With my comment on #622 (i.e. "feeding in already clustered detections") I was referring to using the same user's photos. This would be necessary if HDBSCAN is run incrementally on only the unclustered faces of a user because, unlike DBSCAN, HDBSCAN is not given any "ground truth" on what size or shape a face cluster should be. Instead, HDBSCAN learns the cluster limits/sizes from the data. For that reason, it would make sense to mix known-good data with potentially very noisy data (i.e. unclustered faces).
Ideally, HDBSCAN would always be run on the full dataset, but the version of HDBSCAN in #622 doesn't yet scale very well (O(n^2)), hence the desire to use it in some sort of incremental fashion. However, I'm already pretty far along in rewriting HDBSCAN so that it stores some of the computationally expensive interim results in the DB, which should make full HDBSCAN runs reasonably fast (~minutes) well into the tens or even hundreds of thousands of faces.
But we'll see; I'm just (a loose cannon) working on the core HDBSCAN implementation. Marcel is in the best position to answer the other concerns and to say how he plans to best utilize HDBSCAN.
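To put the O(n^2) remark in numbers, here is a rough sketch. It assumes the quadratic cost comes from comparing every pair of face embeddings, which is a guess about where the time goes and not something confirmed for #622; `pairwise_distance_count` is a made-up helper name.

```python
def pairwise_distance_count(n_faces: int) -> int:
    """Number of unordered face pairs, n * (n - 1) / 2.

    Assumption: a full run has to look at every pair of faces once.
    """
    return n_faces * (n_faces - 1) // 2


# How the pair count grows with the size of a user's face collection.
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} faces -> {pairwise_distance_count(n):,} pairs")
# 1,000 faces   ->       499,500 pairs
# 10,000 faces  ->    49,995,000 pairs
# 100,000 faces -> 4,999,950,000 pairs
```

Ten times more faces means roughly a hundred times more pairs, which is why either an incremental run or caching the expensive interim results becomes attractive.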
-
I'm curious: is this with the Photos app or with Memories? I can't think of a reason this would be a problem with the Photos app.
-
This is a two-part question/discussion:
My wife and I share a Nextcloud instance with lots of our photos -- Recognize is great for organizing and tagging them. My photos are stored under my Nextcloud account, and hers are stored under her Nextcloud account. Many of the photos are shared with each other via the standard Nextcloud sharing tools. However, I now have the situation where I have duplicate people/face clusters which cannot be combined because one cluster is built with files I own while the other is built from files from my wife's account. I don't have permission to merge her cluster into mine, and if she tries to merge her cluster into mine, it just returns an error.
What exactly is the permissions model? Do users "own" face clusters? What happens if the underlying files are shared? Is there a way to "share" the face cluster itself? Is there a way to mark that two clusters are actually the same person, even though the clusters are "owned" by different accounts?
Furthermore, is there a reason, such as privacy, to keep the clusters per user? Would any information leak if the clusters were instance-wide but filtered to show only the faces the user has permission to view?
I'm worried about the interaction with #622 if every face gets duplicated -- do we lose some robustness? Conversely, do new privacy implications arise if we start training a model on the photos of all users of a server?
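To make the "instance-wide but filtered" idea concrete, here is a minimal sketch of what I have in mind. Nothing in it reflects how Recognize actually stores clusters or checks permissions; `Face`, `visible_clusters`, and the `shared_with` mapping are all hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Face:
    file_id: int      # file the face was detected in
    owner: str        # account that owns the file
    cluster_id: int   # hypothetical instance-wide cluster ID


def visible_clusters(faces, viewer, shared_with):
    """Group faces by cluster, keeping only faces on files the viewer may see.

    shared_with maps file_id -> set of accounts the file is shared with
    (a stand-in for the real Nextcloud sharing lookup).
    """
    result = {}
    for face in faces:
        can_view = face.owner == viewer or viewer in shared_with.get(face.file_id, set())
        if can_view:
            result.setdefault(face.cluster_id, []).append(face)
    return result


faces = [
    Face(file_id=1, owner="me", cluster_id=7),
    Face(file_id=2, owner="wife", cluster_id=7),   # shared with me
    Face(file_id=3, owner="wife", cluster_id=7),   # not shared
    Face(file_id=4, owner="wife", cluster_id=9),   # not shared
]
shared_with = {2: {"me"}}

mine = visible_clusters(faces, "me", shared_with)
print({cid: [f.file_id for f in fs] for cid, fs in mine.items()})
# {7: [1, 2]}
```

This only filters what is displayed. It says nothing about whether cluster membership itself, which would be computed from every user's faces, reveals anything, and that is the part I'm unsure about.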