Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

I have orphan persons (without faces), and always recreates the group. #137

Closed
matiasdelellis opened this issue Apr 5, 2019 · 14 comments
Closed
Assignees
Labels

Comments

@matiasdelellis
Copy link
Owner

matiasdelellis commented Apr 5, 2019

Graphic description

imagen

Just that.. 😞

Query to test:

SELECT * FROM `oc_face_recognition_persons` P
WHERE NOT EXISTS(SELECT * FROM `oc_face_recognition_faces` F
                 WHERE F.person = P.id)

Response here:

id user name is_valid last_generation_time linked_user
4736 matias Vicky 0 2019-03-29 16:59:53 NULL
4737 matias Sofy 0 2019-03-29 16:59:53 NULL
5114 matias Matias De lellis 0 2019-04-01 16:48:39 NULL

Surely it is due to so many tests of the change of the sensitivity parameter, but I still can not reproduce to document it correctly

@matiasdelellis
Copy link
Owner Author

matiasdelellis commented Apr 5, 2019

I still do not see how they were orphans, but now I understand why they are never erased.

The current clusters are formed from the faces. (Without taking into account the persons table):

https://github.com/matiasdelellis/facerecognition/blob/master/lib/BackgroundJob/Tasks/CreateClustersTask.php#L203

As you identify which one should be removed from this list, you will never be able to eliminate the orphans:
https://github.com/matiasdelellis/facerecognition/blob/master/lib/Db/PersonMapper.php#L166

@stalker314314
Any ideas?

p.s:. It's the opposite of orphan.. but I do not know how to say it.. 😅

@stalker314314
Copy link
Collaborator

Hard problem:) To be honest, I have no idea how this can happen. Couple of things:

  • What you suggested (with getCurrentClusters), I think, is not true, as there should be no orphan persons to begin with! So, if there was orphan person, they will reappear. If there was no orphans, getCurrentClusters should return no orphans. THis might keep error going, but it was not root cause.
  • I don't know where problem is, but I suspect it to be in mergeClusterToDatabase (I already fixed nasty bug there ff2ddc4 and it is completely untested, in contrast to rest of CreateClustersTask)
  • What we can always do, and I have no problem doing it is to create new task, named DeleteOrphanPersonsTask that can go and remove any persons that do not have any faces. I know this is sweeping bug under the rug, but if we acknowledge that our DB schema design is the way it is, and that bugs will always creep in, this might be wise thing to do, in the long run. What do you think?
  • Also in the realm of "sweeping bugs under the rug", we might ignore persons with 0 faces in UI?
  • I plan to create integration tests (as complex as possible) that exercise mergeClusterToDatabase in the hope I can figure out edge cases leading to orphans
  • If this doesn't work, I will have to stress whole of CreateClustersTask (with some predictable random generator seed, or with predictable chinese whispers) in the hope I can catch it when I change sensitivity.

@stalker314314
Copy link
Collaborator

As you probably saw, I tried with simple approach - #138. Plan is to merge that anyway, even though there was no results.

My next plan is to create stress test and try to reproduce bug in that way (like create 10.000 faces with random descriptors and change sensitivity and see if, after each iteration, we have orphans).

If that doesn't help, I vote that we add simple DeleteOrphanPersonsTask (or even simple 5 lines to delete orphan immediately after CreateClustersTask is done).

One additional thought:

  • could be that somehow there was addition of a new image (in callbacks) causing this? I though about this, but I cannot connect it to problem
  • your background task didn't crash along the way?:) Just checking:)

@matiasdelellis
Copy link
Owner Author

Hard problem:) To be honest, I have no idea how this can happen. Couple of things:

* What you suggested (with getCurrentClusters), I think, is not true, as there should be no orphan persons to begin with! So, if there was orphan person, they will reappear. If there was no orphans, `getCurrentClusters` should return no orphans. THis might keep error going, but it was not root cause.

Of course. It is clear that it is not the cause .. Just point out the only thing that I found .. 😞

* I don't know where problem is, but I _suspect_ it to be in `mergeClusterToDatabase` (I already fixed nasty bug there [ff2ddc4](https://github.com/matiasdelellis/facerecognition/commit/ff2ddc4db12d6fa7949173e6962dd52ead18d0ac) and it is completely untested, in contrast to rest of `CreateClustersTask`)

This commit was already in master when I did all the tests..

* What we can always do, and I have no problem doing it is to create new task, named `DeleteOrphanPersonsTask` that can go and remove any persons that do not have any faces. I know this is sweeping bug under the rug, but if we acknowledge that our DB schema design is the way it is, and that bugs will always creep in, this might be wise thing to do, in the long run. What do you think?

I prefer to find out what the problem is, rather than silently ignore it..

* Also in the realm of "sweeping bugs under the rug", we might ignore persons with 0 faces in UI?

I am thinking of adding some 'test mode' so that administrators have more tools in the fonted .. Maybe I can add this to the common user while we investigate..

* I plan to create integration tests (as complex as possible) that exercise `mergeClusterToDatabase` in the hope I can figure out edge cases leading to orphans

Great, I'm seeing it ..

* If this doesn't work, I will have to stress whole of `CreateClustersTask` (with some predictable random generator seed, or with predictable chinese whispers) in the hope I can catch it when I change sensitivity.

I forgot to document, but the test that I did the last time was, reset the database, and run the command with a timeout of 90 seconds, and a very small maximum area, to analyze quickly, and observe the evolution of the clusters.
I will try to do it again, but giving more attention to these details.

@matiasdelellis
Copy link
Owner Author

As you probably saw, I tried with simple approach - #138. Plan is to merge that anyway, even though there was no results.

Agree ... We have to have it even if they are simple cases .. 😉

My next plan is to create stress test and try to reproduce bug in that way (like create 10.000 faces with random descriptors and change sensitivity and see if, after each iteration, we have orphans).

If you advance, welcome, but continue with what you have planned. 😄
I'm going to try to reproduce, because it's an isolated case on my server... 😞

If that doesn't help, I vote that we add simple DeleteOrphanPersonsTask (or even simple 5 lines to delete orphan immediately after CreateClustersTask is done).

In any case, I prefer to hide it from the user, but not add another task that eliminates errors that we can not even reproduce ...

One additional thought:

* could be that somehow there was addition of a new image (in callbacks) causing this? I though about this, but I cannot connect it to problem

I can not assure it, but I think no image was added. Only the complete collection, as I described in the previous comment.

* your background task didn't crash along the way?:) Just checking:)

No. Never crash.

@matiasdelellis
Copy link
Owner Author

I repeated the same procedure, and I still not reproduce this problem again.

  • sudo -u apache php occ face:reset
  • WHILE (THERE ARE PENDING PHOTOS)
    • sudo -u apache php occ face:background_job -u matias --max_image_area=250000 -t 90
    • CHECK THE PROBLEM MANUAL BY LOOKING AT THE DATABASE.
    • SPORADICALLY, RENAME THE GROUPS IN THE FRONTED TO SEE HOW THEY EVOLVE..
  • CHANGE THE SENSITIVITY IN THE FRONTED TO 0.4
  • sudo -u apache php occ face:background_job -u matias --max_image_area=250000 -t 90
  • RENAME THE GROUPS AGAIN..
  • CHANGE THE SENSITIVITY IN THE FRONTED TO 0.5
  • sudo -u apache php occ face:background_job -u matias --max_image_area=250000 -t 90

It takes 4 hours, when normally it takes 18 hours with the normal parameters .. and the result is acceptable .. 😅

Well, I'm not going to close this issue, but it seems to be an isolated case..

p.s: I had SMART problems on the main disk of my home server. 😞 😥
Luckily I was able to make backups.. 😌 , however, I have everything in maintenance mode until I have a replacement and install the system. 😞

@stalker314314 stalker314314 self-assigned this Apr 11, 2019
@stalker314314
Copy link
Collaborator

Yes, your exact procedure is how I envisioned stress test to work:D I still think this stress test would be nice addition (even if it does not uncover any issues). I also think it should not be blocker for beta release on app store:) So, yes, expect me to continue working on this (but, it might be only in May, due to business trips)

@matiasdelellis
Copy link
Owner Author

matiasdelellis commented Nov 18, 2019

Hi @stalker314314
I finally know how to reproduce it.. 😅

Steps to reproduce:

  1. Analysis completed.
    • Found 0 faces without associated persons for user matias and model 1
      Found 0 changed persons for user matias and model 1
      Clusters already exist, estimated there is no need to recreate them```
      
  2. Select a photo that includes faces with single clusters! (Cluster with just one face)
  3. Modify the photo, for example delete location information on Maps
    • This invalidates the person and the photo.

With this there is already an empty person! 😞
Invalidates the person, and mark the photo to analyze again. But since this face no longer belongs to the person, the person is empty.

  1. Run the analysis again.
    • Found 1 changed persons for user matias and model 1
      1777 faces found for clustering
      844 persons found after clustering```
      
    • Detect the empty face and analyze the photo again.
    • Image scaled from 3024x4032 to 1086x1448 (since max image area is 1572864 pixels^2)
      Faces found: 2```
    • (Note that still not grouping the new face.)
    • I don't know why I detect two faces... but it does not affect the report .. 😅
  2. Run the analysis again.
    • This should group the new faces.
    • Oldest face without persons for user matias and model 1 is from 2019-11-18 22:30:47
      Found 1 changed persons for user matias and model 1
      1779 faces found for clustering
      842 persons found after clustering```
      
  • Also found the invalid group that is empty.
  1. Run the analysis again... and again and again.
  • Keep detecting the empty person ..
  • Found 1 changed persons for user matias and model 1

Summarizing .. When a photo is invalidated, the face of the person is separated, but it is not verified that the person is left with any face, which in case of being empty, should be deleted..

@matiasdelellis
Copy link
Owner Author

Well .. Maybe it's much easier to reproduce .. 😅

Steps:

  1. Select a photo that includes faces with single clusters! (Cluster with just one face)
  2. Just delete this photo..

Again, the person is invalidated, and all faces are removed (and in this case also removed from the images table).
As the faces of this photo were unique (In my case two statues.), The person has no other face and is empty..

@matiasdelellis
Copy link
Owner Author

In both cases, the hooks act, and the code is similar..

postWrite:
https://github.com/matiasdelellis/facerecognition/blob/master/lib/Watcher.php#L171-L175

$this->imageMapper->resetImage($image);
// note that invalidatePersons depends on existence of faces for a given image,
// and we must invalidate before we delete faces!
$this->personMapper->invalidatePersons($imageId);
$this->faceMapper->removeFaces($imageId);

...and postDelete:
https://github.com/matiasdelellis/facerecognition/blob/master/lib/Watcher.php#L228-L232

$this->personMapper->invalidatePersons($imageId);
$this->faceMapper->removeFaces($imageId);
$image->setId($imageId);
$this->imageMapper->delete($image);

Therefore we just have to improve these lines .. 😃

@stalker314314
Copy link
Collaborator

Ah, I see, refcounting problem - we should drop person when there is no faces associated to it? There are two ways to fix this:

  1. Change logic in Watcher, so it is a) fetching all faces to be removed (and their persons), b) delete faces, c) iterate for all those previously fetched persons and check if they have 0 faces for it and delete those persons.
  2. Another approach is to keep Watcher as-is and to add:
    • Logic to ignore empty persons in viewer (to fix problem in viewer, which is just problem in presentation layer)
    • Fix logic in cluster creation to delete empty clusters (or maybe even separate "task for it)

Problem with approach 1 is that it:

  • complicates watcher (we want to be as reliable as possible here, so less code is better),
  • it can have N queries (it can be slower), where N is number of faces on modified/deleted image, and
  • it can lead to concurrency issues (imagine situation where we are deleting file and constructing cluster at the same time).

I can go with both approach, but I favor 2. What is your thinking @matiasdelellis?

@matiasdelellis
Copy link
Owner Author

Ah, I see, refcounting problem - we should drop person when there is no faces associated to it?

Yes.. 😄
Great your definition of reference counting.. 😬
Still I could not think how to define it.. 😅

Maybe I prefer more for option 1, to have the problem contained, it is fixed when it occurs.. On the other hand, option 2 is to fix any clean person we find in other stage, without knowing why it happened..

complicates watcher (we want to be as reliable as possible here, so less code is better),
it can have N queries (it can be slower), where N is number of faces on modified/deleted image, and.

We just have to add N queries like this one. I don't think there are any performance problems.

DELETE
FROM `oc_face_recognition_persons`
WHERE id = 11678 AND
(NOT EXISTS (SELECT *
FROM `oc_face_recognition_faces`
WHERE person = 11678))

it can lead to concurrency issues (imagine situation where we are deleting file and constructing cluster at the same time).

Today we already have a problem like this... What happens with the faces that eliminate while grouping? 😕

@stalker314314
Copy link
Collaborator

I added PR for approach 1: #178

I still don't think this is better approach:

We just have to add N queries like this one. I don't think there are any performance problems.

Maybe I expressed myself wrongly, it is not perf problems, but perf impact. And adding N queries is (by itself) performance overhead that didn't exist before:)

Today we already have a problem like this... What happens with the faces that eliminate while grouping?

Well, that doesn't mean we should add more concurrency issues (just because there are some today) :)

Also, adding stuff in Watcher is always risky, as we don't have that covered in tests (yet), so I didn't write any tests for this PR. But, another reason not to push stuff to Watcher.

If you ever change your opinion and think approach 2 could be better, we can easily add another task and fix for presentation logic, and then we can revert this change:)

@matiasdelellis
Copy link
Owner Author

I added PR for approach 1: #178

Wow.. Thanks!.

I still don't think this is better approach:
Well, that doesn't mean we should add more concurrency issues (just because there are some today) :)

You are absolutely right, but the current problem I think is potentially much more serious than what we add.
Now half of the data is deleted, and at least one table is left with invalid data. With the second approach it would only be fixed as an additional cleaning task where we will not know if we are not cleaning another problem.. But while the task is not executed, the data is invalid. and ideally they should never be invalid.

Also, adding stuff in Watcher is always risky, as we don't have that covered in tests (yet), so I didn't write any tests for this PR. But, another reason not to push stuff to Watcher.

I was just seeing that we probably have to continue extending the hooks, to add '\OCA\Files_Trashbin\Trashbin', 'post_restore'.. because when you restore a deleted file seems that not emit 'postWrite' 😞

If you ever change your opinion and think approach 2 could be better, we can easily add another task and fix for presentation logic, and then we can revert this change:)

Well.. You can convince me to use approach 2 if we solve the current problems. Instead of deleting so much data, we could look for a suitable way to mark them to eliminate during the analysis.

stalker314314 added a commit that referenced this issue Nov 19, 2019
…oach-1

Deletes persons without faces on change, fixes #137
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants