Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ability to merge hierarchies and data under duplicate contacts #373

Closed
kennsippell opened this issue Jan 13, 2021 · 10 comments
Closed

Ability to merge hierarchies and data under duplicate contacts #373

kennsippell opened this issue Jan 13, 2021 · 10 comments
Assignees
Labels
Type: Feature Add something new

Comments

@kennsippell
Copy link
Member

kennsippell commented Jan 13, 2021

Is your feature request related to a problem? Please describe.
In https://github.com/medic/config-moh-mali/issues/181 the moh-mali supervisor app has 215 duplicate places, each with reports and data under them. It is fairly easy to identify the duplicates, but merging them is not straight-forward. It would be great to have a tool that could handle this scenario to merge all information under one contact and eliminate duplicates.

Describe the solution you'd like
Potentially a new "merge-contacts" action or similar? Maybe a --merge-duplicates mode for move-contacts?

When merging two contacts

  1. They probably need to be the same contact_type?
  2. The main contact should get assigned all reports which are currently assigned to duplicate contact
  3. The hierarchy under contact should also be moved
  4. Unclear if the primary contacts should also be merged?
@kennsippell kennsippell added the Type: Feature Add something new label Jan 13, 2021
@MaxDiz
Copy link

MaxDiz commented Jan 13, 2021

related: medic/cht-core#6751

@kennsippell
Copy link
Member Author

The branch 373-move-merge-contacts was useful for servicing the above linked issue. Useful if necessary, but not a stable branch.

@kennsippell
Copy link
Member Author

Just noting that when two duplicate places are merged, this scripts should also somehow manage any potential duplicate users which are linked to the places.

@kennsippell
Copy link
Member Author

kennsippell commented Nov 12, 2024

I've started work to create a new cht merge-contacts action and have a few design questions before getting too deep.

Changes to User Docs

The move-contacts command writes doc changes into the json_docs folder and then relies on the user to run upload-docs to write those docs. A merge-contacts command like merge doc a into b results in a's deletion. If a is a place, what should happen to users which are associated with that place?

Options I see:

  1. Do nothing - user can login but will be in a broken state and require manual intervention to fix
  2. Deactivate the user. User will be logged out. Username and password are permanently lost.
  3. We can keep the user active, and update their facility_id to b.

Regardless of what we choose, any action requires a write to the _users database. Since the upload-docs action only writes to the medic database and since we want to use an API when touching user accounts (required after 4.x) -- this means the upload-docs pattern is a bit muddled.

Options I see:

  1. upload-docs can be updated in some manner to gracefully accomodate document deletion
  2. merge-contacts can not rely on upload-docs and make direct calls to API

My proposal is 2 & 1 from above: Any time a document has _delete: true and is uploaded via upload-docs, we search for users associated with that place or contact and deactivate them.

Primary Contacts

When two places are merged, what should happen with the duplicate primary contact? Imagine "Pete's Area" with primary contact Pete being merged with "Pete II's Area" with primary contact Pete II. After the merge, there will be two contacts Pete and Pete II. In eCHIS, having duplicate contacts under the area caused problems (in dashboards and in workflows like MoH 515 and a lot of other places).

I'm thinking to make deleting the primary contact an opt-in feature via command-line flag --delete-duplicate-primary-contact

Feedback, alternatives, thoughts?

@garethbowen
Copy link
Contributor

A 4th option would be for the command to fail if there are users with this facility id. This could be combined with a user prompt to decide what to do about it.

I think we can rule out option 1 - silently breaking users is not cool.

From a user perspective, option 3 would be what I would expect to happen when I request to merge one place with another. However that will immediately expose more PHI to users so there's a risk there.

Why do you prefer option 2? Is the user likely to be obsolete in this case so that should be the default option, requiring manual intervention to recover them?

From an implementation perspective I'm leaning more towards doing more in API because that'll make the logic available and consistent across uses, for example, if someone wants to write a custom script, or if a tool such as the User Management Tool wants to incorporate this feature.

@kennsippell
Copy link
Member Author

kennsippell commented Nov 14, 2024

I'm basically writing this feature for the User Management Tool with the idea that a new UI would allow admins to merge duplicate places. There is currently an eCHIS-wide effort to cleanup dead accounts and unused parts of the hierarchy. Nairobi county alone had 350+ unused accounts many of which are duplicates. So this would get used.

From the perspective of UMT, I believe the ideal is to have an interface very similar to move-contacts so the same technology stack used for moving contacts can be re-used to handle duplicates -- just different data being passed to cht-conf.

I prefer option 2 because I think you should only be able to merge contacts of the same type. If there is a user at the place being merged, then there is likely a user for all places of that type (eg. CHP area). Similarly, if there is no user at the place being merged, then there is likely no user for any places of that type (eg. household). This may not be universally the case, but I think it's quite a good starting point -- especially for the projects using UMT which creates exactly one users for all places created.

Prompting the user to decide doesn't align well with a solution for UMT.

I think option 3 can result in multiple user accounts at the same facility. I think there are a few positive outcomes from this (mostly user never loses their session), but definitely some negative outcomes which I think are worse (increased possibility that two different users now create data under the same place which becomes a nightmare to untangle). Option 3 also seems inelegant to implement while keeping the upload-docs pattern, while option 2 fits elegantly and that's opportunistically convenient.

Perhaps you can elaborate on how putting more stuff into API would be beneficial to users or the User Management Tool? What specifically do you have in mind? If you want to do more via API for this specific user-management issue, I'd direct you to medic/cht-core#9139.

@garethbowen
Copy link
Contributor

Perhaps you can elaborate on how putting more stuff into API would be beneficial to users or the User Management Tool?

In general I prefer complex logic to be in API rather than cht-conf for a few reasons. Firstly and mainly because API is versioned with cht-core it means future changes to user structures made in cht-core don't break compatibility with cht-conf, user management, or other cases. It's also more reliable because it's hosted very close to couchdb so less likely to be impacted by network issues (UMT is close too, but cht-conf is often run far from the couchdb). Finally it makes the feature available for more use cases for the benefit of the wider community.

To answer your question directly, the users of the UMT would benefit because the action would be stable across cht-core versions and more reliable due to fewer http requests.

The obvious downside is having to upgrade cht-core to get the v1 API before you can call it, which is significant, but because it's a one-off cost I expect it to pay off in the long run to have the code versioned in API with a guarantee of backwards compatibility.

Prompting the user to decide doesn't align well with a solution for UMT.

I think it could if the user were prompted before submission, and that selection could be submitted with the API request, eg: "what do you want to do with user X?", right?

I prefer option 2 because...

I'm putting myself in the position of the user, and trying to decide what behaviour they would expect to happen if you explained the situation to them. I think a significant number would expect 2 to happen, but a significant number would expect 3 to happen. Furthermore both options have a serious impact on the users affected and are difficult to resolve remotely. Therefore I think the user needs to be involved in the decision.

@kennsippell
Copy link
Member Author

kennsippell commented Nov 14, 2024

Cool. An implementation in API feels much more costly and I don't personally feel I have time to take that on. (I imagine since these can take hours to run, and days to run safely, it probably needs to run in some sort of worker thread via a queue... and the queue probably should be queriable, resumable, cancelable, etc?). You'd probably want to blaze trail for the infrastructure by having move-contacts in API first... That scenario is much more common than this and is causing tons of data loss today.

Do you have a plan (and resources?) to build this in cht-core such that I should hold-off on this cht-conf ticket? An achievable middle ground might be to expose the data needed by the move-contacts script via an API to make these scripts more robust (reports created by a contact, descendants of a place, ancestors of a place, etc). These endpoints would move toward an ecosystem and community empowered by an API. I can't find anything in the cht-core repo tracking that work or the bigger work of moving the whole action into core. If you do have plans, can you share them since that would mean we are duplicating effort with the current UMT investments and roadmap.

If moving this into API is not a reason to hold-off, is it reasonable to move forward with a merge-contacts command in cht-conf now? I'm not far from having a working draft. I can also keep it as a "feature branch" or a stand-alone script in UMT if you prefer that, but I'm not sure that's a wise course.

For a merge-contacts action in cht-conf, I'd do the work in two PRs - one which implements merge-contacts with dependency on upload-docs and a second which changes upload-docs to handle deleting users elegantly. It will prompt users for their desired action, but when used with --force (like UMT does) the default will be to disable the user.

Sound reasonable?

PS I'm planning to follow-up with one more cht-conf action for UMT which is to delete an entire tree of the hierarchy (all contacts, users and reports below a place).

@kennsippell kennsippell self-assigned this Nov 14, 2024
@garethbowen
Copy link
Contributor

There are no resources planned to implement this from the Product team, I was just giving my 2c as to the implementation that I think would pay off in the long term. I'll leave it up to you where/how you want to implement this.

kennsippell added a commit that referenced this issue Dec 11, 2024
This adds new action cht merge-contacts. The change proposes to:

Move the code for move-contacts into a library lib/hierarchy-operations with interfaces move and merge
Parameterize the move-contacts code changing logic for: input validation, deletes the top-level contact, changes how lineages are updated, and adds report reassignment. All else remains unaltered.
Handles reports and contacts only. Unclear what other doc types I should be worried about.
@kennsippell
Copy link
Member Author

Released in 4.2.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Feature Add something new
Projects
None yet
Development

No branches or pull requests

3 participants