Agents - disambiguation of duplicate agents and workflow of data migration - workflow needs help #6114
Comments
Good grief. I support doing whatever is necessary to make this process functional, starting with changes to how links in the status column are displayed.
Three(??) items here, I think.
That may be true, but agents are required for Accessions, which are required for catalog records. It seems like twice the work to process only Accession agents and then do more when you need agents for identifications and other determinations. People are doing everything in Arctos, and we really only have a consistent path for "verbatimizing" collectors and preparators.
My inclination is to allow duplicate agents and disambiguate them using a code, but also tag them as 'imported from UAIT collection on 2023-04-11'. That way, if there are 12 different James Vincents, each has some info separating them and each collection can find its OWN James Vincent. If anyone cares to do the research and discovers that 3 of the 12 are the same, they can be merged later (or not), but work proceeds. Just my 2 cents.
I agree with @DerekSikes: initial collection creation and migration is not necessarily the best time to handle deep agent cleaning. (OK, please ignore how that sounds.)
Yes, plenty of details to work out. Yes, there are efficiencies in having all 96 forms of some name in one file and dealing with them one time. I suspect having the context of the data available is a (much) bigger efficiency, but ??? I'm pretty confident that 'H. P. H.' isn't useful/resolvable at this point; beyond that, who knows. One sorta-obvious improvement would be to have the checker do more with relationships and such. I think those are currently ignored entirely (other than needing to exist at some point); they should DO STUFF. (And maybe that's a good point to decide whether this can be supported by the component loader environment or needs something more specialized.)
That's basically what verbatim agent does (and maybe that idea needs to be extended in some way).
If the agent data is garbage, then it's garbage for everyone, and that has hard functional implications. Maybe there is some "second-class agent" structure-or-something, but if so, it's something way beyond mixing low-quality junk in with the stuff we've invested so much time in cleaning. So yes, Arctos should absolutely support 12 James Vincents, as long as they're all disambiguated by the data they carry and a user selecting one won't have any trouble figuring out which one is correct.
I agree with @dustymc. I don't want to go backward; I'd like to proceed with cleaning up agent messes so that this ISN'T so difficult. We still have A TON of very low-quality agents, and they are a large part of what makes this difficult. Perhaps we can at least start with a little tweaking of the responses from the tool, or maybe we just need a coarse first pass. To start, I'd like to put the list of incoming names into three categories:
- Has an exact match in Arctos
- Has no match in Arctos
- Has potential alternates in Arctos
Then I can take those three bunches and review them appropriately:
- Are the exact matches the incoming collection's agent?
- Are the no matches worth creating an agent for?
- Potential alternates
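A rough local pre-sort into those three bunches could look something like the Python sketch below. It assumes you have the incoming names and a list of existing Arctos preferred names as plain strings (how you get the latter out of Arctos is left open). The normalization (case-insensitive, periods and commas ignored) and the "last name plus first initial" rule for potential alternates are hypothetical heuristics for illustration, not what the Agent prebulkloader actually does.

```python
import re
from collections import defaultdict


def name_key(name: str) -> str:
    """Case-insensitive key with periods/commas and extra whitespace removed."""
    return " ".join(re.sub(r"[.,]", " ", name.lower()).split())


def loose_key(name: str) -> str:
    """Hypothetical 'potential alternate' key: last token plus first initial."""
    parts = name_key(name).split()
    return f"{parts[-1]} {parts[0][0]}" if parts else ""


def categorize(incoming, existing):
    """Sort incoming names into exact / potential alternate / no match buckets."""
    exact_index = {name_key(n) for n in existing}
    loose_index = defaultdict(list)
    for n in existing:
        loose_index[loose_key(n)].append(n)

    result = {"exact": [], "alternates": {}, "no_match": []}
    for name in incoming:
        if name_key(name) in exact_index:
            result["exact"].append(name)
        elif loose_key(name) in loose_index:  # membership test does not create keys
            result["alternates"][name] = list(loose_index[loose_key(name)])
        else:
            result["no_match"].append(name)
    return result


existing = ["Ben M. Fitzpatrick", "Ben Fitzpatrick", "Chapman"]
incoming = ["Benjamin M. Fitzpatrick", "chapman", "H. P. H."]
print(categorize(incoming, existing))
```

With those sample names, "chapman" lands in the exact bucket, "Benjamin M. Fitzpatrick" is listed as a potential alternate of both Ben Fitzpatricks, and "H. P. H." has no match. Note that an "exact" hit here still says nothing about whether it is the same person; that review step stays human.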
We need to ensure that verbatim agents can be used for determiners of all kinds and that people feel comfortable using them (not sure we are there right now). We also need to be able to use verbatim agents in transactions.
This sounds like a good approach. Can we get the tool to break down the feedback into these categories?
Yup, and being stuck in the middle (why do we always end up here?!) is going to make creating clean agents difficult, which makes cleanup more difficult, which... positive feedback loops suck.
I suppose, but I don't think it can be meaningful (yet, I hope). You can get one exact match (because your "A." matched the existing "A.", and if you use that you'll eventually get sucked into some horrid cleanup), no name matches (because there's a typo in Arctos), everything else you can imagine, and lots of things nobody could see coming. Hopefully that'll all change once there's more cleanup, but as long as we've got (2) floating around, this is going to be weird. If we ever get cleaned up, then rules like "ORCID matches, so nobody cares how you spell it" and "not that John Doe, because the birth dates don't match" become possible, and maybe that environment would support some more automation.
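Those two rules could be written down as a small comparison function, sketched here in Python. The `AgentFacts` record and its fields are hypothetical; dates are assumed to be ISO 8601 strings that may be partial (e.g. "1901"), and anything the rules can't decide is returned as "unknown" for a human to look at.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentFacts:
    """Hypothetical minimal record; real Arctos agents carry much more."""
    name: str
    orcid: Optional[str] = None
    birth_date: Optional[str] = None  # ISO 8601, possibly partial: "1901" or "1901-05-12"


def dates_compatible(a: str, b: str) -> bool:
    """Partial dates are compatible if one is a prefix of the other."""
    return a.startswith(b) or b.startswith(a)


def compare(a: AgentFacts, b: AgentFacts) -> str:
    """Return 'same', 'different', or 'unknown'; names are deliberately ignored."""
    # "ORCID matches, so nobody cares how you spell it" (checked first)
    if a.orcid and b.orcid and a.orcid == b.orcid:
        return "same"
    # "Not that John Doe, because the birth dates don't match"
    if a.birth_date and b.birth_date and not dates_compatible(a.birth_date, b.birth_date):
        return "different"
    # Everything else still needs a human (or more data).
    return "unknown"
```

The point of the sketch is that both rules depend on the agents actually carrying data; two bare "John Doe" records always come back "unknown".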
Returning to this, specifically @dustymc's summary of the factors at play here:
Creating agents first is a workflow that really is required by our model, since everything else requires an agent. What about a temp/pending/in-progress sort of flag/table/queue for these newly minted agents, which at first appear to be low-data agents because we haven't yet assigned records, identifications, loans, etc., and haven't added biographical info YET?
I'm not sure that works; a committee working through potential merges? That seems like a job for a tool that suggests merges which can then be reviewed. We need another separate tool, right? I actually think the safer route is to err on the side of creating duplicate agents from new incoming data, because we don't know whether they are the same, especially for common name combinations. Then let some agent clean-up tool do its thing, with the understanding that this is an iterative, ongoing process as more data is added to Arctos. See above about a pending table/flag/whatever.
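A merge-suggestion tool of that kind could start as simply as grouping agents by a loose name key and surfacing only the groups that include a newly created ("pending") agent. This Python sketch assumes a hypothetical `pending` flag and a hypothetical "last name plus first initial" key; its output is a review queue, not a set of merges.

```python
import re
from collections import defaultdict


def loose_key(name: str) -> str:
    """Hypothetical grouping key: last token plus first initial, case-insensitive."""
    parts = re.sub(r"[.,]", " ", name.lower()).split()
    return f"{parts[-1]} {parts[0][0]}" if parts else ""


def suggest_merges(agents):
    """
    agents: iterable of (agent_id, preferred_name, pending) tuples, where
    `pending` marks a newly created agent (a hypothetical flag).
    Returns candidate groups for human review; nothing is merged here.
    """
    groups = defaultdict(list)
    for agent_id, name, pending in agents:
        groups[loose_key(name)].append((agent_id, name, pending))
    # Only surface groups with 2+ members that involve at least one pending agent.
    return [
        members
        for members in groups.values()
        if len(members) > 1 and any(pending for _, _, pending in members)
    ]


agents = [
    (1, "James Vincent", False),
    (2, "J. Vincent", True),
    (3, "Jane Vincent", False),
    (4, "Chapman", True),
    (5, "Ann Smith", False),
    (6, "A. Smith", False),
]
for group in suggest_merges(agents):
    print(group)
```

Here only the Vincent group is suggested (the Smiths involve no pending agent, and Chapman has nobody to pair with). Jane Vincent ending up in that group is exactly why the output should be reviewed rather than merged automatically.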
You lost me on point 3. I'm not sure what this CSV looks like (will edit if I find that issue), but is it impeding adding agents for use? Also will tag Erica Krimmel (maybe start a project for Switzer?)
I was going to tag Erica, but it seems she is no longer part of the Arctos GitHub organization?
This seems pretty trivial at this point.
Not that I can see; we just need to purge the low-information agents and stop making more of them. There's a functionally identical and (much) easier-to-use path to that.
Agents should carry information that doesn't require a committee. If they don't have that, then they don't need to be agents.
Not if we care about calling people what they'd prefer to be called or providing proper attribution for their work. We need to confidently support multiple unambiguous agents of the same name to do that. That is unavoidably incompatible with creating ambiguous agents. (Or we could take over the world and require unique names. Dibs on ...) And to be clear, I'm not trying to say we should or shouldn't do anything in particular; I'm just trying to spell out what's necessary if we want to do the things I think I've heard from the collections and the larger community. We can do just about anything else, but how we model the data will unavoidably control the possible functionality.
When creating a new agent, there is a remarks field that one can type remarks into. This field does not seem to display on any of the new agent pages.
That may be the curatorial remarks field. There are both curatorial and public remarks fields. The curatorial one allows entering data that the public cannot see, so we can keep information more private or keep notes on agents. Look for the little public versus private icon when adding an attribute.
Hmmm, no, I think that is still there. Ah, found it: click the See Full Agent Attributes link. It should be there with all of the other nitty-gritty for the agent, in the table of attributes.
No... Bug patched.
If you did this during creation before the bug above was fixed, it wasn't added; please edit the agent to add it. If it's something else, I need details.
How do I get to the page to add the remark? That field/page seems to be accessible only when creating a new agent. Is there no editing access?
From https://arctos.database.museum/agent.cfm?agent_name=Johnson or from the picker.
Ah, the remarks field IS an attribute! OK, I think that problem is solved. Here's a new one: there's an error at the bottom of this page: https://arctos.database.museum/agent/21351197?deets=true And another: the agent picker doesn't list the 'alive' date attribute. Wouldn't that be a super useful way to disambiguate?
I'll get another patch out, probably tonight.
New issue, please (shouldn't be a problem; I just need the request).
OK!
An incoming collection provided me with a list of people/organization names in use in their current data: 921 agents. After some eyeballing and review, I combined some duplicates and narrowed the list down to 794. A lot of parsing and adding periods later, I had a file that included preferred, first, middle, and last names (and, for some, AKAs), which I ran through the Agent prebulkloader. The results are there if you are brave enough to look.
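Part of that parsing and period-adding could be scripted. This Python sketch assumes person names in "First Middle Last" order and uses hypothetical column names that mirror the fields mentioned above (not necessarily the prebulkloader's real headers). It won't handle organizations, suffixes such as "Jr.", or run-together initials like "H.P.H.", which still need eyeballing.

```python
import csv
import io


def add_periods(token: str) -> str:
    """Turn a bare single-letter initial like 'M' into 'M.'; leave anything else alone."""
    return token + "." if len(token) == 1 and token.isalpha() else token


def parse_person(raw: str) -> dict:
    """Split a 'First Middle Last' person name into hypothetical loader fields."""
    tokens = [add_periods(t) for t in raw.split()]
    if not tokens:
        raise ValueError("empty name")
    return {
        "preferred_name": " ".join(tokens),
        "first_name": tokens[0] if len(tokens) > 1 else "",
        "middle_name": " ".join(tokens[1:-1]),
        "last_name": tokens[-1],
    }


def to_csv(names) -> str:
    """Write parsed names as CSV text for review before loading."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["preferred_name", "first_name", "middle_name", "last_name"]
    )
    writer.writeheader()
    for raw in names:
        writer.writerow(parse_person(raw))
    return out.getvalue()


print(to_csv(["Benjamin M Fitzpatrick", "Ben Fitzpatrick", "Chapman"]))
```

A single-token name such as "Chapman" ends up with only a last name, which is a useful flag in itself: it is exactly the kind of low-information name this thread is worried about.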
I can have the incoming collection go through this list as is, or I can try to help a bit. This can be a completely overwhelming task: for a new collection because, well, it is a lot, and for me because I do not know the collection, so it is difficult for me to make assumptions about whether their Benjamin M. Fitzpatrick is the same person as Ben M. Fitzpatrick or Ben Fitzpatrick (never mind that those two Arctos agents might be the same person). Multiply this decision tree by about 100. Of the names I ran through, 534 are NOT in Arctos and have no close matches, which means they will be verbatim agents unless the incoming collection can provide one piece of identifying information (which they will also have to go find). Another 110 have an exact match in Arctos, but are we SURE that they are the same person? How hard should we look into that?
The results of this process, when downloaded from the tool, cannot be processed easily because there are line breaks in the status column. Ideally, each piece of "advice" would be separated by something I could use to parse them in Excel, something that isn't used in any of the names or "advice" text. @dustymc, can we do something different there? I find that reviewing them in the tool is potentially easier, but it is very time-consuming and not easy to do in a modular way. The only way to quickly find all of the "[fatal]|nocase preferred name match:" entries is to download the results and use the FIND feature in Excel (or whatever tool you choose for reviewing CSV). Otherwise, you see this buried somewhere in the status. Here is the status column for Chapman in the prebulkloader:
Note that it is only at the very bottom of this extensive list of possible matches that I find
[fatal]|nocase preferred name match: Chapman|{https://arctos.database.museum/agents.cfm?agent_id=605})
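Until the tool output changes, the line breaks can be flattened locally. This Python sketch assumes each piece of advice sits on its own line within the status cell and that the column is named `status`; both are assumptions about the download, not confirmed tool behavior. It re-joins the items with " ;; " (chosen because the status text already uses "|"), adds a count, and flags rows containing the fatal nocase match so they can be filtered instead of hunted for. Python's `csv` module reads quoted fields with embedded line breaks correctly as long as files are opened with `newline=""`.

```python
import csv
import io

FATAL_MARKER = "[fatal]|nocase preferred name match"


def split_status(status: str) -> list:
    """One advice item per non-blank line (assumes one item per line in the cell)."""
    return [line.strip() for line in status.splitlines() if line.strip()]


def flatten_status_csv(src, dst, status_col: str = "status", sep: str = " ;; ") -> None:
    """Copy a results CSV, replacing line breaks in the status column with `sep`."""
    reader = csv.DictReader(src)
    if status_col not in (reader.fieldnames or []):
        raise ValueError(f"no {status_col!r} column in input")
    fieldnames = list(reader.fieldnames) + ["advice_count", "fatal_nocase_match"]
    writer = csv.DictWriter(dst, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        items = split_status(row.get(status_col) or "")
        row[status_col] = sep.join(items)
        row["advice_count"] = len(items)
        row["fatal_nocase_match"] = any(FATAL_MARKER in item for item in items)
        writer.writerow(row)


def flatten_status_text(text: str, **kwargs) -> str:
    """Convenience wrapper for CSV content already held in a string."""
    dst = io.StringIO()
    flatten_status_csv(io.StringIO(text), dst, **kwargs)
    return dst.getvalue()
```

To run it against a downloaded file, open the input and output files with `newline=""` (and whatever encoding the download actually uses) and pass them to `flatten_status_csv`; the new `fatal_nocase_match` column can then be filtered in Excel.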
Placing the burden of cleaning up agents on incoming collections seems a bit unfair. As long as we continue to have agents like Chapman, this is going to be a difficult process. Any chance we can take the next step in removing low-quality agents and verbatimize ALL of those that don't have any identifying information? Any other ideas for making this better/easier?
Help!