Agent cleanup/tightening default dates #4926

Closed
dustymc opened this issue Aug 10, 2022 · 15 comments


@dustymc
Contributor

dustymc commented Aug 10, 2022

Here is an example of some stuff that could be cleaned up

Collector = Joseph William Winthrop Spencer
Dates: Looks like some stuff was entered with placeholders:
[screenshot not preserved]

(Note - the identification dates and specimen event dates are also 1840-01-01!)

Affected records - https://arctos.database.museum/saved/1660169124103

To Do:

  1. Change all collection event begin dates to agent born date
  2. Change all collection event end dates to agent died date
  3. Append to verbatim date - "begin and end dates from collector birth and death dates"
  4. NULL all determination dates
  5. Change specimen event assigned by and date to agent entered by and date
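For anyone trying to picture the recipe, here is a rough sketch of the five steps in Python. It works on plain in-memory objects, not on Arctos itself, and everything named here (`Agent`, `Record`, `apply_recipe`, the field names, the "; " separator used when appending to verbatim date) is made up for illustration. Step 5 is read as "use whoever entered the record and the date it was entered", which is an assumption about what "agent entered by and date" means.

```python
from dataclasses import dataclass, replace
from typing import Optional

# Text from step 3 of the recipe.
NOTE = "begin and end dates from collector birth and death dates"


@dataclass(frozen=True)
class Agent:
    born: Optional[str]  # ISO date string, e.g. "1850-02-03"
    died: Optional[str]


@dataclass(frozen=True)
class Record:
    # Hypothetical field names, not the real Arctos columns.
    began_date: str
    ended_date: str
    verbatim_date: str
    determination_date: Optional[str]
    event_assigned_by: str
    event_assigned_date: str
    entered_by: str
    entered_date: str


def apply_recipe(rec: Record, collector: Agent) -> Record:
    """Return a copy of rec with steps 1-5 applied; rec itself is unchanged."""
    if collector.born is None or collector.died is None:
        return rec  # no birth/death dates to narrow to
    verbatim = f"{rec.verbatim_date}; {NOTE}" if rec.verbatim_date else NOTE
    return replace(
        rec,
        began_date=collector.born,             # step 1
        ended_date=collector.died,             # step 2
        verbatim_date=verbatim,                # step 3
        determination_date=None,               # step 4
        event_assigned_by=rec.entered_by,      # step 5 (assumed reading)
        event_assigned_date=rec.entered_date,
    )


# Made-up example values; only the 1840-01-01 placeholder comes from this issue.
collector = Agent(born="1850-02-03", died="1920-04-05")
before = Record("1840-01-01", "1840-01-01", "", "1840-01-01",
                "someone", "1840-01-01", "data_entry_person", "2015-06-01")
after = apply_recipe(before, collector)
```

Returning a new record instead of editing in place keeps the "before" values around, which should make it easier to compare old and new before anything is committed.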

@dustymc this seems like a lot - but it would be great if we could "make it so"!

Originally posted by @Jegelewicz in #4551 (comment)

@dustymc
Contributor Author

dustymc commented Aug 10, 2022

Change all collection event begin dates

I think only for specific values - maybe some are correct, others are some default.

Would be very useful if I could have access (temporary is fine) to this collection - I'm not comfy doing the first of something this complex without access to the Arctos operator UI.

@dustymc dustymc added this to the Active Development milestone Aug 10, 2022
@dustymc
Contributor Author

dustymc commented Aug 10, 2022

@Jegelewicz
Member

Access granted

@dustymc
Contributor Author

dustymc commented Aug 11, 2022

@Nicole-Ridgwell-NMMNHS does @Jegelewicz 's recipe above work for #4924?

The datatypes are consistent, I think that works from this end, if it works for you.

I'm struggling with whether to try to come up with one reusable solution, or whether this will need to be customized for every situation. Input greatly appreciated. (Keeping it simple/consistent will mean I'm a lot less likely to make messes so that's my vote, but I can accommodate WHATEVER.)

The report is https://arctos.database.museum/Reports/cat_record_reports.cfm?report_name=dates:%20collecting%20vs.%20collector, here's who's involved:

@ebraker
@Nicole-Ridgwell-NMMNHS
@mkoo
@AJLinn
@campmlc
@ccicero
@amgunderson
@DerekSikes
@atrox10
@mvzhuang
@cjconroy
@jtgiermakowski
@wellerjes
@gradyjt
@jrpletch
@AdrienneRaniszewski
@aklompma
@acdoll
@jldunnum
@jrdemboski
@genevieve-anderegg
@msbparasites
@mlbowser
@jessicatir
@ewommack
@kderieg322079
@sharpphyl
@StefanieBond
@catherpes
@sjshirar
@lin-fred
@claypollock
@Jegelewicz
@droberts49
@zmsch

@Jegelewicz
Member

Not simple, but some way to review the changes and approve without the need for a bunch of unloading/loading?

@Nicole-Ridgwell-NMMNHS

@Jegelewicz 's recipe above work for #4924?

Yes, except I don't think step 4 needs to happen for our data, and I would add an additional step that verbatim date needs to be changed to "no date provided".

@dustymc
Contributor Author

dustymc commented Aug 11, 2022

review the changes and approve without the need for a bunch of unloading/loading?

I think that's sorta either-or. I can make relatively straightforward and relatively homogeneous updates - I don't think looking at ID determiner (or whatever) on some and not others is much problem - or we can turn this into some sort of more complicated (but more capable) tool (if such doesn't already exist). The former involves maybe answering a few straightforward questions and saying "GO!"; the latter would be some flavor of more complicated.

"review and approve" can probably happen to some extent in isolation - I can pull out identifications or something - but I don't think that can be very useful, and reviewing in context would involve something like massively more infrastructure.
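As a rough idea of what reviewing "in isolation" might look like, here is a sketch that lists every field a proposed update would change and writes it out as CSV with an empty "approved" column. This is not an existing Arctos tool; `propose_changes`, `to_review_csv`, the column names, and the example record ID are all hypothetical, and it has exactly the limitation described above: the reviewer sees the changed values, not the record in context.

```python
import csv
import io


def propose_changes(record_id, before, after):
    """List (record_id, field, old, new) for every field that would change."""
    fields = sorted(set(before) | set(after))
    return [
        (record_id, f, before.get(f), after.get(f))
        for f in fields
        if before.get(f) != after.get(f)
    ]


def to_review_csv(rows):
    """Render proposed changes as CSV with an empty 'approved' column."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["record_id", "field", "old_value", "new_value", "approved"])
    for row in rows:
        writer.writerow([*row, ""])
    return buf.getvalue()


# Made-up values; "EXAMPLE:1" is not a real GUID.
before = {"began_date": "1840-01-01", "ended_date": "1840-01-01", "verbatim_date": ""}
after = {"began_date": "1850-02-03", "ended_date": "1920-04-05", "verbatim_date": ""}
rows = propose_changes("EXAMPLE:1", before, after)
print(to_review_csv(rows))
```

Unchanged fields (here `verbatim_date`) are left out, so the file only contains what someone actually needs to look at.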

@AJLinn

AJLinn commented Aug 11, 2022

I know UAM:EH has a lot of cleanup to do with our collecting, creation, use dates. We also have the complication of sometimes having multiple collectors (most of whom won't have dob/dod dates in their agent profiles) when items are passed down through the generations (e.g. https://arctos.database.museum/guid/UAM:EH:UA2015-004-0001). If we know the multiple dates of use/collection by the various previous owners, we could create many use and/or collection dates to correspond with the collectors and the places they used them. This would create a great deal of complication to any automatic changes that a process might implement.

I like the idea of "review and approve" to call attention to the problem, like with agent mergers, as long as we have time to manage the review process when it drops; it might take a great deal of time and research to verify or correct.

In either case, I am in favor of something to improve these problematic data.

@ebraker
Contributor

ebraker commented Aug 11, 2022

Since UCM records all need slightly different treatments I'm just going to fix these by hand. In general I tried to do this sort of date narrowing when migrating into Arctos, I just wasn't able to catch it all so I think it is a good project.

There are a couple considerations that might be true for other collections:

  • For many of these records we have existing collecting_event_remarks e.g., "Verbatim Date: no date recorded - before 10/15/1999", so if another comment is appended ("begin and end dates from collector birth and death dates") it seems confusing(?)
  • The E. R. Warren records are from the E. R. Warren Collection (e.g., purchased by Warren) so could in fact predate his DOB - I forget if we decided to use creator/benefactors as agents in a recent discussion, e.g. "E. R. Warren Collection"
  • Sorry but I have to say that some of these cascading assertions make me very nervous in terms of sorting out what is what. For instance, my agent "H. H. Smith" was automerged to Herbert Huntington Smith (which probably shouldn't be done for anything with a last name as generic as Smith). If the dates were narrowed based on Herbert's DOB/DOD, we are now out of range of the actual collect date for the record, which should take precedence. (Obviously there can be transcription errors that this operation/agent cleaning catches, but it can also introduce errors)
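To make the last concern concrete, here is a sketch of a guard that only narrows dates when the existing values are known placeholders, and flags (rather than overwrites) records whose real-looking dates fall outside the agent's lifespan. It is an illustration, not the code behind the report: `classify`, the `PLACEHOLDERS` set, and the three outcome labels are assumptions, and the lifespan used in the usage note is made up.

```python
from datetime import date
from typing import Optional

# Assumed placeholder set; 1840-01-01 is the value mentioned in this issue.
PLACEHOLDERS = {date(1840, 1, 1)}


def classify(began: date, ended: date,
             born: Optional[date], died: Optional[date]) -> str:
    """Return 'narrow', 'flag', or 'leave' for one collecting event."""
    if born is None or died is None:
        return "leave"   # nothing to compare against
    if began in PLACEHOLDERS and ended in PLACEHOLDERS:
        return "narrow"  # both dates are placeholders, so nothing real is lost
    if began < born or ended > died:
        return "flag"    # recorded dates outside the lifespan: needs a human
    return "leave"       # recorded dates already fit inside the lifespan
```

With a hypothetical agent born 1850-02-03 and died 1920-04-05, a record dated 1950-06-01 comes back as "flag", so a wrongly merged agent or a purchased collection gets looked at instead of having its collect date replaced.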

Can I also request that for ALL UCM collections, when 'identified by' agent = "unknown" we insert the 'collected by' agent name in catalog records? This will update probably 70% of our records. I have always wanted to correct this issue and this action will avoid a lot of the low quality/verbatim agent merging issues that I'm not all the way on board with...

@Nicole-Ridgwell-NMMNHS
Copy link

Is this supposed to be some sort of automated, across the board cleanup or will it be a tool we can utilize for individual collections/agents?

@Jegelewicz
Member

a tool we can utilize for individual collections/agents?

That's what I'm thinking.

@dustymc
Contributor Author

dustymc commented Aug 11, 2022

@AJLinn I think that's all expected - the new report format lets me add the flag so we can find things, if some low-hanging fruit can be dealt with then let's do so, if other things take more time then that's OK too - and the flag will hopefully help users understand potential limitations while things are being sorted out.

@ebraker that's mostly expected too, for some situations I may be able to adjust the flag-finding code, for others maybe things are just complicated enough to have to stay flagged.

These aren't rules, they're just indications that maybe something needs more attention.

@Nicole-Ridgwell-NMMNHS this is by request, and I don't think that'll change. MAYBE it'll turn into some UI tool or something, for now I'm just trying to find some way to write mostly-reusable code that I can point wherever you tell me to.

@ebraker
Contributor

ebraker commented Aug 11, 2022

Thanks @dustymc Should I put in a separate issue for UCM for this?:

Can I also request that for ALL UCM collections, when 'identified by' agent = "unknown" we insert the 'collected by' agent name in catalog records? This will update probably 70% of our records.
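The rule in that request is simple enough to sketch. This is only an illustration over plain dicts: the keys `identified_by` and `collected_by` and the function name are hypothetical, and copying every collector when there are several is an assumption, not something decided here.

```python
def fill_unknown_identifier(record: dict) -> dict:
    """Copy collectors into identified_by when the only identifier is 'unknown'."""
    identifiers = record.get("identified_by", [])
    collectors = record.get("collected_by", [])
    if identifiers == ["unknown"] and collectors and collectors != ["unknown"]:
        # Return a new dict so the original record is left untouched.
        return {**record, "identified_by": list(collectors)}
    return record
```

Records that already have a named identifier, or whose collector is also "unknown", are returned unchanged.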

@kderieg322079

I've just got one, so I'll fix it manually. Most likely just the wrong agent was selected because it is a vague/common name: https://arctos.database.museum/guid/UMNH:Bird:8472

@dustymc
Contributor Author

dustymc commented Jan 23, 2024

I don't think this needs to survive #6813. Agents can be cleaned up as data objects; records will need the involvement of collections (and many agents were created from the very possibly erroneous data in those records, so proceed with caution!). Tentatively closing.

@dustymc dustymc closed this as completed Jan 23, 2024