stop low-information agents, do more with verbatim agents #4554
Comments
First pass: Attached are 1883 agents who have either one-word or initials preferred names, and who are not found outside of table collector. Proposal:
temp_agent_clean_first.csv.zip I'll proceed (using fresh data) |
Please retain 21263988 | Sanbornes |
If we proceed with this, that would be a matter of data. Maybe we'll be able to see through the clutter enough to build better rules at some point, but for now just about anything would escape the filters I'm working with. Address=South Pacific, alive=1972, WHATEVER. We'd like to have a bar, but at least initially it'll be a very low bar! Some remark suggests they should be involved in an accession - that would stop this, but hopefully only temporarily. Agent remarks suggest a name that might lead somewhere and the activity suggests one person, why not just use that and put the uncertainty in the remarks? Maybe we also need some sort of Best Practices document (or the existing cleaned up or added to) - "when given X, we suggest doing Y...." Unrelated to agents, some other remark makes me suspect this wasn't collected after 1973, and I'm absolutely positive it wasn't collected tomorrow - event dates could be tightened up a LOT (but not as much as they could have been yesterday...). |
We need to make a pass through this because this one 21313587 | ᑭᒐᐃ | first name=ᑭᒐᐃ|aka=Kigai; Remark: Ethnology and History verbatim agent; carver probably needs to be kept as is |
OBJECTION!
I have argued in the past for both of these types of single named agents to not be deleted or flagged as somehow "less valid" (i.e., moved to verbatim collector) than a record with more than one name. I will fight all night long to defend the single name Indigenous creator record. I will also defend the use of the name that is printed on the label as the preferred name, but will encourage our staff (including myself) to do a better job of finding the full corporate name, if it exists online. [I'll now get down from my soapbox...] |
@AJLinn brings up a few good points
|
The format of the agent name isn't in any way the problem, it's just a convenient place to start. This should eventually involve ALL agent names; they're still just strings, even (maybe especially!) if there are 17 "words" involved.
Please let me know if there's any way I can help - pull data out, put it in, WHATEVER. If this comes down to one-by-one it may never get finished. (But it got started and we're thinking about this stuff and that's something!)
Great, add that (or the publications or whatever) and the agent easily clears this bar.
Ditto. (And bigger picture, it seems we're going to be forced past our unique preferred name restriction at some point, which would be a lot more approachable if we could tell the Nike in Oregon from the Nike from Greece.)
See above, these are just a convenient place to start. I can drop this and grab a couple thousand random or something if the format is a distraction.
That is embedded in the "forced past unique restriction" mentioned above. Doing that and avoiding the absolute most disrespectful thing we could do - not properly attributing work to the creators - is the core of this; right now, if both Nikes show up and (reasonably) demand we use their name, we just can't. If we somehow allow two Nikes, we can't tell them apart (except maybe by digging through remarks, which isn't realistic) which leads to us attributing god-stuff to the shoe-folks. We need more data to move past our restrictions.
Please note that more names won't stop this (or that's how I hope it plays out, anyway). This is fundamentally a request for some sort of actual data beyond strings/names. The ideal form of that is something which leads to a lot more data - an ORCID/WikiData/LoC/whatever address - but the bar isn't that high (yet?? Probably never...) and a vague address ( I was going to refer to documentation - much of the requested information exists, but not in such a way that machines (or humans, unless they're willing to dig) can find it, but the current documentation is not clear. @Jegelewicz the remarks section of https://handbook.arctosdb.org/best_practices/Agents.html#general-recommendations-for-creating-meaningful-agents should look more like https://github.com/ArctosDB/documentation-wiki/blob/ee9493ba951cb64639eb0e97fb51b5e909871c01/_documentation/agent.markdown - "Use remarks as a last resort" is the critical (and now missing) idea. From the CSV:
I copied some of that to appropriate places: And now we have TWO non-name-based data points! There might be another 500 Spanjians out there, maybe even making Sportswear, and as long as they're not operating in San Marcos in 1971 they can't confuse anyone! Now I'm gonna go file an issue about the values I had to use... |
moved remarks stuff to Don't |
ᑭᒐᐃ is acting as a creator, I think it's safe to assume they were at the creation event which carries places and dates. I don't want to get into some tail wagging the dog situation so I'm (extremely) hesitant to just make those assertions, but I could round them up for human review (and help load anything which passes that). The other viewpoint is that ᑭᒐᐃ is functionally nothing but a string stored in a complicated way at the moment, changing that to a string stored in a less-complicated structure doesn't change any meaning or function that I can identify. At some point hopefully someone will "elevate" some/many/most "simple string agents" to agent objects (because they want to do something that requires the complexity, not "just because" - I hope), and I'm happy to build tools to facilitate, I just need a use case. (I don't think we're missing any functionality now, but I can probably save some clicking.) Note also that this approach would unavoidably allow what we're really trying to get rid of. If for some reason someone wants to scrounge up data for T. K. (who seems to be no more than a footnote in an obscure publication), then doing so would put them in the "safe pile" along with any other more-than-strings agent. I'm not sure if that's a feature or a bug, but it's probably unavoidable under this viewpoint. |
@dusty is it possible to get a csv or SQL for UCM records using values from temp_agent_clean_first.csv.zip? That way I can more easily take a pass at reviewing and adding more agent info when possible. |
I did this
but the result is a bit awkward to pass around so https://arctos.database.museum/archive/ucm_issue_4554 - let me know if you need something else. |
I actually really disagree with this idea, unless we instead add a free text field called biographical profile or biographical summary. This is essential, useful data that helps distinguish one John Smith from another, it shows up in our agent summary, and is critical for understanding the context of our collections. Compare our agent record for Robert Bloom to that of the UAF Archives (which is a short one also): It's easier and more useful than creating a PDF of a biographical profile and attaching it as a media file to the agent record... more clicks and downloads. We already allowed for markdown formatting for paragraphs of text, so the agent summary page looks better when there's more there.
I'm not sure this is an appropriate way to "claim" that agent. I'd prefer to add some born/alive/died/dead data, some geographic information in an address field, or additional biographical info if it's able to be located. Sometimes it's an oral history recording or maybe a historical photo in an online digital archive. Would that help fulfill some data points you're looking for @dustymc ? |
For anyone who reads it: sure. A date buried in there is also completely inaccessible to things like #4551 (and probably most users). The current documentation says "Don’t use remarks when more formal data are possible." which I believe is correct - we do have an appropriate "more formal" field for places (address) and dates (status) so that doesn't belong (or only belong, I don't care what's replicated in remarks to be more readable or etc.) in remarks. We don't have a place for biographical profile so that does belong in remarks. Unless....
New issue, no objection from me (as long as it can be defined in such a way that it's not "remarks when someone felt like using that field").
If they're working for you: Yes, absolutely. If they tossed a dead rat (or motorcycle or whatever) at you at some point: Nope, over-using relationships will just result in those data not getting cleaned up when we get access to tools (or brains).
Any of that will get the agent over the (tentative) current bar. I'd of course like to have all of it and in great detail, but at this point any sort of structured data feels like a great leap forward. |
The conversation seems to have drawn down, OK to proceed per #4554 (comment)? |
If by proceed you mean nuking all the one-name agents, I'm still working on my mega-list to add "alive" info and "shipping" address so there are three points of data. Can you give me time to fix them? I can prioritize for the next couple of days. |
No hurry, I just don't want to lose whatever momentum we've got going. Let me know if I can help with anything. |
Looks like I have 50 agents to update. Unfortunately, I don't think there is anything automated we can do other than looking at each agent's activity report and assessing them individually. We'll see how long it takes! |
See #4568 - we discussed rebuilding the activity page (somewhere...), let us know what would be useful to surface there. |
I'm not hunting, it popped into my notifications today, and nothing is missing, it's just in the wrong place. I do think this is something that docs/announcements will fix, at least for the vast majority of users.
Not really, that's where #4871 took us intentionally or otherwise, but I don't think anyone's been that brave yet and you don't have to be the first. "Alive when the paper was published" gets at a great deal of the problems and is a significant improvement over the vast majority of our agents, I don't have any huge problems with that. That said, if they're just some random author with some tenuous-at-best connection to your collection, why bother adding them? |
@cjconroy currently, this person SHOULD be fine as they are. Plans are to remove all agents in one of the collector roles that don't have anything in their agent profile but names and remarks to verbatim agent. Agents used in other capacities (transactions, publications, identifications, media creation) OR that also include an Arctos username will be left alone for now. BUT - it really helps the community if any information known at the time the agent is created is added! |
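As a rough sketch of the rule described above (not the actual Arctos implementation), the check could look something like the following. The `Agent` class, its field names, and the `COLLECTOR_ROLES` set are all hypothetical stand-ins for illustration; the real data model differs.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical, simplified agent record. Field names are assumptions made
# for this sketch and do not mirror the Arctos schema.
@dataclass
class Agent:
    preferred_name: str
    roles: Set[str] = field(default_factory=set)        # e.g. {"collector"}
    addresses: List[str] = field(default_factory=list)
    relationships: List[str] = field(default_factory=list)
    statuses: List[str] = field(default_factory=list)   # e.g. ["alive 1972"]
    username: Optional[str] = None
    remarks: str = ""

# Assumed set of "collector roles"; the thread does not enumerate them.
COLLECTOR_ROLES = {"collector", "preparator"}

def is_low_information(agent: Agent) -> bool:
    """True if the agent would be a candidate for moving to verbatim agent."""
    # Used only in collector roles (no transactions, publications, etc.).
    collector_only = bool(agent.roles) and agent.roles <= COLLECTOR_ROLES
    # Any address, relationship, or status counts as structured data.
    has_structured = bool(agent.addresses or agent.relationships or agent.statuses)
    # Names and remarks alone do not clear the bar; a username does.
    return collector_only and not has_structured and agent.username is None
```

Under these assumptions, remarks never affect the outcome, which matches the point that information held only in remarks is not structured or accessible.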
I don't understand all of this, but I've reviewed the list. These are agents associated with MSB Birds who should remain bona fide agents with some additional comments not in file: Joan Morrison |
@catherpes please add the information before the end of the year. If I can help in any way, let me know. |
@catherpes those should all have sufficient information now. Most of them had the information, but in remarks where it's not structured/accessible (and most of them had already been updated by the Agents Committee). |
From @Jegelewicz in some migration issue:
The goal is to not create unnecessary Agents, those which are capable of carrying the known information as verbatim agents. A vague association with the institution for which |
So what's the deadline for this cleanup? end of calendar year? Could I get a list of that for MSB:Herps ? I tried the SQL for UCM but no luck... thanks! |
I've been thinking a bit about this issue. One of the things that has the potential to be valuable about agents in the context of Arctos is the idea that having an agent assigned as collector to a given specimen is an assertion that the same person had that role as they did elsewhere in other collections. In the context of our bird data, for instance, we've been going through our agents and comparing them to the Arctos agent list, and identifying those where there is good evidence they are the same person--same time span, same areas of collection, institutions where we know they have worked or to which we know their material was distributed. I am concerned that that (not insignificant) investment of time not be tossed out because the agent we "synonymized" doesn't have anything other than a collector role. It makes sense for collectors to go into verbatim agent by default initially, but where somebody has taken the time to gather evidence in support of the assertion that our "so and so" is the same as their "so and so", isn't it worth preserving that information? Can somebody please clarify for me what it will take at a minimum for an agent to NOT get bumped into verbatim agent? I'd like to add whatever it takes to the core agents we've spent time on cleaning up to make sure they don't get bumped. |
This is just a request to record that information in a way it can be queried/is useful to the next person. Address, relationship, or status all carry more information than a string can and so will prevent deletion. (And see #5172 - I don't think we're doing that quite correctly, your input is most welcome.) |
Thanks for the clarification. In that case, I will see that we include some level of relationship with the Bell Museum (or another institution, if it involves transferred material) for those agents. Where easily available (e.g., obits exist or HR files allow), I will try to put in a status for born/died if possible. |
Do please keep in mind the ultimate goal of this, which is having sufficient information to do things like drop the (silly, but necessary for usability) unique index on preferred agent name. Who knows where that line really is, but a "Jones" that dropped a dead squirrel off and a Jones who collected as an employee (and so probably has notes and such) are likely on different sides of it. tl;dr: plz don't make relationships just to preserve otherwise low-information agents |
515378 attributes created. 38982 agents removed. Removed agents and agent names: |
I am a little frustrated that we now have nearly 1000 records with verbatim agents. I had fixed all of the agents with low information that were provided in the spreadsheet that was shared with us all months ago, but there were apparently over 500 more agents with low information that have now become verbatim agents. I don't understand why these were not flagged in order to give me a chance to update the records before this agent removal process. For many of these people, we do have information available to us that could have been added to flesh out the agent profiles. This makes for a lot of extra work to recreate these agents and link them back to their records. I understand that we haven't lost any information and that everything is still functional, but most of these people should be real agents in Arctos. |
Can I get Silas Fischer elevated to agent status? |
@catherpes instructions are here - https://handbook.arctosdb.org/how_to/How-to-Agentify-Verbatim-Agents.html Let us know if they aren't clear or are missing anything. |
The instructions do not get me to the page shown in the instructions when I choose manage collectors. Verbatim collectors are not shown as they are in the tutorial. Can someone who knows this please do this for me? I shouldn't have to undo what Arctos 'fixed' for me. Also, maybe this should be its own issue, but it would be helpful to be able to search on verbatim agents in the manage agents search so I don't have to search agents, then search in the specimen search window. |
thanks |
@catherpes I added Silas Fischer as a collector in all the MSB:Bird records where he was listed as verbatim. |
Thank you
|
@catherpes if you get another one of these - pass it to me. I'll use it to create a video tutorial (banging myself on the head for not doing it this time...) |
😊 for the video not for the head-banging! |
Is your feature request related to a problem? Please describe.
We have a lot of low-data agents, they make everything in agent land more difficult than it needs to be.
Describe what you're trying to accomplish
Better data, less work.
Describe the solution you'd like
Describe alternatives you've considered
Much work, bad data.
Additional context
First Step: report of low information agents who don't have addresses or relationships and don't extend beyond table collector.
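For illustration only, a report of this kind might be sketched as below, using Python's `sqlite3` and a tiny invented schema. The tables `agent`, `collector`, `address`, `agent_relationship`, and `other_use`, along with all the demo rows, are assumptions made for this sketch. They are not the Arctos tables and this is not the query that was actually run.

```python
import sqlite3

def build_demo_db():
    """Create an in-memory database with a hypothetical, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE agent (agent_id INTEGER PRIMARY KEY, preferred_name TEXT);
        CREATE TABLE collector (agent_id INTEGER, guid_prefix TEXT);
        CREATE TABLE address (agent_id INTEGER, address TEXT);
        CREATE TABLE agent_relationship (agent_id INTEGER, relationship TEXT);
        CREATE TABLE other_use (agent_id INTEGER, used_as TEXT);
        INSERT INTO agent VALUES (1, 'Agent A'), (2, 'Agent B'),
                                 (3, 'Agent C'), (4, 'Agent D');
        INSERT INTO collector VALUES (1, 'CHAS:Mamm'), (2, 'CHAS:Mamm'),
                                     (3, 'CHAS:Mamm'), (4, 'OTHER:Coll');
        INSERT INTO address VALUES (2, 'a vague address');
        INSERT INTO other_use VALUES (3, 'identifier');
    """)
    return conn

def low_information_collectors(conn, guid_prefix):
    """Agents used as collectors in one collection with no address, no
    relationship, and no use outside the collector table."""
    sql = """
        SELECT DISTINCT a.agent_id, a.preferred_name
        FROM agent a
        JOIN collector c ON c.agent_id = a.agent_id
        WHERE c.guid_prefix = ?
          AND NOT EXISTS (SELECT 1 FROM address ad
                          WHERE ad.agent_id = a.agent_id)
          AND NOT EXISTS (SELECT 1 FROM agent_relationship r
                          WHERE r.agent_id = a.agent_id)
          AND NOT EXISTS (SELECT 1 FROM other_use o
                          WHERE o.agent_id = a.agent_id)
        ORDER BY a.agent_id
    """
    return conn.execute(sql, (guid_prefix,)).fetchall()
```

Calling `low_information_collectors(build_demo_db(), "CHAS:Mamm")` would list only the demo agent that has neither an address, a relationship, nor any use outside the collector table.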
Priority
High, the problem gets worse with every new collection.
EDIT: the promised SQL
Just change the CHAS:Mamm of where guid_prefix='CHAS:Mamm' to an appropriate value for other collections. Values can be found on https://arctos.database.museum/home.cfm.