Verbatim identifier? #5283

Closed
Jegelewicz opened this issue Nov 15, 2022 · 11 comments
Labels
Data Quality, Function-Agents, Function-Taxonomy/Identification, Help wanted (I have a question on how to use Arctos), Priority-High (Needed for work; high because this is causing a delay in important collection work)

Comments

@Jegelewicz
Member

As I work with incoming collections, using verbatim agent helps with the collector and preparator name issues, but what about identifiers? Lots of collections have names of people who made identifications that are something like S.A Northrop. How do we use that if we can no longer create name string agents? Identification remark seems less than helpful. Should we use verbatim agent with "ID determiner" in method? Should this be an attribute of identifications instead of a record attribute?

Note that we have accepted name string agents for these things in data already in Arctos - which seems like a bad idea if we aren't going to allow it in the future.

Advice accepted!

@Jegelewicz Jegelewicz added Priority-High (Needed for work) High because this is causing a delay in important collection work.. Function-Agents Function-Taxonomy/Identification Help wanted I have a question on how to use Arctos Data Quality labels Nov 15, 2022
@Jegelewicz Jegelewicz added this to the Needs Discussion milestone Nov 15, 2022
@dustymc
Contributor

dustymc commented Nov 15, 2022

which seems like a bad idea if we aren't going to allow it in the future.

Yes. If we're going to do something amazing, we need to keep the goals of #4554 in mind. "S.A Northrop" as one of many similar or identical preferred names is fine as long as they all carry sufficient information to disambiguate themselves. Multiple 'S.A Northrop'-ish agents who get shoehorned in with no useful data because someone found a way to get around whatever rules we've established at the moment will just keep us where we are, mired down trying to understand data that probably can't be understood.

If knowing that S.A Northrop identified a record adds value, then you probably have plenty of information to create (and use) an Agent which will add value to everything it touches and will not conflict with other Agents. If you don't know that much then the string can't DO much and a verbatim agent is appropriate.

attribute of identifications

That's arguably "correct" but it could only be necessary if there are multiple identifications by low-information agents. That sort of data is certainly not common and I strongly suspect keeping all of the 'verbatims' in the same slot (from where they might eventually be consolidated into an Agent) vastly outweighs any possible benefit of trying to better attach them to "sub-records."

@ewommack

keeping all of the 'verbatims' in the same slot (from where they might eventually be consolidated into an Agent)

This seems like something that would be really important for collections. When we're bulkloading with the new agent system, we may have lots of verbatim agents in different areas, but eventually we can hopefully change most to full Agents. The best way to set up all the verbatim agents to make this an easier process would be really valuable.

For vertebrate collections, I would anticipate having the same verbatim agent in my identification field and my collector field. Identifications for us are generally done either by experienced museum staff or students, or by an expert in a field. When we update a taxonomy or make an identification I include info on who made the identification so I can go back and say "see this graduate student specializing in Long-tailed Grouse made the ssp ID call - don't just believe me, I'm throwing his expertise behind it".

@Jegelewicz
Member Author

So the answer is use verbatim agent attribute with "determiner" in method? We need some consistency there if we ever hope to put them where they belong eventually.

@ewommack

So the answer is use verbatim agent attribute with "determiner" in method?

So I would pick unknown for the Agent field, and then under determiner I would select "verbatim agent" and it pulls whoever I put down for a verbatim agent in attributes? That might be nifty, but what if you have more than one verbatim agent?

@dustymc
Contributor

dustymc commented Nov 17, 2022

pick unknown for the Agent field,

I'd just leave it blank (or file an Issue if I couldn't).

what if you have more than one verbatim agent?

You can have as many as you need, but maybe comment on #5193 if you intend to get too crazy.

@Jegelewicz
Member Author

Once we have identification attributes, should we add

verbatim determiner - Verbatim determiner accepts any string value. This attribute should be used when there is little to no information about a determiner instead of creating a low-information agent (no dates, relationships, or addresses are known for the agent).

? or add to the definition for verbatim agent and suggest putting "determiner" in method?

@Jegelewicz
Member Author

@dustymc could you give us an idea how many low quality agents participate only as determiners for identifications, and collector roles?

@dustymc
Contributor

dustymc commented Apr 12, 2023

4649

temp_ci_or_less.csv.zip

@lin-fred
Contributor

attribute of identifications

That's arguably "correct" but it could only be necessary if there are multiple identifications by low-information agents. That sort of data is certainly not common and I strongly suspect keeping all of the 'verbatims' in the same slot (from where they might eventually be consolidated into an Agent) vastly outweighs any possible benefit of trying to better attach them to "sub-records."

We would like to move forward with this but the main decision point seems to be where a verbatim identifier needs to go.
I can see arguments for both so it might be AWG time?

@Jegelewicz
Member Author

@dustymc how about a list of records with multiple identifications determined by low-quality agents?

@Jegelewicz
Member Author

I suggest we close this - we don't need more attributes and people want to create agents for determiners.

4 participants