Coordinates, verbatim coordinates, and data entry #752
> What is missing from the current model is:
My take (and see http://arctosdb.org/documentation/places/specimen-event/) is that it's the agent listed in specimen_event's job to determine whether "these coordinates" and "river valley ca. 35km sw Mordor" have anything useful to do with each other in the context of a specimen, and simply not use them if they don't. From a user's perspective, I don't see much difference between "came up with coordinates" and "hooked specimen to coordinates of indeterminate origin." I'm probably missing some curatorial use...

The old model structure had place name as primary data, with coordinates determined (by a person, etc.) from it: a place_name which is incompatible with.... coordinates (downloaded from my WAAS-enabled FAA-certified GPS). Coordinates and placenames are complementary in the current model - they're in the same table and functionally equivalent, part of the same THING. GEOREFERENCE_PROTOCOL should provide coordinates-from-description vs. description-from-coordinates directionality.

"GPS download" and "GPS transcription" (the best and debatably second-best sources of coordinates) seem to be buried in GEOREFERENCE_SOURCE (with 7K other values) - and some of mine (which were downloaded) are entered as just "GPS" and so aren't distinguishable from the (normal) "transcribed from the transcription in the field notes" data (which have a large error rate). I have no idea what we're TRYING to do with these fields, but I don't think we're doing it.

GEOREFERENCE_SOURCE is obviously acting as a huge denormalizer (what I've been trying to avoid with the addition of a who/when). The data are mostly variations on a very few things (collector did it, found it on a map, MaNIS, GeoLocate). I completely fail to understand how "2007, Google Earth Maps, Europa technologies, Eye alt=11528 ft" is going to allow me to end up with the same coordinates (and that WAS the point; this field was invented for MaNIS), and if it can't do that I'm not sure what it is useful for.
```sql
select count(*) from locality;
  COUNT(*)

UAM@ARCTOS> select count(*) from locality where dec_lat is not null;
  COUNT(*)

UAM@ARCTOS> select count(distinct(dec_lat || dec_long)) from locality;
  COUNT(DISTINCT(DEC_LAT||DEC_LONG))
```
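A side note on the third query: concatenating `dec_lat || dec_long` without a separator can conflate distinct coordinate pairs, so the distinct count may be an undercount. A minimal sketch of the pitfall (Python's sqlite3 standing in for Oracle; the miniature table shape is assumed, not the real schema):

```python
import sqlite3

# Hypothetical miniature of the locality table, mirroring the column
# names used in the thread (dec_lat, dec_long).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE locality (dec_lat TEXT, dec_long TEXT)")
con.executemany(
    "INSERT INTO locality VALUES (?, ?)",
    # The first two rows are different places, but their bare
    # concatenations are both '12.34.5'.
    [("12.3", "4.5"), ("12.34", ".5"), ("12.3", "4.5")],
)

# Bare concatenation conflates the first two pairs.
naive = con.execute(
    "SELECT COUNT(DISTINCT dec_lat || dec_long) FROM locality"
).fetchone()[0]

# A separator that cannot appear in a number keeps the pairs distinct.
safe = con.execute(
    "SELECT COUNT(DISTINCT dec_lat || '|' || dec_long) FROM locality"
).fetchone()[0]

print(naive, safe)  # prints: 1 2 (the naive count undercounts)
```

The same fix applies verbatim in Oracle: `count(distinct(dec_lat || '|' || dec_long))`.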
Our "compromise" model is the "let's denormalize JUST until nobody's happy" model. Let's fix it. I see two ways out:
We need to consider usability in terms of the bulkloader and data entry screens before we do anything. Can we drop specimen_event_{agent/date} if we have looks-the-same-from-here data in locality, or are those data different (include cultural collections if we have this conversation)? Can we streamline anything else? Does this solve the "verbatim coordinates" issue (by making everything verbatim)? What do users have difficulty with now? (I think just the number of "locality fields" is a major issue.) Etc. Let's not make this MORE unusable.

If we're going to fix localities, we should also discuss the relationship between placename/coordinates and higher geography. Collected from one "locality" (esp., e.g., coastal AK), a fish (seal/etc.) is likely to end up in some sea, a moose in some GMU, a lemming in a quad, a plant in some state park, etc., etc., etc., and that's actively preventing discovery by anyone trying to use higher geography. See also #739.
Sorry, I don't have time to fully digest all you wrote, but I don't think you've addressed the issue. Take any record like this: http://arctos.database.museum/guid/UAM:Ento:118788
My argument remains this: the agent listed in specimen_event.assigned_by is responsible for everything in the locality stack. The model can be interpreted no other way. (But read the last paragraph before you sharpen your pitchfork!)

- Collector provided descriptive data? Collector should be specimen_event.assigned_by.
- Collector somehow came up with coordinates? Collector should be specimen_event.assigned_by.
- Student somehow came up with better coordinates? Student should be specimen_event.assigned_by.
- Curator tightened up the error? Curator should be specimen_event.assigned_by.

#739 addresses "confirmed by...." (and I think it's a solid long-term solution, whatever we do elsewhere). Yes, lacking something in verificationstatus, "assigned by" is ambiguous. You tossed a dart at your map for all I know - and the same is true for most things - which is why #739 proposes....
If you have access to someone else's collection, they trust you to edit their locality. If you don't have access to their collection, you'll have to split the locality and edit that. (#740 may change that.)
A specimen can have any number of specimen-events, so just leave the old one and add a new one if you wish to maintain that history. The model is pretty rigorous, things like normalization aside - it's hard to find a situation that doesn't work (if you buy into my definitions). BUT, I'm increasingly unsure that it's realistic for anyone to use the thing in a way that actually makes all that work. Doing so would require a lot of specimens having a lot of localities (e.g., 4 in the above example), things that could be done with a click or two (update coordinates of unverified localities) would have to be done with lotsa-clicks (add/edit a new locality), etc. I don't think I can write interfaces to simplify that without introducing some sort of unexpected complications elsewhere.

Given those two things, I suggest we back up and re-analyze what sort of locality (in the broadest sense) data we want and what we expect to do with it, then design a model which does that. If that's not possible (and it probably isn't, short-term), I propose we drop some expectations (e.g., localities being somewhat-unique) and cram whatever we need to answer whatever questions y'all want answered into the current model.
I like this: "Collector provided descriptive data? Collector should be specimen_event.assigned_by. Collector somehow came up with coordinates? Collector should be specimen_event.assigned_by. Student somehow came up with better coordinates? Student should be specimen_event.assigned_by. Curator tightened up the error? Curator should be specimen_event.assigned_by." It's what we try to do, mostly... but there are problems with usability. If the curator edits the locality record, why not have Arctos auto-magically change specimen_event.assigned_by to that agent's name and the new date? Doing this manually when editing lots of locality records just doesn't happen.
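The "auto-magically change specimen_event.assigned_by" idea could be prototyped as a database trigger. The sketch below is a hypothetical illustration using Python's sqlite3, not Arctos code; the table shapes are simplified and the hard-coded 'current editor' stamp is an assumption (a real system would use the logged-in agent and Oracle's trigger syntax):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE locality (locality_id INTEGER PRIMARY KEY, dec_lat REAL, dec_long REAL);
CREATE TABLE specimen_event (
    specimen_event_id INTEGER PRIMARY KEY,
    locality_id INTEGER REFERENCES locality,
    assigned_by TEXT,
    assigned_on_date TEXT
);
-- When coordinates change, restamp every dependent specimen_event.
CREATE TRIGGER restamp_assigned_by AFTER UPDATE OF dec_lat, dec_long ON locality
BEGIN
    UPDATE specimen_event
       SET assigned_by = 'current editor',   -- a real system would use the session agent
           assigned_on_date = datetime('now')
     WHERE locality_id = NEW.locality_id;
END;
""")
con.execute("INSERT INTO locality VALUES (1, 64.8, -147.7)")
con.execute("INSERT INTO specimen_event VALUES (1, 1, 'collector', '2015-01-01')")

# A curator edits the locality; the trigger restamps assigned_by.
con.execute("UPDATE locality SET dec_lat = 64.9 WHERE locality_id = 1")
print(con.execute("SELECT assigned_by FROM specimen_event").fetchone()[0])
# prints: current editor
```

The design question the thread raises still applies: silently restamping hides who originally made the assertion, which is exactly what #739's "confirmed by" mechanism tries to preserve.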
That change sounds great to me.
This is a long thread (still digesting), but I just want to say that I think... So in addition to the push-my-name+date-to-specimen-event option, which...
tl;dr: so let's build a new model.
NOBODY (should) CARES! The shape/description has something useful to do with a specimen or not. I don't care if someone used a random coordinate generator (map+dart?) and got lucky, ALL that matters is your assertion that a specimen belongs there. If you insist on caring, then you can't also care (much) about "duplicates" (and near-duplicates), at least not in this model.
Someone asked for that - figure it out in the group and I can easily turn them back on (in the non-tabular forms - multiple anything-that-doesn't-concatenate remains a problem in tables).
But it was impossible in the old model! Unless you're talking about JUST coordinates (e.g., two of the three dimensions of a shape), which is an extremely limited (and, I believe, severely abused in the old model) use case.
Old model, you could add coordinates. New model, you can add coordinates - and also change the county while tracking the old one. (And deal with depth/elevation.) The new model saves everything the old one can, and a lot more, slightly differently, and associates it with specimens in a more functional way. "Legacy curatorial practices" were developed before GPS was a thing and involved pretending that parasites were parts of hosts, that hosts are just a string in a text field, that cultural collections are not largely made out of things with interesting DNA, and that we'll never encounter an individual twice or send bits to two collections.
If it's JUST specimen events you're talking about, I can do that. But you're probably not because nothing is important there - you want the old coordinates/continent/etc., right? See #579 - we can (probably) do that but it's far from trivial.
Let's start blank-slate; tell me what data you have, why you care about it, what you want to do with it, etc., and we'll find a model that does that. (I've got a short list of things that I care about too, but I think they're all pretty trivial/obvious.) Or, if that seems overwhelming, we can patch who/when into the current model, but again that comes with a cost in what can be done elsewhere. (And I'm not sure it addresses your concerns??) Maybe the blank-slate approach leads here, maybe not, but it would be good to find out before we end up in some sort of panic situation (which is what led to the current model).
Is implemented with https://github.com/ArctosDB/arctos/tree/v7.0.5; leaving this open.
From UAM:Ento:
The data entry screen (and bulkloader) treat locality coordinates (locality.dec_lat, locality.dec_long) and verbatim coordinates (collecting_event.verbatim_coordinates) as the same thing to simplify usability. That's possible to change, but would require the addition of "duplicate" coordinate fields and several (around 30) extra columns to the bulkloader, which I suspect would come with significant usability issues. I am completely open to better ideas.
Arctos now provides for any number of "locality stacks" (everything between specimens and higher geography), so one way to deal with this is by entering two localities:
which is, of course, twice as much work for what seems to me an insignificant benefit - if you mis-typed (or downloaded from your GPS) the "verbatim," you probably did the same for the "spatial."
This was present in the old model and was by design excluded from the new. Under the current model (and any GIS system or map), [X, Y +/- Z, incl. datum etc.] is a defined geospatial area; it's a fact, a data object - I can find it on a map or go there or compare it to other areas. The assertion (via specimen_event.assigned_by_agent, specimen_event.assigned_on_date) is "this specimen<--->{specimen_event_type}<--->that place."
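The "defined geospatial area" reading can be made concrete: treat [X, Y +/- Z] as a circle and compare two localities by whether their error circles intersect. A sketch, assuming WGS84 decimal-degree coordinates, a spherical-earth haversine distance, and error radii in meters (none of this is Arctos code):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points (spherical earth)."""
    r = 6371000.0  # mean Earth radius, meters
    p1, p2 = radians(lat1), radians(lat2)
    dp, dl = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dp / 2) ** 2 + cos(p1) * cos(p2) * sin(dl / 2) ** 2
    return 2 * r * asin(sqrt(a))

def areas_overlap(lat1, lon1, err1_m, lat2, lon2, err2_m):
    """Two [X, Y +/- Z] localities describe an overlapping place when
    their error circles intersect."""
    return haversine_m(lat1, lon1, lat2, lon2) <= err1_m + err2_m

# Two GPS reads off the same nailed-down unit: one place as a data
# object, even though the who/when of the reads differ.
print(areas_overlap(64.8000, -147.7000, 50, 64.8001, -147.7000, 50))  # True
```

This is the sense in which the place is "a fact, a data object": it can be found, visited, and compared, independent of who asserted that a specimen belongs there.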
Pull 20,000 bugs out of a trap (at the same time, across centuries, whatever, the PLACE is all the same), enter them wrong, you can fix them all by updating one thing here.
Under the old model, and under what you're proposing, {[X, Y +/- Z, incl. datum etc.] + who/when} is an assertion, or at least a likely duplicate. Nail a GPS to the ground, we both read the same numbers off of it, and we have two "places." You read it, and then do so again a tenth of a second (Oracle's default date precision) later with the same place-results, and we have two places. I guess I'm not opposed to that model, but I am very opposed to that model under our current data structure. If any two specimens are exceedingly unlikely to share a place, then why do we need place as a data object at all? If we have a time component to places, why do we have another time component one join away? Why not move it all closer to specimens (something like Attributes) and simplify the model?
Pull 20,000 bugs out of a trap, enter them wrong, you'll need to update 20,000 things here.
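The two scenarios above differ only in where the coordinates live. A minimal sketch of the normalized case (invented data; Python's sqlite3 standing in for Oracle; simplified table shapes) shows one UPDATE correcting all 20,000 linked records:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE locality (locality_id INTEGER PRIMARY KEY, dec_lat REAL, dec_long REAL)"
)
con.execute("CREATE TABLE specimen_event (specimen_id INTEGER, locality_id INTEGER)")

# One shared locality row, entered wrong (lat should be 64.8, not 46.8),
# and 20,000 trap specimens pointing at it.
con.execute("INSERT INTO locality VALUES (1, 46.8, -147.7)")
con.executemany(
    "INSERT INTO specimen_event VALUES (?, 1)", [(i,) for i in range(20000)]
)

# One statement, one row touched -- every linked specimen is now correct.
con.execute("UPDATE locality SET dec_lat = 64.8 WHERE locality_id = 1")
fixed = con.execute("""
    SELECT COUNT(*) FROM specimen_event se
    JOIN locality l USING (locality_id) WHERE l.dec_lat = 64.8
""").fetchone()[0]
print(fixed)  # prints: 20000
```

With coordinates copied onto each specimen record instead (the denormalized model), the same fix is a 20,000-row UPDATE, with all the attendant chances of missing some.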
The introduction of "verbatim coordinates" to collecting event is an attempt to have both geospatial-capable locality objects and whatever someone scribbled on some label, verbatim. I think anything preventing those two actions is "only" an interface problem. Adding more metadata to the locality node is a modeling problem, and one which potentially completely changes the nature of the data.