geography: start/stop date? #3018

Closed
dustymc opened this issue Aug 13, 2020 · 17 comments
Labels
Function-Locality/Event/Georeferencing Help wanted I have a question on how to use Arctos

Comments

@dustymc
Contributor

dustymc commented Aug 13, 2020

Issue Documentation is http://handbook.arctosdb.org/how_to/How-to-Use-Issues-in-Arctos.html

Is your feature request related to a problem? Please describe.

Geography isn't temporally stable.

Describe what you're trying to accomplish

Avoid mixing actual spatial/geography problems with "some entity moved some border" problems.

Describe the solution you'd like

  1. Add a temporal component to geography, or
  2. Stop treating geography as authoritative

Describe alternatives you've considered

Keep publishing cruddy data which isn't what users think it is.

Additional context

#1889 (comment) spawned a comment

Need to look at these but likely due to county boundary realignment with Cibola county 

https://en.wikipedia.org/wiki/Cibola_County,_New_Mexico

created on June 19, 1981

We have similar data in several contexts - Yugoslavia, Soviet Union, Kenya, etc.

Option One would allow us to say "Valencia County, before Cibola County was carved out." That model is capable of accurately modeling the data, but I don't think it's usable - it would essentially require data entry personnel to know which map the collector was looking at, and would likely require us to review thousands of records per year.

Option Two would perhaps only require shifting our viewpoint from seeing geography as authoritative to viewing it as curatorial assertions which might be useful in determining coordinates, which could be used to pull current and actual geography from some webservice.

Option Three: Get @tucotuco to fix this for us....

@dustymc dustymc added Function-Locality/Event/Georeferencing Help wanted I have a question on how to use Arctos labels Aug 13, 2020
@dustymc dustymc added this to the Needs Discussion milestone Aug 13, 2020
@tucotuco

I'm very interested and researching solutions to this and a plethora of Locality Services that have a long history stemming from the first georeferencing forays in 1999. In the Darwin Core Hour and associated BBQs earlier this year there was a call for an "Event Gazetteer". The idea as presented was actually to make something like an Event Backbone to parallel the Taxonomic Backbone concept, rather than to just do a Location Backbone. But at the core, either way, could be a time-sensitive gazetteer to facilitate georeferencing, the results of which would be the most unambiguous way to find things spatially ("I mean here on the map" where here is a geometry). I would love to hear any further ideas, suggestions, use cases, but would prefer to get them here if they aren't already in there, for the whole world interested in the subject to see.

@dustymc
Contributor Author

dustymc commented Aug 13, 2020

time-sensitive gazetteer to facilitate georeferencing

For this, if there was a webservice that would minimally accept coordinates-plus-date and return "geography" (whatever that means...), we could determine if the place was called Whatever County when the Event occurred, whatever it's called now, and use that to flag records (or in this case to avoid flagging them - we have plenty of actual problems to worry about!).

Accepting shapes and returning a list of intersecting shapes would be even better - it's currently difficult to avoid "3 pixels outside of Cibola County" when we're really looking for "says NM, maps to CN."

Using the service to pull current (or standardized - I guess I don't actually care how it's standardized) geography would allow us to provide a consistent search environment; everything from (X,Y) would come back in a search for {shapename}. That's currently a huge embarrassing gap in our capabilities.

Using the service to pull historical geography would allow us to provide a comprehensive search environment. ("Find stuff from everything that's ever been called Texas.")
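A minimal Python sketch of the coordinates-plus-date lookup described above. Everything in it is hypothetical: the `GazetteerEntry` type, the `names_at` function, and the bounding boxes, which are made-up rectangles rather than real county shapes. The only fact taken from this thread is the June 19, 1981 creation date for Cibola County.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class GazetteerEntry:
    name: str
    # Hypothetical bounding box (min_lon, min_lat, max_lon, max_lat); a real
    # service would hold proper geometries, not rectangles.
    bbox: tuple
    valid_from: Optional[date] = None  # None = unknown or open start
    valid_to: Optional[date] = None    # None = still current

    def contains(self, lon, lat):
        min_lon, min_lat, max_lon, max_lat = self.bbox
        return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat

    def valid_on(self, when):
        after_start = self.valid_from is None or self.valid_from <= when
        before_end = self.valid_to is None or when < self.valid_to
        return after_start and before_end


SPLIT = date(1981, 6, 19)  # Cibola County creation date, per the thread

# Made-up rectangles for illustration only.
ENTRIES = [
    GazetteerEntry("Valencia County", (-109.0, 34.5, -106.5, 35.5), valid_to=SPLIT),
    GazetteerEntry("Valencia County", (-107.2, 34.5, -106.5, 35.0), valid_from=SPLIT),
    GazetteerEntry("Cibola County", (-109.0, 34.5, -107.2, 35.5), valid_from=SPLIT),
]


def names_at(lon, lat, when=None):
    """Names of entries containing the point; filtered by date when one is given."""
    return sorted({
        e.name for e in ENTRIES
        if e.contains(lon, lat) and (when is None or e.valid_on(when))
    })
```

With these made-up shapes, `names_at(-108.0, 35.0, date(1975, 1, 1))` gives the name in use when the Event occurred, the same call with a current date gives the current name, and omitting the date gives every name that has ever covered the point (the "everything that's ever been called Texas" search).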

String-->shape ("facilitate georeferencing") functionality would be pretty cool, but it's almost secondary from my POV - we have tools for that now. Curators and CMs might have a very different outlook on that, and being able to pull coordinates from multiple services and compare them would be pretty huge.

tl;dr: Super cool, how can the Arctos Community help make it a reality?

@tucotuco

Using the service to pull current (or standardized - I guess I don't actually care how it's standardized) geography would allow us to provide a consistent search environment; everything from (X,Y) would come back in a search for {shapename}. That's currently a huge embarrassing gap in our capabilities.

Can you elaborate on this one?

@tucotuco

I've added the use cases to the Technical:needs tab of the Imagining a Global Gazetteer Google sheet.

Super cool, how can the Arctos Community help make it a reality?

Give me things to test or that you would like statistics for.

Have a look at the principles of Higher Geography standardization for VertNet and opine as GitHub issues in that repository.

Suggest viable funding options.

@dustymc
Contributor Author

dustymc commented Aug 13, 2020

elaborate

Grab a seal, moose, mouse, plant, fish, and crab from the same place on some beach. The seal will get entered with State, Sea (they swim, the state issues permits), the moose will go to whatever the hunting regulations use, the mouse will get a quad (better to sort them into jars), the plant will get a Feature (NPS likes to pay botanists), the fish will get a drainage, the crab will get a marine designation.

It doesn't seem likely that we'll all agree on a single name for that point on the planet, and without that the descriptive data doesn't converge - no search finds everything. (The moose and mouse might converge at the state level, but they never share terminology with the crab.)

A service that accepts the point (or better yet, the shape) and returns something (or a set of somethings, or whatever) predictable would allow users to find them all with one search term, without trying to determine just exactly where "Alaska" stops and "Beaufort Sea" begins or "requiring" (we can't) Curators to use (or avoid) anything that we might consider geography, etc. Yay everybody.
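To make the "one search term finds them all" idea concrete, here is a rough Python sketch. The `resolve_names` function is a stand-in for the hypothetical service, and the records, coordinates, and "Shape A"/"Shape B" labels are all invented for illustration. The point is only that the index is built from what the coordinates resolve to, not from what each curator entered.

```python
from collections import defaultdict

# Invented records: same spot on the beach, different asserted geography.
RECORDS = [
    {"id": "seal", "asserted": "State, Sea", "lon": -150.0, "lat": 70.5},
    {"id": "moose", "asserted": "hunting unit", "lon": -150.0, "lat": 70.5},
    {"id": "mouse", "asserted": "quad", "lon": -150.0, "lat": 70.5},
    {"id": "plant", "asserted": "Feature", "lon": -150.0, "lat": 70.5},
    {"id": "fish", "asserted": "drainage", "lon": -150.0, "lat": 70.5},
    {"id": "crab", "asserted": "marine designation", "lon": -150.0, "lat": 70.5},
]


def resolve_names(lon, lat):
    """Stand-in for the hypothetical service: point in, predictable names out."""
    # Made-up rule: anything inside this box resolves to the same two labels.
    if -151.0 <= lon <= -149.0 and 70.0 <= lat <= 71.0:
        return {"Shape A", "Shape B"}
    return set()


def build_index(records):
    """Map each resolved shape name to the ids of the records inside it."""
    index = defaultdict(set)
    for rec in records:
        for name in resolve_names(rec["lon"], rec["lat"]):
            index[name].add(rec["id"])
    return index


def search(index, shapename):
    return sorted(index.get(shapename, set()))
```

Here `search(build_index(RECORDS), "Shape A")` returns all six records, even though no two of them share asserted geography.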

Screen Shot 2020-08-13 at 12 29 42 PM

plus

Screen Shot 2020-08-13 at 12 30 07 PM

finds

Screen Shot 2020-08-13 at 12 30 58 PM

ferexample

@tucotuco

A service that accepts the point (or better yet, the shape) and returns something (or a set of somethings, or whatever) predictable would allow users to find them all with one search term,

Do you mean you find it useful to be able to get a textual search term as a proxy for the spatial term you used to find it? That seems odd, but hey. Would this be covered by something like an S2 cell Identifier? Otherwise, why not just use the spatial search directly?

@dustymc
Contributor Author

dustymc commented Aug 13, 2020

textual search term as a proxy

Yup.

you used

Not so much. It's entered as "Some Country" because whatever reasons (and georeferenced), I think the collections (researchers, county invasive species dept., ...) want to find it by "Some County" or similar no matter what's been asserted/used.

just use the spatial search

As above I believe. We can ask, I'd certainly be fine without...

@tucotuco

tucotuco commented Aug 13, 2020 via email

@dustymc
Contributor Author

dustymc commented Aug 13, 2020

Yes.

A "you should prefer this" flag on exactly one of those whatevers would be icing, but I don't think it changes functionality. (It does for anything bigger than Arctos.)

I think the "just want stuff from there" and "this was in fact Bla County, but now we call it...." services (which might be different views of the same thing) would fundamentally change the nature of the questions the data can answer.

I think we'd totally use the "georeferenced however...." (and all the cool stuff we haven't thought of yet too!) but I also think that would be closer to "neato" than "transformational." I don't mean to be overly dismissive of that aspect, even though I probably was....

Also one of the curators involved in this pointed out that some of his stuff was getting flagged because our WKTs have fairly low resolution, which has been a problem before. MAYBE we'll be able to fix that ourselves when/if we get postgis running, but as of right now access to higher resolution shapes than we're capable of dealing with would be a significant reason for Arctos folks to find a way to get behind this effort.

@tucotuco

A "you should prefer this" flag on exactly one of those whatevers would be icing, but I don't think it changes functionality. (It does for anything bigger than Arctos.)

I'm not sure about this. For example, should everyone prefer an English label? There are surely personal/situational/institutional/national preferences that differ.

I think the "just want stuff from there" and "this was in fact Bla County, but now we call it...." services (which might be different views of the same thing) would fundamentally change the nature of the questions the data can answer.

A lot like taxonomy, only different - a thesaurus linked to a spatial data store. Probably a LOT more tractable than taxonomy, for me at least.

Also one of the curators involved in this pointed out that some of his stuff was getting flagged because our WKTs have fairly low resolution, which has been a problem before. MAYBE we'll be able to fix that ourselves when/if we get postgis running, but as of right now access to higher resolution shapes than we're capable of dealing with would be a significant reason for Arctos folks to find a way to get behind this effort.

Can I get an example of something that was flagged? I'm interested in where, how and why.

Being able to refer to fully spatially-enabled representations of places with metadata from a URL in a spreadsheet was one of the fundamental driving forces for locality services - to level the spatial data playing field in occurrence data management.

@dustymc
Contributor Author

dustymc commented Aug 15, 2020

preferences

Yea maybe, but somehow avoiding Alameda/Alameda County/The County Of Alameda/Alameda Co./etc. seems useful, especially in eg DWC where there's only room for one THING.

taxonomy

We seem to have two underlying currents.

  • I think localities are facts. I can KNOW if I'm within Z units of (X,Y) or not. I don't care how it came to be or what reference or units someone used; that's all irrelevant. The place is a real THING, very much unlike a taxon. The assertion or determination is all in whether the record is correctly attributed to that spatial fact or not.

  • I think some of us see "localities" more as attributes of records, with the how-and-why being about as important as the where - something less than entirely factual, perhaps more like taxonomy, certainly carrying more information than my idea of a strictly-spatial model. I instinctively dislike this less-normalized viewpoint, but the need for normalization is largely due to imprecise data so maybe the simpler approach will make more sense going forward.

Hosts and parasites are strong evidence that we do need some capability to share events, no matter how precisely we measure.

I have little idea about how to reconcile those viewpoints - maybe fleshing it out and providing data organization guidance would be a valuable part of a service.

example

Valencia County according to Arctos looks like this:

Screen Shot 2020-08-15 at 10 05 45 AM

The link opens records that say they're from Valencia County but don't map to Valencia County. (And from there Annotations are just a click, so I clicked.)

... and I can't find a good example in that so let's go to Idaho.

https://arctos.database.museum/guid/UAMObs:Ento:233962

has a red map border (there's spatial authority data, the record isn't in it), zoom in a bit and....

Screen Shot 2020-08-15 at 10 11 58 AM

The WKT only approximates the border, the map point is in fact where it claims to be, the spatial data we have is just wrong. I'm running everything through javascript, so more-precise spatial data (especially for something like Alaska with its eleventy-bajillion islands and long complex coastline) would probably just melt something. A service that could provide both lightweight spatial data for maps and a "in/out" determination with some precision behind it would be valuable. I can only deal with point-in-poly, so something capable of considering the error associated with the point would be even better.

@tucotuco

I can only deal with point-in-poly, so something capable of considering the error associated with the point would be even better

I think you can do more than that. You can use a ST_DISTANCE function to find things that are as close or closer than the radius. Chalk one up to the point-radius method. ;-)
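A rough Python sketch of that point-radius check, assuming planar coordinates and a polygon given as a list of vertices. The function names are invented here. In PostGIS the same comparison would be done with `ST_Distance` (or `ST_DWithin`) against the real geometry, and a production version would need geodesic rather than planar distances.

```python
import math


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def distance_to_segment(x, y, x1, y1, x2, y2):
    """Shortest planar distance from (x, y) to the segment (x1, y1)-(x2, y2)."""
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    # Project the point onto the segment and clamp to its endpoints.
    t = ((x - x1) * dx + (y - y1) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return math.hypot(x - (x1 + t * dx), y - (y1 + t * dy))


def distance_to_polygon(x, y, polygon):
    """0 if the point is inside, else the distance to the nearest edge."""
    if point_in_polygon(x, y, polygon):
        return 0.0
    n = len(polygon)
    return min(
        distance_to_segment(x, y, *polygon[i], *polygon[(i + 1) % n])
        for i in range(n)
    )


def classify(x, y, radius, polygon):
    """Point-radius check: is the point, or its uncertainty circle, in the shape?"""
    d = distance_to_polygon(x, y, polygon)
    if d == 0.0:
        return "inside"
    return "within uncertainty" if d <= radius else "outside"
```

A record that falls a short distance outside a coarse boundary but has an uncertainty radius larger than that distance comes back as "within uncertainty" rather than being flagged.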

@dustymc
Contributor Author

dustymc commented Jul 18, 2022

If we're going to do this, we should find a path and implement.

If we're not, we should delete non-current entries.

https://arctos.database.museum/place.cfm?action=detail&geog_auth_rec_id=38

This was referenced Jul 19, 2022
@dustymc
Contributor Author

dustymc commented Aug 23, 2022

I'm calling this a technical problem and going next task, but on the off chance that something radical happens with #4836 will try to delay until after AWG discussion.

@dustymc dustymc modified the milestones: Needs Discussion, Next Task Aug 23, 2022
@dustymc
Contributor Author

dustymc commented Aug 25, 2022

PROPOSAL:

  1. add begin_date ISO8601 NULL, end_date ISO8601 NULL to table geog_auth_rec
  2. require both-or-neither
  3. the dates become part of the higher_geography concatenation

EXPLANATION:

ISO8601 datatype because these will generally be year-precision

The both-or-neither rule will help prevent duplicates and unnecessary assertions, and keep the concatenation consistent and predictable (eg, users will never see one year, always nothing or a span).

Including the dates in the concatenation will further help prevent duplicates (there's a unique index), and allow these to function, especially in string-based applications (like data entry) just like any other geography.
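The three rules might look something like this. It is only a sketch, using Python's built-in `sqlite3` as a stand-in for the real database. The `place` column, the `higher_geog` column name, and the helper functions are assumptions made for illustration, and dates are stored as plain ISO 8601 text because SQLite has no dedicated type for them.

```python
import sqlite3


def higher_geog_string(place, begin_date=None, end_date=None):
    """Concatenate the place name with its date span, if it has one."""
    if begin_date is None and end_date is None:
        return place
    return f"{place} ({begin_date}-{end_date})"


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE geog_auth_rec (
            geog_auth_rec_id INTEGER PRIMARY KEY,
            place TEXT NOT NULL,               -- hypothetical column
            begin_date TEXT,                   -- ISO 8601, often just a year
            end_date TEXT,
            higher_geog TEXT NOT NULL UNIQUE,  -- dates are part of the unique string
            -- both-or-neither rule
            CHECK ((begin_date IS NULL) = (end_date IS NULL))
        )
        """
    )
    return conn


def add_geog(conn, place, begin_date=None, end_date=None):
    conn.execute(
        "INSERT INTO geog_auth_rec (place, begin_date, end_date, higher_geog) "
        "VALUES (?, ?, ?, ?)",
        (place, begin_date, end_date,
         higher_geog_string(place, begin_date, end_date)),
    )
```

With this in place, `add_geog(conn, "Some Place")` and `add_geog(conn, "Some Place", "1840", "1940")` can coexist, while supplying only one of the two dates, or repeating an existing string, raises `sqlite3.IntegrityError`.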

Examples:

| Higher Geog | Meaning |
| --- | --- |
| Some Place | No dates - what we have now - would have an assumption of currency. For almost all situations, this means that procedures, training, forms, etc. remain as they are. This does disallow "began on DATE" data for current shapes, but I think that (mostly academic) loss is vastly outweighed by the consistency elsewhere. |
| Some Place (1840-1940) | Same placename, different shape. Remarks might indicate that "1840" means "no idea actually but ...." The intent is not to have an accurate temporospatial history, but merely to allow a record from "Some Place" which correctly doesn't map to the current "Some Place" to exist. |

BEST PRACTICES

Noncurrent geography should not be used in accepted Events. When such usage is unavoidable, the goal should be to "upgrade" to current geography when resources allow. (That is, nobody has time to preemptively go figure out what 'Yugoslavia' is supposed to mean, but if that is resolved - maybe when the records are used - then a new Event using current geography should be added, and the 'Yugoslavia' Event should become unaccepted.)

DOCUMENTATION NEEDED

For admin geography:

  • dates - you'll probably have to guess, you'll probably be wrong, document that in remarks.

For data entry:

  • Try to use current (no dates) geography, copy-paste the date-having higher geography string when you don't have a choice

HELP! Is this a good/workable solution? I need to implement (or find an alternative or whatever) relatively soon so I'm not wasting time scribbling in remarks.

@Jegelewicz
Member

Seems OK - it would be nice if we had "begin" dates for all the current stuff - but there is zero chance of that? It seems better than what we are currently doing (nothing) to delineate between current and "old" geography.

@Jegelewicz
Member

@mkoo @sharpphyl who else cares a lot about geography?

@dustymc dustymc modified the milestones: Next Task, In next release Aug 31, 2022
@dustymc dustymc closed this as completed Aug 31, 2022