Darwin Core Continent and Water Body #128
Thanks @Jegelewicz for this contribution. There are a lot of issues raised here. I think it might be worth separating them to simplify the discussion. As a preamble (something apparently obligatory when one is about to ramble), I would like to echo @ekrimmel from ArctosDB/arctos#1291 (comment), "...that obviously different collections/databases use different but equally correct ways to say the same thing..."
@tucotuco Thanks for taking the time to put together such a comprehensive ramble! It explains a lot and I hope we can use what you have said here to improve the geography reported from Arctos.
Marvelous thread, @Jegelewicz @tucotuco. Note too that some, including @chicoreus, have wanted to discuss and encourage use of for some time. @tucotuco I think it's in our to-do list of webinar topics too (at least broadly speaking).
To add another example, the DMNS:Inv collection in GBIF has 29,608 catalog records today. But because we map our continents differently, if you select the seven GBIF continents for DMNS:Inv, you only get 20,206 records, which leaves 9,402 that the continent filter cannot find. The difference seems to be all the records with geography that begins with an ocean rather than a continent. For example, our specimens from Hawaii, New Zealand, etc. that begin with the Pacific Ocean can't be found by searching the GBIF continents. If we were to separate continent (and add Oceania) and water body (as it appears some other collections, including MCZ, do), our records would map more accurately and would be found in searches based on the continents. Arctos continents:
Despite https://www.youtube.com/watch?v=3uBcq1x7P34&t=25s, VertNet uses the following principle for its vocabularies with respect to continents, exactly matching GBIF. We use the geopolitical concept of continents following the seven-continent model, which includes Africa, Antarctica, Asia, Europe, North America (with Central America and the Caribbean), Oceania (with Australasia) and South America.
@tucotuco So when an Arctos higher geography starts with an ocean or an unlisted land mass, what does VertNet (thus GBIF) do with it? We have 20 entries in our Continent/Ocean field and six of them match the VertNet continents. The seventh (Oceania) isn't in our Arctos vocabulary. BTW, loved the YouTube video.
@sharpphyl I might be about to over-answer your question, but I think it is important. In case you only need the bottom line, the answer is that VertNet keeps the geography just as it comes through the Darwin Core Archive from the IPT, and GBIF does not provide an interpreted continent because the original value "Atlantic Ocean" isn't a continent under their interpretation. I agree with their interpretation.

Now, the rest of the story...

When a data set from Arctos is published to the VertNet IPT, it does its best to manipulate the Arctos world view to the Darwin Core world view (VertNet made that interpretation in collaboration with @dustymc). That same Darwin Core world view is the one shared through the IPT to all aggregators, including VertNet (portal), iDigBio, and GBIF, who grab it from the IPT Darwin Core Archive and put the data through their respective ingestion processes, which are not the same. GBIF ingests the data as published, but also adds interpretations (to an ever-increasing number of fields) to aid searches (https://www.gbif.org/article/5i3CQEZ6DuWiycgMaaakCo/gbif-infrastructure-data-processing, technical details at https://github.com/gbif/pipelines#interpretation).

For Arctos, all of this means that, for example, this specimen https://arctos.database.museum/guid/DMNS:Bird:18967 with the following data in Arctos:
comes through the Darwin Core Archive (and thus the VertNet portal http://portal.vertnet.org/o/dmns/bird-specimens?id=http-arctos-database-museum-guid-dmns-bird-18967-seid-409311) as:
If these data had run through the VertNet migrator (or other tools based on the geography vocabulary mentioned above) they would have come out in Darwin Core as:
In GBIF the raw Darwin Core data are there, but there are also the following interpretations (https://www.gbif.org/occurrence/1145060690):
Thank you so much for your detailed response. In my world, there's no such thing as "over-answering" a technical question. I'm still working through the details of your response and the way our data is ingested into aggregators, so I may have more questions, but I better understand now why some of our data is marked Invalid and why various searches may not return all our records.

While GBIF lists the seven continents, it doesn't appear that GBIF lists water bodies; it just references TGN. Correct?

It does appear that GBIF makes one exception to substituting Invalid for any country that should be in the continent Oceania. They do map Australia (perhaps the continent, not the country) to Oceania. But New Zealand, Fiji, etc. that we put in the Pacific Ocean are marked "Invalid" rather than being remapped to Oceania. And Hawaii, which we have in the Pacific Ocean, is not remapped to North America.

I think @Jegelewicz will add this to an AWG agenda for discussion, and your data will be very helpful. Thanks.
For reference, continent interpretation in GBIF is a matter of a simple lookup on the verbatim continent value provided by the data publisher using this table. Problems with this approach and a potential solution have been presented to GBIF in this issue.
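To make the lookup idea concrete, here is a minimal sketch of what a verbatim-value lookup could look like. It is not GBIF's code or GBIF's actual table: the only values used are the seven continent names listed earlier in this thread, and the function name `interpret_continent` and the case/whitespace normalization are assumptions for illustration.

```python
# Hypothetical sketch of a verbatim continent lookup (not GBIF's actual table or code).
# The seven names are the ones given for the seven-continent model in this thread.
SEVEN_CONTINENTS = (
    "Africa",
    "Antarctica",
    "Asia",
    "Europe",
    "North America",
    "Oceania",
    "South America",
)

# Assumption: matching ignores case and extra whitespace.
_LOOKUP = {name.lower(): name for name in SEVEN_CONTINENTS}


def interpret_continent(verbatim):
    """Return the matching continent name, or None if the value is not in the table."""
    if not verbatim:
        return None
    key = " ".join(verbatim.lower().split())
    return _LOOKUP.get(key)
```

Under this kind of lookup a value such as "Atlantic Ocean" or "Pacific Ocean" simply finds no match, so the record gets no interpreted continent, whatever the country or island group further down the geography says.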
Over a year ago, I found that some of the UTEP specimens on islands in the Pacific are flagged by iDigBio with "dwc_continent_replaced | Darwin Core Continent Corrected." Life kept moving and the issue fell by the wayside.
Then, while at SPNHC, Robert Mesibov offered to review some Arctos data for me. He downloaded the MSB fish data from iDigBio and reviewed the RAW file. One of the issues he found was that all of the stuff coming from oceans had no water body and instead the body of water was in the DWC_Continent field.
In iDigBio, Atlantic Ocean is a body of water; in Arctos, it is a continent/ocean.
I thought that it would make sense to call the tectonic plate the "continent", but that isn't how iDigBio does it. They use political boundaries for continent.
So DMNS:Bird:18967 in Arctos shows a dwc_continent of "Atlantic Ocean" and no associated water body.
And DMNS:Bird:18967 in iDigBio shows a dwc_continent of "Europe" and has the flag DWC Continent Replaced.
Strictly speaking, we are both wrong but I doubt that anyone searching in iDigBio for Europe wants stuff from the South Georgia Islands. And when I search iDigBio for institution code "DMNS" plus water body "Atlantic Ocean" I get no results. At least anyone searching Arctos for stuff from the Continent/Ocean field for "Atlantic Ocean" will find this specimen (I tried it and it worked!).
All this being said, it seems to me that there needs to be a wider community discussion about Continent and Bodies of Water. I have suggested to Arctos that, in the interest of making our data show up in appropriate searches in iDigBio (and GBIF, I'm betting), we should add Water Body to our higher geography, and for anything with a "continent/ocean" that is really a water body, we add the correct name to the water body field. iDigBio will still replace our "Continent/Ocean" information, but the correct water body will get there, so people searching the oceans will find our data. However, people searching "Europe" will still get records from the South Georgia Islands.
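The suggestion above could be sketched roughly as follows. This is only an illustration, not Arctos code: the function name `split_continent_ocean` is hypothetical, the water body set contains just the two oceans named in this discussion, and any value that is neither a listed continent nor a listed water body is left unmapped for a person to review rather than guessed at.

```python
# Hypothetical sketch: route a combined "continent/ocean" value to either
# Darwin Core continent or Darwin Core waterBody. Not actual Arctos code.
CONTINENTS = frozenset({
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
})

# Assumption: only the two oceans mentioned in this thread; a real list would be longer.
WATER_BODIES = frozenset({"Atlantic Ocean", "Pacific Ocean"})


def split_continent_ocean(value):
    """Return (continent, water_body); both are None if the value needs manual review."""
    cleaned = " ".join(value.split())
    if cleaned in CONTINENTS:
        return (cleaned, None)
    if cleaned in WATER_BODIES:
        return (None, cleaned)
    return (None, None)
```

With this split, DMNS:Bird:18967 would carry "Atlantic Ocean" in the water body field and an empty continent, instead of an ocean name sitting in the continent field.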