TG4 - Vocabularies needed for the Tests and Assertions #172

Open
pzermoglio opened this issue Sep 26, 2018 · 17 comments

@pzermoglio
Member

Following are the comments regarding building a vocabulary needed for the Tests and Assertions that have been provided to the group.

Arthur Chapman (@ArthurChapman):
I would like to see us develop the simple SKOS-based vocabulary on one of the terms/vocabularies needed for the Tests coming out of Task Group 2 on Tests and Assertions. I think (from memory) there are about 23 tests that rely on a vocabulary. Not all will be simple ones, but if we can pick one, then we solve several problems at the same time.

@pzermoglio
Member Author

@ArthurChapman, @Jegelewicz, @tucotuco

One of the vocabularies needed for the Tests and Assertions is dwc:continent.
There have been active discussions around continents and water bodies, and the inconsistencies found around the two, see thread tdwg/dwc-qa#128.

Would any of these be a good candidate for building a vocab?

It would be very interesting to see what the marine folks think about this. (Gwen @gwemon, Mary-on email)

@Jegelewicz

I think this could be a really good place to start. Without a clear definition of what should go into this field, we are all creating our own grand partitioning of the Earth. The either/or of Getty/ISO 3166 isn't helping. As long as we recommend both, we will cause problems.

Continent: The name of the continent in which the Location occurs. Recommended best practice is to use a controlled vocabulary such as the Getty Thesaurus of Geographic Names or the ISO 3166 Continent code.

Going back to the example in Darwin Core Continent and Water Body

Getty places this locality in:

World (facet) South America (continent) South Georgia and South Sandwich Islands

ISO 3166 places this locality in:

AN GS SGS 239 South Georgia and the South Sandwich Islands (dependent state)

AN = Antarctica
GS = South Georgia and the South Sandwich Islands
SGS = South Georgia and the South Sandwich Islands
239 = South Georgia and the South Sandwich Islands

Either is potentially workable, but we need to pick one so that we don't end up with conflicting information. This doesn't mean everyone HAS to use the chosen source, just that everyone understands which vocabulary the aggregators will be using. Personally, I would choose the ISO because it is driven by an international standards group.

@ArthurChapman
Collaborator

ArthurChapman commented Sep 26, 2018

A problem I see is that it is difficult to look at continents in isolation from countries and even lower levels. A lot of the material in our collections is historic, and ISO3166 does not (as far as I know) include historic country names. The Getty TGN, on the other hand, does include historic country names.

As far as using continent as a SKOS-based vocabulary as an exemplar for TG4 - it depends on how we look at it. If we are likely to just recommend an external standard (ISO3166 or Getty TGN), then we are not recommending an exemplar for our methodologies. The alternative is that we create our own - and with something like continent, I don't think that makes any sense. There is definite value in being able to reference an external source rather than attempting to develop another system for our own use. We may be better off working with Getty (or ISO) to cater for our separate needs (at the country level). Water Bodies is another, more difficult, issue and it is important that OBIS and Arctos have input into what is adopted.

I am not saying continent is not an issue for us; it needs discussion on how we deal with it. However, I don't think it can be done in isolation from countries. But as an exemplar for TG2 - I think there may be better options.

@Jegelewicz

But as an exemplar for TG2 - I think there may be better options.

Agree - we shouldn't have to create a vocabulary from scratch for continent, so for the purposes of TG2, see #171

@ArthurChapman
Collaborator

ArthurChapman commented Sep 26, 2018

I have been giving the comment by @Jegelewicz some more thought. Higher Geography is very similar to higher Taxonomy. We don't tell people what Higher Taxonomic Classification to use, but we can create a Vocabulary that includes acceptable values at various levels in the hierarchy. Similarly with Higher Geography - we should not dictate that you follow the hierarchy of Getty TGN or ISO3166. We should, however, have a vocabulary that includes the terms available at, say, continent level. Thus in the example above, both "South America" and "Antarctica" are valid names for continents in both thesauri. So if someone wants to follow TGN and place South Georgia in continent "South America" and someone else wants to follow ISO3166 and place it in "Antarctica", they have a right to do so (as long as they document it). From a vocabulary point of view, as values for "continent", both are valid and acceptable.
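
The distinction drawn here, between a value being acceptable at the continent level and a record following one particular hierarchy, could be sketched roughly as follows. This is an illustration only: `CONTINENT_VOCABULARY` and `is_valid_continent` are hypothetical names, and every continent label other than "South America" and "Antarctica" is an assumption, not an agreed vocabulary.

```python
# Hypothetical continent-level vocabulary. Only "South America" and
# "Antarctica" come from the discussion; the other labels are assumed.
CONTINENT_VOCABULARY = {
    "Africa", "Antarctica", "Asia", "Europe",
    "North America", "Oceania", "South America",
}


def is_valid_continent(value: str) -> bool:
    """Return True if value is an acceptable continent-level term.

    The check deliberately ignores which hierarchy (TGN, ISO3166, ...)
    the record follows; it only tests vocabulary membership.
    """
    return value.strip() in CONTINENT_VOCABULARY


# Two records for the same locality, each following a different source.
records = [
    {"locality": "South Georgia", "continent": "South America"},  # TGN placement
    {"locality": "South Georgia", "continent": "Antarctica"},     # ISO3166 placement
]
results = [is_valid_continent(r["continent"]) for r in records]
```

Under these assumptions both records pass, because the vocabulary only constrains which terms may appear at the continent level, not which source's hierarchy was followed.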

As far as Darwin Core goes, though, they could make a recommendation that TGN be followed or that ISO3166 be followed for dwc:higherGeography.

In the Tests and Assertions (see #139 and #129) we have handled this by making the tests Parameterized: when you run the test you will be asked to supply a Parameter (for example TGN or ISO3166), and the test will then report on records that are not Compliant with that test as Parameterized.
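
A minimal sketch of what such a Parameterized test might look like is given below. It is not the TG2 implementation: the `PLACEMENTS` lookup, the `non_compliant` function, and the record field names are all hypothetical, and the two placements simply restate the South Georgia example from earlier in this thread.

```python
from typing import Dict, List

# Hypothetical lookup: the continent each source authority assigns to a place.
# The two entries restate the South Georgia example from this thread.
PLACEMENTS: Dict[str, Dict[str, str]] = {
    "TGN": {"South Georgia and the South Sandwich Islands": "South America"},
    "ISO3166": {"South Georgia and the South Sandwich Islands": "Antarctica"},
}


def non_compliant(records: List[dict], source_authority: str) -> List[dict]:
    """Return records whose continent disagrees with the chosen authority."""
    placements = PLACEMENTS[source_authority]
    flagged = []
    for rec in records:
        expected = placements.get(rec["place"])
        # Places the authority does not list are not flagged in this sketch.
        if expected is not None and rec["continent"] != expected:
            flagged.append(rec)
    return flagged


records = [
    {"id": 1, "place": "South Georgia and the South Sandwich Islands",
     "continent": "South America"},
    {"id": 2, "place": "South Georgia and the South Sandwich Islands",
     "continent": "Antarctica"},
]

# The same records give different reports depending on the Parameter.
flagged_tgn = [r["id"] for r in non_compliant(records, "TGN")]
flagged_iso = [r["id"] for r in non_compliant(records, "ISO3166")]
```

With the Parameter set to TGN only the record placed in "Antarctica" is reported, and with ISO3166 only the record placed in "South America" is reported, so neither source is dictated by the test itself.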

@ArthurChapman
Collaborator

As an exemplar, taxonRank may be a good one (see #170). We have an excellent starting point with the GBIF Vocabulary (http://rs.gbif.org/vocabulary/gbif/rank.xml) for Taxon Rank. This would also fit well with the Tests #162 and #163. We also have the advantage with Taxon Rank that, to some extent, Ranks follow the various codes (but only to some extent). Further comments under #170.

@tucotuco
Member

tucotuco commented Sep 26, 2018 via email

@ArthurChapman
Collaborator

Sorry John, my error - will edit to fix comments

@tucotuco
Member

@ArthurChapman and @baskaufs, there are arguments for (in this issue) and against (#168) using dwc:taxonRank as an exemplar. Can those differences be resolved?

If not, we are looking at very few candidate terms for exemplar vocabularies if we also require that the term serve for a TG2 test and assertion. The complete list of terms having tests or assertions in TG2's core list is currently:

dc:type
dcterms:license
dwc:basisOfRecord
dwc:occurrenceStatus (work in progress already on controlled vocabulary)
dwc:country
dwc:countryCode
dwc:geodeticDatum
dwc:taxonRank

@baskaufs

I'm not opposed in principle to using dwc:taxonRank as an exemplar. I was just afraid that if this task group put work into developing the controlled vocab and then the task group working on TCS 2.0 somehow changed the term, the work would be for naught. However, I suspect that this group will be working at a faster rate than that group, so presumably we would be done with the exemplar vocabulary before TCS 2.0 was finished anyway.

If dwc:occurrenceStatus is already a work in progress, it might be a good option.

@Jegelewicz

The reference to ISO Continent codes is from "https://terms.tdwg.org/wiki/dwc:continent", an independent and no longer maintained commentary on Darwin Core, and is in error.

Is there a current and agreed-upon Darwin Core field definition list?

@tucotuco
Member

Is there a current and agreed-upon Darwin Core field definition list?

Yes, the definitions found on the Quick Reference Guide are produced directly from the canonical Darwin Core Standard, which is now managed in a single CSV document at https://github.com/tdwg/dwc/blob/master/vocabulary/term_versions.csv.

@Jegelewicz

Sigh.

This is not a very user-friendly format, nor is it very accessible. I will speak for all of the overworked collection managers who don't have time to sift through a bunch of text to figure out what they need to know. I realize that you all have skills and knowledge that I don't have, but I am supposed to be the one using this to make my data better. Can we make this available to Joe Collection Manager in a way he/she can understand and use it?

Right now, if I am interested and I google "Darwin Core Terms", the first result is the "independent and no longer maintained commentary on Darwin Core" that is "in error" with no real way to know that is what it is (It looks very official with the TDWG logo and says nothing about being out of date). I would venture to guess that others are relying upon it as well.

@tucotuco
Member

I totally understand, and the fix is in progress. We are trying to release the new version of the Darwin Core web site, which has been two years in the making (volunteer time). The Quick Reference Guide (which will be easier to navigate in the new version) is where we expect Joe Collection Manager to go for the definitions. From there it will link out to commentaries such as the Darwin Core Questions and Answers site (https://github.com/tdwg/dwc-qa/wiki) and to recommended vocabularies when those are figured out. We will no longer link to the media wiki from the Darwin Core definitions. It is definitely a problem if it is not consistent with the standard, and maintaining it is clearly an issue.

@Jegelewicz

Thanks! I know the volunteer time thing well....

@Jegelewicz

Just a note, but in doing other research I downloaded a template for getting data into the Atlas of Living Australia and every field includes a link to a definition at http://rs.tdwg.org/dwc/terms/index.htm

@tucotuco
Member

Excellent. That is exactly the right place to point to.

5 participants