I did a series of assertion tests in December 2015 on a large sample dataset. Results and problems were reported in biocache-store #100, #101, #102, #103, #104, #105, #106, #107, #108. I would do a check like this very differently today, and on a much larger dataset, but note that the problems in those 9 issues are all still open 5 years later (and, like most of my ALA GitHub issue postings, have never been labelled, assigned or addressed).
Two different approaches to "assertions not working" are illustrated here in #393 (and on the Confluence page) and in my 2015 effort. One is to pick up failures opportunistically. The other is to systematically analyse whether or not assertions work as intended, which sounds to me like quality control in ALA's data processing, and which does not seem to have been an ALA priority.
A related question - for which a GitHub issues page is not the appropriate place - is which (if any) of ALA's assertions have any value. Has ALA ever systematically examined whether and how data providers respond to assertions in their records? How confident can end-users be in the 2020 "data quality" filtering initiative that the exclusions are validly "bad" and the inclusions validly "good"?
This is a placeholder to document data assertions that either do not work at all or do not work correctly according to their intended purpose.
The full list of assertions, and decisions on what to do about them, is kept in the Confluence page of the same name.