TG1 Framework Cheat Sheet
Paul J. Morris edited this page Sep 6, 2024 · 3 revisions
This is a summary of pp. 89-108 in: Veiga, A.K. 2016. A conceptual framework on biodiversity data quality. Tese (Doutorado) [Doctoral Thesis] Escola Politécnica da Universidade de São Paulo. Departamento de Engenharia de Computação e Sistemas Digitais. 156 p. With changes, as discussed in the TDWG data quality interest group: dcmitype:Dataset replaced with Multi-record, Improvement Method changed to Enhancement Method, Improvement Policy changed to Enhancement Policy, and Data Quality Improvement changed to Data Quality Amendment.
For an updated version with concept names aligned to the bdqffdq vocabulary, see the bdqffdq summary in the draft BDQ Core standard.
- U = Use Case
- D = Dimension (e.g. precision)
- IE = Information Element (e.g. coordinates)
- M = Mechanism
- C = Criterion (e.g. “in controlled vocabulary”)
- E = Enhancement (description of a means by which data could be improved e.g. recommend replacement value from a controlled vocabulary).
- S = Specification (specification of how a criterion is to be evaluated e.g. “Iterate records and calculate the proportion of records with scientific name different from null”)
- US = Usages
- ID = Persistent GUID
- RT = Resource Type ~ dc:type { Single Record, Multi-Record}
- sr = instance of Single Records
- ds = instance of Dataset.
- V = Data Resource Value
- R = Assertion (result from a mechanism of a Validation, Measurement, or Amendment on a Resource)
- X: Domain
- x: instance
- { } set
- < > tuple
- ⋃ union
- ⋀ and (logical conjunction)
- ∈ is a member of
CD = { cd | cd = < ie, d, rt >, ie ∈ IE, d ∈ D ⋀ rt ∈ RT }
cd1 = < ie1, d1, rt1 >
- “coordinate precision of single records”
CC = { cc | cc = < ie, c, rt >, ie ∈ IE, c ∈ C ⋀ rt ∈ RT }
cc1 = < ie1, c1, rt1 >
- “The value of Basis of Records of single records must be in the controlled vocabulary”
CE = { ce | ce = < ie, e, rt >, ie ∈ IE, e ∈ E ⋀ rt ∈ RT }
ce1 = < ie1, e1, rt1 >
- “Recommend valid value for taxon name in single record”
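The three contextualized tuples share one shape, < ie, x, rt >, so they can be sketched as small immutable records. This is only an illustration: the class names, field names, and string labels below are assumptions made for the example and are not part of the framework.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContextualizedDimension:
    """cd = < ie, d, rt > (names here are illustrative assumptions)."""
    ie: str  # information element
    d: str   # dimension
    rt: str  # resource type


@dataclass(frozen=True)
class ContextualizedCriterion:
    """cc = < ie, c, rt >"""
    ie: str
    c: str   # criterion
    rt: str


@dataclass(frozen=True)
class ContextualizedEnhancement:
    """ce = < ie, e, rt >"""
    ie: str
    e: str   # enhancement
    rt: str


# "coordinate precision of single records"
cd1 = ContextualizedDimension("coordinates", "precision", "Single Record")
# "The value of Basis of Records of single records must be in the controlled vocabulary"
cc1 = ContextualizedCriterion("basis of record", "in controlled vocabulary", "Single Record")
# "Recommend valid value for taxon name in single record"
ce1 = ContextualizedEnhancement("taxon name", "recommend valid value", "Single Record")

# Frozen dataclasses are hashable, so they can be members of sets such as CD, CC, CE.
CD = {cd1}
```

Freezing the records keeps them usable as set members, which matches how the later definitions collect them into sets.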
UC(u) = { us | u ∈ U ⋀ us ⊂ US}
uc(u1) = {us1, us2}
- “A Use Case for Niche Modeling covers MAXENT and GARP modeling”
VIE(u) = {ie | ie ⊂ IE ⋀ u ∈ U }
- For a Use Case, what information elements are valuable.
AM(cd) = {cc | cd ∈ CD ⋀ cc ⊂ CC}
am(cd1) = {cc1, cc2}
- For the contextualized dimension “coordinate completeness in a dataset”, acceptable quality is met when all records have complete coordinates.
IT(ce) = {cd ⋃ cc | cd ∈ CD, cc ∈ CC ⋀ ce ∈ CE}
it(ce1) = {cd1, cc2}
- Recommending coordinates based on textual locality improves the coordinate completeness of single records and may result in compliance with the criterion “data set must have all records with coordinates”.
MP(u) = {cd | cd ⊂ CD ⋀ u ∈ U }
mp(u1) = {cd1, cd2, cd3, cd4}
mp(u1) = {< ie1, d1, rt2 >, < ie1, d1, rt1 >, < ie2, d1, rt1 >, < ie2, d2, rt2 >}
VP(u) = {cc | cc ⊂ CC ⋀ u ∈ U }
vp(u1) = {cc1, cc2}
vp(u1) = {< ie1, c1, rt1>, < ie2, c2, rt2> }
IP(u) = {ce | ce ⊂ CE ⋀ u ∈ U }
ip(u1) = {ce1, ce2}
DQP(u) = {dqp | dqp = mp(u) ⋃ vp(u) ⋃ ip(u), mp ∈ MP, vp ∈ VP, ip ∈ IP ⋀ u ∈ U }
dqp(u1) = {mp(u1), vp(u1), ip(u1)}
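A data quality profile can be sketched as plain set operations. The sketch below follows the union form in the DQP definition (the example instance instead lists the three policies as members, so treat this as one possible reading). The function name `dq_profile` and the string labels are assumptions for illustration.

```python
def dq_profile(mp, vp, ip):
    """Union of the measurement, validation, and enhancement policies of a use case."""
    return frozenset(mp) | frozenset(vp) | frozenset(ip)


# Policies for use case u1, using the labels from the examples above.
mp_u1 = {"cd1", "cd2", "cd3", "cd4"}  # measurement policy
vp_u1 = {"cc1", "cc2"}                # validation policy
ip_u1 = {"ce1", "ce2"}                # enhancement (originally "improvement") policy

dqp_u1 = dq_profile(mp_u1, vp_u1, ip_u1)
```

Under this reading the profile for u1 holds every contextualized dimension, criterion, and enhancement that the use case cares about in one flat set.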
MM(cd) = {s | s ⊂ S ⋀ cd ∈ CD}
VM(cc) = {s | s ⊂ S ⋀ cc ∈ CC}
IM(ce) = {s | s ⊂ S ⋀ ce ∈ CE}
I(s) = {m | m ⊂ M ⋀ s ∈ S}
i(s1) = {m1, m2}
MC(m) = {s | s ⊂ S ⋀ m ∈ M}
mc(m1) = {s1, s2}
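I(s) and MC(m) read naturally as two views of the same relation between specifications and mechanisms. A minimal sketch, assuming that reading: start from a mechanism-to-specifications mapping and derive the specification-to-mechanisms mapping from it. The entry for `m2` is an assumption chosen so that the result agrees with i(s1) = {m1, m2}.

```python
from collections import defaultdict

# mc: mechanism -> specifications it covers, following mc(m1) = {s1, s2}.
mechanism_coverage = {
    "m1": {"s1", "s2"},
    "m2": {"s1"},  # assumed, so that i(s1) = {m1, m2}
}


def implementations(mc):
    """Invert mechanism coverage into specification -> implementing mechanisms."""
    result = defaultdict(set)
    for mechanism, specs in mc.items():
        for spec in specs:
            result[spec].add(mechanism)
    return dict(result)


implemented_by = implementations(mechanism_coverage)
```

Here `implemented_by["s1"]` holds both mechanisms, and `implemented_by["s2"]` holds only `m1`.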
DR = { dr | dr = < id, rt, v >, id ∈ ID, rt ∈ RT, (rt = sr ⋁ rt = ds) ⋀ v ∈ V }
dr1 = < id1, rt1, v1 >
- “dr1 is a Data Resource which represents the Dataset "3cc6171e-8c52-4f65-ad7a-32c74e395f29" which contains 251,744 records”. Data resources are defined as having persistent GUIDs.
DQM(dr) = {dqm | dqm = < cd, s, m, r >, cd ∈ CD, s ∈ S, m ∈ M, r ∈ R ⋀ dr ∈ DR}
dqm(dr1) = {< cd1, s1, m1, r1 >}
- Coordinate numerical precision of the dataset 3cc6171e-8c52-4f65-ad7a-32c74e395f29 is 6.16 and this value was assigned by the software DwC-A Validator 2.0 which calculated the value by the average of significant digits of each record of the dataset.
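The measurement example can be sketched as a function that produces a < cd, s, m, r > tuple. This is not the DwC-A Validator implementation: the digit-counting rule, the function names, and the mechanism label are all assumptions made for illustration, and other definitions of "significant digits" are possible.

```python
from typing import NamedTuple, Sequence


class Measurement(NamedTuple):
    cd: str   # contextualized dimension
    s: str    # specification
    m: str    # mechanism
    r: float  # result


def significant_digits(value: str) -> int:
    """Count digits after dropping sign, decimal point, and leading zeros (assumed rule)."""
    digits = value.strip().lstrip("+-").replace(".", "")
    return len(digits.lstrip("0"))


def measure_precision(coordinates: Sequence[str]) -> Measurement:
    """Average the significant digits of each coordinate value in a multi-record."""
    if not coordinates:
        raise ValueError("need at least one coordinate value")
    counts = [significant_digits(c) for c in coordinates]
    return Measurement(
        cd="coordinate numerical precision of a multi-record",
        s="average of significant digits of each record",
        m="example measurer (hypothetical)",
        r=sum(counts) / len(counts),
    )


dqm1 = measure_precision(["-23.55", "-46.6333"])
```

Keeping the specification and mechanism alongside the result is what lets a consumer see how the value was obtained, which is the point of the four-element tuple.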
DQV(dr) = {dqv | dqv = < cc, s, m, r >, cc ∈ CC, s ∈ S, m ∈ M, r ∈ R ⋀ dr ∈ DR}
dqv(dr1) = {< cc1, s1, m1, r1 >}
- A DQ Validation asserts that the Contextualized Criterion “Geodetic Datum must be supplied” is COMPLIANT for a specific species occurrence and this validation was performed by the software Darwin Test by checking if the field Geodetic Datum of the record was not empty.
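The validation example can be sketched the same way. The check below only tests that a geodetic datum value is present and non-blank. The field key `geodeticDatum` follows the Darwin Core term name, while the `NOT_COMPLIANT` label, the function name, and the mechanism label are assumptions for illustration (the source only names `COMPLIANT`).

```python
from typing import Mapping, NamedTuple, Optional


class Validation(NamedTuple):
    cc: str  # contextualized criterion
    s: str   # specification
    m: str   # mechanism
    r: str   # result


def validate_datum_supplied(record: Mapping[str, Optional[str]]) -> Validation:
    """Assert whether the record supplies a non-empty geodetic datum."""
    value = record.get("geodeticDatum")
    supplied = value is not None and value.strip() != ""
    return Validation(
        cc="Geodetic Datum must be supplied",
        s="check that the field Geodetic Datum of the record is not empty",
        m="example validator (hypothetical)",
        r="COMPLIANT" if supplied else "NOT_COMPLIANT",  # second label is assumed
    )


dqv1 = validate_datum_supplied({"geodeticDatum": "WGS84"})
```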
DQI(dr) = {dqi | dqi = < ce, s, m, r >, ce ∈ CE, s ∈ S, m ∈ M, r ∈ R ⋀ dr ∈ DR}
dqi(dr1) = {< ce1, s1, m1, r1 >}
- An amendment is proposed to replace the current value of the scientific name by the value “Apis” because Apis is the most similar valid name based on the Levenshtein distance in the Catalog of Life database using the software DwC-A Validator 2.0.
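The amendment example relies on picking the valid name with the smallest Levenshtein distance. A minimal sketch of that idea follows, using a short in-memory list in place of the Catalog of Life. The list contents and function names are assumptions, and this is not the DwC-A Validator code.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free if equal
            ))
        prev = curr
    return prev[-1]


def propose_amendment(name: str, valid_names: Sequence[str]) -> str:
    """Return the valid name closest to `name`; ties go to the earliest entry."""
    return min(valid_names, key=lambda candidate: levenshtein(name, candidate))


# Hypothetical stand-in for a taxonomic authority lookup.
valid_names = ["Apis", "Bombus", "Melipona"]
proposed = propose_amendment("Apiss", valid_names)
```

The proposed value would then be the result r of a < ce, s, m, r > tuple, leaving the decision to accept the change to the data consumer.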
A(dr) = {dqm(dr) ⋃ dqv(dr) ⋃ dqi(dr) | dqm ∈ DQM, dqv ∈ DQV, dqi ∈ DQI ⋀ dr ∈ DR}
a(dr1) = {dqm1, dqm2, dqm3, dqv1, dqi1}
QC(dr) = {dqv(dr) ⋃ dqi(dr) | dqv ∈ DQV, dqi ∈ DQI ⋀ dr ∈ DR}
qc(dr1) = {dqv1, dqi1}
QA(dr) = {dqv(dr) | dqv ∈ DQV ⋀ dr ∈ DR}
qa(dr1) = {dqv1, dqv2}
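The three closing definitions differ only in which kinds of assertions they combine, which a short sketch makes visible. The function names and string labels are assumptions for illustration, and the inputs reuse the members listed for a(dr1).

```python
def all_assertions(dqm, dqv, dqi):
    """A(dr): measurements, validations, and amendments together."""
    return set(dqm) | set(dqv) | set(dqi)


def quality_control(dqv, dqi):
    """QC(dr): validations plus amendments."""
    return set(dqv) | set(dqi)


def quality_assurance(dqv):
    """QA(dr): validations only."""
    return set(dqv)


# Assertions made about data resource dr1.
dqm_dr1 = {"dqm1", "dqm2", "dqm3"}
dqv_dr1 = {"dqv1"}
dqi_dr1 = {"dqi1"}

a_dr1 = all_assertions(dqm_dr1, dqv_dr1, dqi_dr1)
qc_dr1 = quality_control(dqv_dr1, dqi_dr1)
qa_dr1 = quality_assurance(dqv_dr1)
```

Under these definitions QA(dr) is a subset of QC(dr), which in turn is a subset of A(dr).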