Testing DQ checks: miscellany - Possible duplicate record #103

Open
Mesibov opened this issue Dec 13, 2015 · 0 comments

Mesibov commented Dec 13, 2015

Record counts by 'Possible duplicate record' value:

371332 false
85743 true

Tested first on the 9895 duplicate pairs (19790 records) with same taxon (= matched scientific name), same repository, same catalog number (#96):

5448 pairs with both records flagged 'false'
1571 pairs with both records flagged 'true'
2876 pairs with 1 record flagged 'false', the other 'true'

That's one kind of duplicate. From the full beetles data set I also generated record abstracts with the fields 'Possible duplicate record', 'Matched Scientific Name', 'Locality', 'Latitude - processed', 'Longitude - processed', 'Collector', 'Event Date - parsed' and 'Basis Of Record - processed'. Apart from the flag, records with matching abstracts are the same taxon collected at the same place on the same date by the same collector. Each abstract below is prefixed with its number of repeats.
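For anyone wanting to reproduce this kind of tally, here is a minimal Python sketch of counting repeated abstracts. It is not the method actually used for this issue. It assumes each record is available as a dict keyed by the field names listed above; the function name `count_abstracts` and that record layout are hypothetical.

```python
from collections import Counter

# Field names as listed in this issue. The dict-per-record layout is an assumption.
ABSTRACT_FIELDS = [
    "Possible duplicate record",
    "Matched Scientific Name",
    "Locality",
    "Latitude - processed",
    "Longitude - processed",
    "Collector",
    "Event Date - parsed",
    "Basis Of Record - processed",
]

def count_abstracts(records, fields=ABSTRACT_FIELDS):
    """Map each abstract (a tuple of the chosen field values) to its number of repeats."""
    # Missing fields are treated as empty strings so that blank values still match each other.
    return Counter(tuple(rec.get(f, "") for f in fields) for rec in records)
```

Printing each count followed by its abstract, sorted by abstract, would give a listing in the same shape as the examples in this issue.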

In many cases a single abstract is flagged 'false' and its repeats are flagged 'true', which is what I would expect for a 'Possible duplicate record' flag:

1 false Onthophagus sydneyensis -32.58486 151.00613 Gollan, Mr John - Australian Museum - Science 2005-11-18 PreservedSpecimen
97 true Onthophagus sydneyensis -32.58486 151.00613 Gollan, Mr John - Australian Museum - Science 2005-11-18 PreservedSpecimen

But the 'true' and 'false' assignments are inconsistent, e.g.

10 false Zygocera lugubris Mount Gambier -37.81106163 140.7444735 D. Hangen 1988-12-12 PreservedSpecimen
2 true Zygocera lugubris Mount Gambier -37.81106163 140.7444735 D. Hangen 1988-12-12 PreservedSpecimen

8 false Anoplognathus narmarus Pitchi Ritchi Pass nr. Port Augusta -32.5 137.7667 Moulds,M.S. & Moulds,B.J. 1976-01-17 PreservedSpecimen
13 true Anoplognathus narmarus Pitchi Ritchi Pass nr. Port Augusta -32.5 137.7667 Moulds,M.S. & Moulds,B.J. 1976-01-17 PreservedSpecimen

9 false Eupoecila inscripta Exmouth -21.923847 114.11691 Bill & Mark Bell 2008-04-01 Image
8 true Eupoecila inscripta Exmouth -21.923847 114.11691 Bill & Mark Bell 2008-04-01 Image

5 false Coelophora inaequalis Hodgson Vale -27.6228 151.9373 Arthur Chapman 2010-05-07 Image
3 true Coelophora inaequalis Hodgson Vale -27.6228 151.9373 Arthur Chapman 2010-05-07 Image

2 false Tesserodon feehani 14km W by N Hope Vale Mission -15.26667 144.9833 Feehan,J.E. 1981-05-07 PreservedSpecimen
3 true Tesserodon feehani 14km W by N Hope Vale Mission -15.26667 144.9833 Feehan,J.E. 1981-05-07 PreservedSpecimen
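Inconsistent groups like these could be listed automatically. The sketch below is one possible check, not part of the tested tool: it assumes the expected pattern is exactly one 'false' record per group of otherwise identical abstracts, as in the Onthophagus example. The function name `inconsistent_groups` is hypothetical, and the keys are shortened to the taxon name for brevity; in practice the key would be the full abstract minus the flag.

```python
from collections import defaultdict

def inconsistent_groups(abstract_counts):
    """Return keys whose number of 'false' records is not exactly 1.

    abstract_counts maps (flag, key) -> repeats, where key is the
    abstract with the 'Possible duplicate record' flag removed.
    """
    false_counts = defaultdict(int)
    for (flag, key), n in abstract_counts.items():
        # Register every key, but only add repeats for 'false' records.
        false_counts[key] += n if flag == "false" else 0
    return sorted(k for k, n in false_counts.items() if n != 1)

# Counts taken from two of the examples above.
counts = {
    ("false", "Onthophagus sydneyensis"): 1,
    ("true", "Onthophagus sydneyensis"): 97,
    ("false", "Zygocera lugubris"): 10,
    ("true", "Zygocera lugubris"): 2,
}
print(inconsistent_groups(counts))  # ['Zygocera lugubris']
```

Note that under this assumed rule a group with no 'false' record at all is also reported, which may or may not be wanted if records can be flagged 'true' on other matching criteria.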
