The beetles dataset I downloaded on 2 December 2015 contains 9895 duplicate pairs from GBIF, Australian Museum, Queensland Museum and South Australian Museum. Example:
| Record ID | Catalog Number | Matched Scientific Name | Institution Code |
| --- | --- | --- | --- |
| 06dba16a-f635-4cfd-950f-98381a486722 | 1005575 | Clivina sellata | EME |
| c0c38827-ec1d-45b2-8cd5-1b1b8d10f831 | 1005575 | Clivina sellata | EME |
The full list of 19790 records (9895 pairs) is attached: dupes.txt
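For anyone who wants to repeat this kind of check on their own download, here is a minimal sketch of one way to find such pairs: group records by institution code and catalog number, and report any group with more than one record ID. The column names are taken from the example above; the tab-separated layout, the function name `find_duplicate_groups` and the third sample row are assumptions for illustration only, and this is not necessarily how the attached list was produced.

```python
import csv
import io
from collections import defaultdict


def find_duplicate_groups(rows):
    """Return {(institution code, catalog number): [record IDs]} for keys seen more than once."""
    groups = defaultdict(list)
    for row in rows:
        institution = row["Institution Code"].strip()
        catalog = row["Catalog Number"].strip()
        if not catalog:
            continue  # records without a catalog number cannot be matched this way
        groups[(institution, catalog)].append(row["Record ID"].strip())
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# The first two data rows are the pair quoted in this issue.
# The last row is hypothetical, added only to show a non-duplicate.
sample = """Record ID\tCatalog Number\tMatched Scientific Name\tInstitution Code
06dba16a-f635-4cfd-950f-98381a486722\t1005575\tClivina sellata\tEME
c0c38827-ec1d-45b2-8cd5-1b1b8d10f831\t1005575\tClivina sellata\tEME
hypothetical-record-id\t0000001\tClivina sellata\tEME
"""

rows = list(csv.DictReader(io.StringIO(sample), delimiter="\t"))
dupes = find_duplicate_groups(rows)
for (institution, catalog), record_ids in dupes.items():
    print(institution, catalog, record_ids)
```

Matching on catalog number alone would wrongly pair specimens from different institutions that happen to share a number, which is why the institution code is part of the key.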
From @Mesibov on December 7, 2015 3:7
This issue was reported to ALA staff by email on 21 September 2015 but I post it here for general comment.
While auditing ALA's Lepidoptera records I found thousands of duplicate record pairs from ANIC, Australian Museum, Queensland Museum and South Australian Museum. By 'duplicate' I mean that the same specimen lot in the same repository with the same catalog number is listed twice in ALA, not that the two records are absolutely identical. The duplicates are still online; for an example, see:
http://biocache.ala.org.au/occurrences/2e87583d-7240-4c67-adde-b23c8e509921
http://biocache.ala.org.au/occurrences/55c237a0-1e7b-44b2-9678-048b4b4d1c45
Both records were provided to ALA by QM through OZCAM.
In September I sent to ALA as text files the 1266 duplicate pairs I found from ANIC, 3791 pairs from AM, 491 pairs from QM and 113 pairs from SAM.
Most of the duplication is apparently the result of ALA adding a second version of a record without checking for and deleting the previous version. There is no flag advising the user as to which of the records in a duplicate pair is the more correct or recent version. At the time of download (17 July 2015) there was a data quality field 'Inferred duplicate record'. Only a tiny fraction of the duplicated records were flagged 'true'.
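To illustrate the "check for and delete the previous version" step, here is a minimal sketch of an ingest that is keyed on institution code plus catalog number, so that a newer version of a record replaces the older one instead of sitting beside it. This is not ALA's actual ingestion code; the function `ingest`, the in-memory `store` dictionary and all field names and values below are hypothetical.

```python
def ingest(store, record):
    """Add `record` to `store`, replacing any earlier version of the same specimen lot.

    Returns the replaced record, or None if the specimen lot was not present before.
    """
    key = (record["institution_code"], record["catalog_number"])
    previous = store.get(key)
    store[key] = record  # the newer version overwrites the older one
    return previous


# Hypothetical records: two versions of the same specimen lot.
store = {}
first = {"institution_code": "XYZ", "catalog_number": "T0001", "locality": "old text"}
second = {"institution_code": "XYZ", "catalog_number": "T0001", "locality": "revised text"}

ingest(store, first)
replaced = ingest(store, second)
print(len(store), replaced["locality"], store[("XYZ", "T0001")]["locality"])
```

If the older version has to be kept for audit purposes, the returned record could instead be archived or explicitly flagged as superseded, which would also tell users which member of a pair is the more recent.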
Copied from original issue: AtlasOfLivingAustralia/biocache-service#78