Disambiguating plant names and fixing typos #79

Open
petermr opened this issue Jul 12, 2019 · 14 comments

@petermr
Collaborator

petermr commented Jul 12, 2019

Plant names should be disambiguated at the binomial species level in the plant table. Thus

  • Lantana camara

  • L. camara

  • Lantata camara (a typo)

should all be mapped to the same species.

  • Ocimum sanctum should be mapped onto its preferred synonym Ocimum tenuiflorum

EssoilDB is not a taxonomy site so there is no need to record synonyms for data entry. (It may be useful to search for synonyms but this will be through a different mechanism.)
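
A minimal sketch of how such a mapping could work, assuming a small in-memory list of accepted binomials and a hand-built synonym map. `KNOWN_SPECIES`, `SYNONYMS`, `normalize_name` and the `cutoff` value are all hypothetical names and choices for illustration, not part of EssOilDB:

```python
import difflib

# Hypothetical reference list of accepted binomials; in practice this would
# come from the plant table.
KNOWN_SPECIES = ["Lantana camara", "Ocimum tenuiflorum"]

# Hypothetical synonym map (synonym -> preferred name).
SYNONYMS = {"Ocimum sanctum": "Ocimum tenuiflorum"}


def normalize_name(raw, known=KNOWN_SPECIES, synonyms=SYNONYMS, cutoff=0.85):
    """Map a raw plant name to a binomial in `known`, or return None."""
    parts = raw.strip().split()
    if len(parts) < 2:
        return None
    genus, epithet = parts[0], parts[1].lower()

    # Expand an abbreviated genus such as "L." only when exactly one known
    # species matches both the initial and the epithet.
    if len(genus) == 2 and genus.endswith("."):
        hits = [k for k in known
                if k.startswith(genus[0].upper()) and k.split()[1] == epithet]
        return hits[0] if len(hits) == 1 else None

    candidate = f"{genus.capitalize()} {epithet}"
    candidate = synonyms.get(candidate, candidate)
    if candidate in known:
        return candidate

    # Fall back to fuzzy matching to catch typos such as "Lantata camara".
    close = difflib.get_close_matches(candidate, known, n=1, cutoff=cutoff)
    return close[0] if close else None
```

Names that cannot be resolved come back as `None`, so they can be queued for manual review rather than guessed.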

@petermr
Collaborator Author

petermr commented Jul 12, 2019

Treatment of VARIETIES and HYBRIDS

This is very important information but is relatively infrequent. We do not have a simple data model for it, so I suggest:

  • the plant is represented by a simple binomial name in plant
  • the variety is recorded as a text string (not normalized) in a variety field in the profile data. The plant_id field should refer to the plant table.
  • hybrids are recorded in a hybrid field (not normalized text string) in the profile. There should be NO plant_id value. If particular hybrids are found to be common and important we may normalize this.
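
The rules above could be checked at data entry with something like the following sketch. The `Profile` class is hypothetical; only the field names `plant_id`, `variety` and `hybrid` come from the proposal, and the example values in the usage note are illustrative:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    """Hypothetical profile record carrying the three fields proposed above."""
    plant_id: Optional[int] = None   # reference into the plant table
    variety: Optional[str] = None    # free text, not normalized
    hybrid: Optional[str] = None     # free text, not normalized

    def __post_init__(self):
        # Hybrids are free text only: there should be NO plant_id value.
        if self.hybrid is not None and self.plant_id is not None:
            raise ValueError("a hybrid profile must not carry a plant_id")
        # A variety refines a species, so plant_id must point at the plant.
        if self.variety is not None and self.plant_id is None:
            raise ValueError("a variety needs a plant_id for its species")
```

For example, `Profile(plant_id=1, variety="var. aculeata")` and `Profile(hybrid="Mentha x piperita")` are accepted, while `Profile(plant_id=1, hybrid="Mentha x piperita")` raises `ValueError`.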

@EmanuelFaria
Collaborator

EmanuelFaria commented Jul 14, 2019

@petermr @Shruthi-M @gilienv
I just stumbled upon this database of plant taxonomy. Maybe it's useful to you?

https://www.gbif.org/dataset/66dd0960-2d7d-46ee-a491-87b9adcfe7b1

Taxonomy tool: https://www.gbif.org/species/158596304

Seems like there's a link to download the entire data set here: (Not sure)
https://www.gbif.org/dataset/66dd0960-2d7d-46ee-a491-87b9adcfe7b1#dataDescription

Description

GRIN taxonomic data provide the structure and nomenclature for accessions of the National Plant Germplasm System (NPGS), part of the National Genetic Resources Program (NGRP) of the United States Department of Agriculture’s (USDA’s) Agricultural Research Service (ARS). In GRIN Taxonomy for Plants all families and genera of vascular plants and over 46,000 species from throughout the world are represented, especially economic plants and their relatives. Information on scientific and common names, classification, distribution, references, and economic impacts are provided.

@gilienv
Owner

gilienv commented Jul 17, 2019

Thank you Manny, we use GBIF for most of our Ecology work and it's one of the most dependable species databases. I have reminded Shruthi to take the species information from it into account when building her Plant Table.

Shruthi:

Please create the plant table with the following columns:

Binomial Species Name
Synonyms
Habit (i.e. overall shape: Grass/Vine/Tree/Shrub)
Genus
Family
Order
Class
Phylum
Kingdom

Most importantly, we will need to connect this table with the existing IDs in the main infopdata table (which we are now restructuring as profile).

For example, if you remove a wrong plant name from the original dataset, what happens to all the data that was connected to this one in the Main Tables?! We cannot afford to delete that.
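
One possible way to avoid losing that data is to re-point the dependent rows at the correct plant before removing the wrong entry. The sketch below uses SQLite for illustration only; the table layout, the column names `taxon_order` and `taxon_class`, and the helpers `create_schema` and `merge_plant` are assumptions, not the actual EssOilDB schema:

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical plant table (columns as listed above) and a
    minimal profile table that references it."""
    conn.executescript("""
        CREATE TABLE plant (
            id INTEGER PRIMARY KEY,
            binomial TEXT NOT NULL UNIQUE,
            synonyms TEXT,
            habit TEXT,
            genus TEXT,
            family TEXT,
            taxon_order TEXT,
            taxon_class TEXT,
            phylum TEXT,
            kingdom TEXT
        );
        CREATE TABLE profile (
            id INTEGER PRIMARY KEY,
            plant_id INTEGER REFERENCES plant(id)
        );
    """)


def merge_plant(conn, wrong_id, correct_id):
    """Move every profile from the wrong plant row to the correct one, then
    delete the wrong row. Returns the number of profiles that were moved."""
    with conn:  # both statements commit together or roll back together
        moved = conn.execute(
            "UPDATE profile SET plant_id = ? WHERE plant_id = ?",
            (correct_id, wrong_id),
        ).rowcount
        conn.execute("DELETE FROM plant WHERE id = ?", (wrong_id,))
    return moved
```

Because the update and the delete run in one transaction, a failure part-way through leaves the original links untouched.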

Please discuss this in the next Skype call.

@petermr
Collaborator Author

petermr commented Jul 17, 2019 via email

@EmanuelFaria
Collaborator

@Shruthi-M
Hi Shruthi, would you please add me on Skype (Mannyrules) and WhatsApp (+55 61 99675 3439)?

Thanks!
Manny

@petermr
Collaborator Author

petermr commented Jul 22, 2019

I believe that Taxize (TNRS) does NOT report synonymy. I entered Ocimum sanctum and Ocimum tenuiflorum and both were reported as Accepted.

Does EssoilDB V2.0 regard these as synonyms or distinct species?

This will drastically affect the numbers we report on the poster.

@vinitamehlawat
Collaborator

Sir
As of now, they are distinct species.

@petermr
Collaborator Author

petermr commented Jul 22, 2019 via email

@vinitamehlawat
Copy link
Collaborator

Dear Peter
I have made some changes to the top-most part of a1draft.pptx, i.e. the History and Introduction of EssOilDB, and added a Profile table for chemical compounds next to the oil bottle.
Peter, you also assigned me some work related to Wikidata identifiers, but I do not understand where I should put these IDs on the poster.
I am pasting them here for your reference.

  • Lantana camara (Q332469).
  • leaf (Q33971) / organ of a vascular plant, composing its foliage (very general term).
  • flower (Q506) / structure found in some plants to support reproduction.
  • fruit (Q1364) / part of a flowering plant.

@petermr
Collaborator Author

petermr commented Jul 24, 2019 via email

@petermr
Collaborator Author

petermr commented Jul 25, 2019

We should use GBIF to resolve synonyms.
Question.
Does it have an API?
What does it return?
If it is simple it could solve this problem quite quickly.
Vinita/Shruthi should report.

@Shruthi-M
Collaborator

Shruthi-M commented Jul 25, 2019 via email

@petermr
Collaborator Author

petermr commented Jul 25, 2019 via email

@petermr
Collaborator Author

petermr commented Jul 25, 2019

I have been reading:
https://www.gbif.org/en/developer/species
which seems to provide what we want. Is this what you are using?

I'll copy some here:

Species API

http://api.gbif.org/v1/

I have issued:

api.gbif.org/v1/species?name=ocimum%20sanctum

and got:

{"offset":0,"limit":20,"endOfRecords":true,"results":[{"key":2927101,"nubKey":2927101,"nameKey":7681615,"taxonID":"gbif:2927101","sourceTaxonKey":143184691,"kingdom":"Plantae","phylum":"Tracheophyta","order":"Lamiales","family":"Lamiaceae","genus":"Ocimum","species":"Ocimum tenuiflorum","kingdomKey":6,"phylumKey":7707728,"classKey":220,"orderKey":408,"familyKey":2497,"genusKey":2874693,"speciesKey":2927100,"datasetKey":"d7dddbf4-2cf0-4f39-9b2a-bb099caae36c","constituentKey":"7ddf754f-d193-4cc9-b351-99906754a03b","parentKey":2874693,"parent":"Ocimum","acceptedKey":2927100,"accepted":"Ocimum tenuiflorum L.","scientificName":"Ocimum sanctum L.","canonicalName":"Ocimum sanctum","authorship":"L.","nameType":"SCIENTIFIC","rank":"SPECIES","origin":"SOURCE","taxonomicStatus":"SYNONYM","nomenclaturalStatus":[],"remarks":"","publishedIn":"Mant. pl. 1:85.  1767","numDescendants":0,"lastCrawled":"2018-06-20T14:41:51.801+0000","lastInterpreted":"2018-06-20T14:36:01.700+0000","issues":[
[many lines clipped]

Note the "taxonomicStatus":"SYNONYM" .

By contrast

api.gbif.org/v1/species?name=ocimum%20tenuiflorum

gives

"taxonomicStatus":"ACCEPTED"

suggesting that for Ocimum sanctum the accepted name is Ocimum tenuiflorum L.

We can automate this and save a huge amount of disambiguation work.
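
A rough sketch of that automation, assuming the response fields shown above (`taxonomicStatus`, `species`, `canonicalName`, `results`). The function names `accepted_name` and `lookup` are hypothetical, and it is an assumption that an ACCEPTED record carries its own binomial in `canonicalName`:

```python
import json
import urllib.parse
import urllib.request

GBIF_SPECIES_URL = "http://api.gbif.org/v1/species"  # base URL quoted above


def accepted_name(record):
    """Return the accepted binomial for one record from the "results" list."""
    if record.get("taxonomicStatus") == "SYNONYM":
        # In the Ocimum sanctum response the accepted binomial appears in
        # "species" ("accepted" additionally includes the authorship).
        return record.get("species")
    # Assumption: for non-synonyms the record's own canonical name applies.
    return record.get("canonicalName")


def lookup(name):
    """Query the species endpoint and return the accepted names it reports."""
    url = f"{GBIF_SPECIES_URL}?name={urllib.parse.quote(name)}"
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = json.load(resp)
    names = {accepted_name(r) for r in data.get("results", [])}
    names.discard(None)
    return sorted(names)
```

If `lookup("Ocimum sanctum")` and `lookup("Ocimum tenuiflorum")` return the same single name, the two entries can be merged. A lookup that returns several names, or none, should probably be left for manual review.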

5 participants