
Disambiguating chemistry and fixing typos #76

petermr opened this issue Jul 12, 2019 · 133 comments

@petermr
Collaborator

petermr commented Jul 12, 2019

Chemical nomenclature is complex and ambiguous. Any attempt to disambiguate MUST record ambiguity. For example, acetyl-furan could be 1-acetyl-furan or 2-acetyl-furan. OPSIN (https://opsin.ch.cam.ac.uk) gives:

APPEARS_AMBIGUOUS: Connection of acet to furan

and this must be recorded.

Always test with OPSIN.

@petermr
Collaborator Author

petermr commented Jul 12, 2019

== create sample disambiguation of chemistry ==

  • Create a sample of 500 chemical names from EssoilDB (chosen from throughout the alphabet).
  • Create a CSV file for these, with the following columns:
    • EssoilDB name
    • OPSIN translation
    • OPSIN ambiguity / comment
    • Pubchem lookup
    • Pubchem comment
    • ChEBI lookup
    • ChEBI comment
    • Wikipedia lookup
    • Wikipedia comment
    • Wikidata lookup
    • Wikidata comment

For each lookup, go to the site and look up the name. Record the ID if found; otherwise leave the cell empty. If there are special comments, record them.

This may be automatable through Egon's tools.
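A minimal Python sketch of the empty sampling sheet, assuming the column names above are used verbatim as CSV headers. The helper write_template and the two sample names are illustrative only; the lookups themselves would still be filled in by hand or by tooling.

```python
import csv
import io

# Column headers copied from the list above.
COLUMNS = [
    "EssoilDB name",
    "OPSIN translation",
    "OPSIN ambiguity / comment",
    "Pubchem lookup",
    "Pubchem comment",
    "ChEBI lookup",
    "ChEBI comment",
    "Wikipedia lookup",
    "Wikipedia comment",
    "Wikidata lookup",
    "Wikidata comment",
]


def write_template(names, out):
    """Write a header plus one row per EssoilDB name; all other cells start empty."""
    writer = csv.DictWriter(out, fieldnames=COLUMNS)  # restval="" fills missing cells
    writer.writeheader()
    for name in names:
        writer.writerow({"EssoilDB name": name})
    return out


# Illustrative names only; in practice pass the 500 sampled names.
sheet = write_template(["acetylfuran", "borneol"], io.StringIO())
print(sheet.getvalue())
```

To write the sheet to disk, open the file with newline="" (for example open("sample.csv", "w", newline="", encoding="utf-8")) so that the csv module controls the line endings.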

@petermr
Collaborator Author

petermr commented Jul 13, 2019

Chemical disambiguation

The tools to use are:

java -jar [jarfile.jar] -o inchi namesin.txt inchiout.txt 

By using InChIs we have a correspondence between the systems.
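One way to keep the name/InChI correspondence is to pair the two files line by line. This Python sketch assumes OPSIN writes exactly one output line per input name and leaves the line blank when it cannot parse a name; please check that against the OPSIN version in use. The function name and the sample InChI are illustrative.

```python
def pair_names_with_inchis(name_lines, inchi_lines):
    """Pair each OPSIN input name with the InChI on the same output line.

    Assumes one output line per input name, with an empty line where OPSIN
    could not parse the name; such failures are returned as None.
    """
    if len(name_lines) != len(inchi_lines):
        raise ValueError(
            f"{len(name_lines)} names but {len(inchi_lines)} output lines")
    return [(name.strip(), inchi.strip() or None)
            for name, inchi in zip(name_lines, inchi_lines)]


# In practice the lists would come from the files, e.g.
# names = open("namesin.txt", encoding="utf-8").read().splitlines()
# inchis = open("inchiout.txt", encoding="utf-8").read().splitlines()
names = ["benzene", "not-a-real-name"]
inchis = ["InChI=1/C6H6/c1-2-4-6-5-3-1/h1-6H", ""]
for name, inchi in pair_names_with_inchis(names, inchis):
    print(name, inchi or "NO PARSE")
```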

INPUTS
From the CSV file output, take column 2 (common names). Edit out quotes (") and delete spaces around " - "; split esters, e.g. "bornylacetate" => "bornyl acetate".
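The input clean-up could be scripted roughly as follows. This is a sketch under stated assumptions: the ester suffix list is hypothetical, and a split is only attempted when the name ends in "...yl" directly followed by a known suffix, so names that are already correct are left unchanged.

```python
import re

# Hypothetical list of ester suffixes to split off; extend as new cases turn up.
ESTER_SUFFIXES = ("acetate", "propionate", "butyrate")


def clean_common_name(raw):
    """Drop quotes, tighten " - " to "-", and split run-together esters."""
    name = raw.replace('"', "")              # edit out quotes
    name = re.sub(r"\s+-\s+", "-", name)     # "alpha - pinene" -> "alpha-pinene"
    name = name.strip()
    for suffix in ESTER_SUFFIXES:
        # "bornylacetate" -> "bornyl acetate"; "bornyl acetate" is left alone
        match = re.fullmatch(rf"(.*[a-z]yl)({suffix})", name, flags=re.IGNORECASE)
        if match:
            name = f"{match.group(1)} {match.group(2)}"
            break
    return name


for raw in ['"alpha - pinene"', "bornylacetate", "ethylacetate", "bornyl acetate"]:
    print(f"{raw!r} -> {clean_common_name(raw)!r}")
```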

OUTPUTS
If PubChem finds an ambiguous compound, it outputs stereoisomers. These may need manual editing to select the commonest one.

A typical example is
https://pubchem.ncbi.nlm.nih.gov/compound/42608158
which shows the most likely isomer for alloaromadendrene (Allo-Aromadendrene).

@petermr
Collaborator Author

petermr commented Jul 13, 2019

Scheduling chemical work

Vinita should supervise the processing, which will be largely carried out by Ambarish and later Shruthi.

It is particularly important to check the correctness of the results.

Method:
Divide the work into small batches (PubChem may mandate this, but it is good practice anyway). At this stage, use no more than 100 compounds per batch.
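A minimal Python sketch of the batching step; the batch size of 100 comes from the text above, while the function name is illustrative.

```python
from typing import Iterator, List, Sequence


def batches(names: Sequence[str], size: int = 100) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` names."""
    if size < 1:
        raise ValueError("batch size must be at least 1")
    for start in range(0, len(names), size):
        yield list(names[start:start + size])


compounds = [f"compound{i}" for i in range(250)]  # stand-in for real names
print([len(batch) for batch in batches(compounds)])  # [100, 100, 50]
```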

0/ There should be a single communal table (as described above). It may need more columns than specified there.
1/ Run each batch against PubChem to get (a) CIDs and (b) InChIs. Add comments (c) where PubChem fails or is ambiguous.
2/ Run each batch through OPSIN to get (d) InChIs and (e) comments.

3/ Search Wikidata with (a) the CID, (b) the InChI, or (c) the original name if those fail. This should be done automatically.
For unambiguous compounds this will give a link to Wikidata that should be included in the EssoilDB database.
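Step 3 can be automated by generating one SPARQL query per identifier and trying them in order. This Python sketch only builds the query strings; it uses the Wikidata properties P662 (PubChem CID) and P234 (InChI), and the fallback to an English label match on the original name is an assumption, likely to be the least reliable route.

```python
def _literal(text):
    """Quote a Python string as a SPARQL string literal."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def wikidata_queries(cid=None, inchi=None, name=None):
    """Build SPARQL queries in the fallback order (a) CID, (b) InChI, (c) name.

    P662 is Wikidata's PubChem CID property and P234 its InChI property.
    Matching the raw name against English labels is an assumption.
    """
    queries = []
    if cid:
        queries.append(f"SELECT ?item WHERE {{ ?item wdt:P662 {_literal(str(int(cid)))} . }}")
    if inchi:
        queries.append(f"SELECT ?item WHERE {{ ?item wdt:P234 {_literal(inchi)} . }}")
    if name:
        queries.append(f"SELECT ?item WHERE {{ ?item rdfs:label {_literal(name)}@en . }}")
    return queries


for query in wikidata_queries(cid=241, name="benzene"):
    print(query)
```

Each query could be run in turn against the Wikidata Query Service (https://query.wikidata.org) until one returns a result. A query that returns more than one item should be recorded as an ambiguity rather than resolved silently.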

The correctness of the search will be shown by matching InChIs for numerous compounds. We will report early results in the poster.

@ambarishK
Collaborator

ambarishK commented Jul 15, 2019

Sir,
Please go through the Batch-0 run for the first 100 compounds.
compNameDisambiguation.csv (https://github.com/gilienv/EssOilDB/blob/master/tables/chemistry/compNameDisambiguation.csv) is the output file with the EssOilDB entry, PubChem lookups, OPSIN lookups and comments.

The Wikidata lookup is still to be done.

@ambarishK
Collaborator

ambarishK commented Jul 15, 2019

Sir,
Please go through the files.
100cnamePubchemAndOPSIN.csv
100cnamePubChem.csv
100cnameOPSIN.csv

The PubChem lookup generates isomers. These are included in the file as generated (the PubChem lookup entries are in the same order as the generated output).

@petermr
Collaborator Author

petermr commented Jul 15, 2019 via email

@EmanuelFaria
Collaborator

I had better luck fixing chemical names with this:
https://www.ncbi.nlm.nih.gov/pcsubstance/?term=%22(Z)-BETA-OCIMENE%22

Not so much luck with this:
https://opsin.ch.cam.ac.uk/

@EmanuelFaria
Collaborator

This one is pretty good too: https://www.ebi.ac.uk/chebi/searchId.do?chebiId=CHEBI:10447

@gilienv
Owner

gilienv commented Jul 17, 2019

Thank you Manny!

Ambarish -
Please look up Manny's suggestions above and check how we fare in terms of chemical disambiguation.

@gilienv
Owner

gilienv commented Jul 17, 2019

As PMR first pointed out, we need to document the KINDS of errors we have in the chemistry.

At present, the most comprehensive assessment of error types has been carried out by Manny, and we have had a few meetings to discuss various issues.

More is on my Dropbox, but I am happy to add it here if Ambarish starts a list of error types, with example V.1 entries for each kind.

@petermr
Collaborator Author

petermr commented Jul 17, 2019 via email

@ambarishK
Collaborator

We generated 1000 compound records using OPSIN and PubChem. To get the WIKIDATA lookup column we will have to redo the run; I am preparing a run to get the WIKIDATA and WIKIPEDIA lookups.

@petermr
Collaborator Author

petermr commented Jul 18, 2019

Ambarish has made good progress on disambiguation - see
https://github.com/gilienv/EssOilDB/blob/master/chemistry/EssOilDBOPSINPubChem.tsv
This has lookups on trivial names (coname). (Trivial means "commonly used", not algorithmically parsable).
(@mannyrules: note that OPSIN is designed for systematic names and has a limited number of trivial names. By contrast, PubChem and ChEBI have a lot of trivial names but cannot parse systematic names that aren't in their databases. So OPSIN + PubChem/ChEBI should catch most.)

@ambarishK and I had a good discussion today. The result in OPSINPubChem is:

  • we have some syntactic problems. Valid compounds are being rejected by OPSIN (run locally by Ambarish) and this may be due to spurious characters:
    ACTION please make the raw coname table available so we can check.
  • Ambarish has created unique persistent IDs for coname. These are essential to make sure we keep the correspondence between the PubChem lookup and OPSIN. Following Wikidata and PubChem, these will be sequential (C1234 etc.)
  • There are many punctuation errors. These may have come from mis-entry or even from badly punctuated compound names in the literature. Example:
(2,4)-nonadienal: OPSIN reports '(2,4)-nonadienal' is unparsable due to the following being uninterpretable: '(2,4)-nonadienal'

An OPSIN-parsable name is 2,4-nonadienal

ACTION: we will need (at least) three columns:

  • original name. This is critical, in case it turns out in hindsight to be correct.
  • cleaned name. This is a sister column containing the corrected name; in some cases it might be edited more than once. I suggest we fill in this column only when at least one service can positively look the name up.
  • name comments. A brief account of who cleaned the name and why (e.g. removeBrackets petermr 20190720). We should try to create keywords such as removeBrackets.
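The three-column scheme can be sketched as a small record type plus one cleaning rule. In this Python sketch the NameRecord class and the remove_brackets function are hypothetical; the keyword format follows the "removeBrackets petermr 20190720" example above, and the original name is never modified.

```python
import re
from dataclasses import dataclass


@dataclass
class NameRecord:
    original: str        # as found in EssOilDB; never overwritten
    cleaned: str = ""    # candidate correction (keep only once a service resolves it)
    comments: str = ""   # "<keyword> <who> <yyyymmdd>" notes, joined by "; "


def remove_brackets(record, who, date):
    """Hypothetical removeBrackets rule: '(2,4)-nonadienal' -> '2,4-nonadienal'."""
    source = record.cleaned or record.original
    fixed = re.sub(r"^\((\d+(?:,\d+)*)\)-", r"\1-", source)
    if fixed != source:
        record.cleaned = fixed
        note = f"removeBrackets {who} {date}"
        record.comments = f"{record.comments}; {note}" if record.comments else note
    return record


print(remove_brackets(NameRecord("(2,4)-nonadienal"), "petermr", "20190720"))
```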

The benefit is that @mannyrules and other volunteers (@petermr ) can edit this on a day-by-day basis without affecting the rest of the submission.

Both PubChem and OPSIN produce InChIs when successful. We should find out as soon as possible where the InChIs don't agree, as this will probably be an important new problem.
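A possible way to flag disagreements automatically, sketched in Python. It assumes both columns hold InChIs of the same kind (standard "InChI=1S/" from both, for example), since a standard and a non-standard InChI for the same structure will not compare equal here. Stripping the /b, /t, /m and /s layers to separate "same skeleton, different stereo" from genuine mismatches is a simplification of the InChI layer structure, not a full comparison.

```python
# InChI layers that carry stereochemistry: /b (double bond), /t (tetrahedral),
# /m (inversion flag), /s (stereo type).
STEREO_LAYERS = ("b", "t", "m", "s")


def without_stereo(inchi):
    """Drop the stereo layers so that stereoisomers compare equal."""
    head, *layers = inchi.split("/")
    return "/".join([head] + [layer for layer in layers
                              if layer[:1] not in STEREO_LAYERS])


def compare_inchis(pubchem, opsin):
    """Classify one PubChem/OPSIN InChI pair (empty or None means no result)."""
    if not pubchem and not opsin:
        return "neither"
    if not pubchem or not opsin:
        return "pubchem only" if pubchem else "opsin only"
    if pubchem == opsin:
        return "identical"
    if without_stereo(pubchem) == without_stereo(opsin):
        return "stereo differs"
    return "DIFFERENT"


racemic = "InChI=1S/C10H16/c1-8(2)10-6-4-9(3)5-7-10/h4,10H,1,5-7H2,2-3H3"
print(compare_inchis(racemic + "/t10-/m0/s1", racemic))
```

Counting the labels over all rows (for example with collections.Counter) would give the agreement figures directly, and the DIFFERENT rows are the ones to inspect by hand.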

@petermr
Collaborator Author

petermr commented Jul 18, 2019

Created a new table EssOilDBOPSINPubChemInChI.csv with some columns removed and sorted. This is just for more rapid comparison of InChIs. Ignore it.

@EmanuelFaria
Collaborator

I have been, and will continue, fairly quickly fixing errors in punctuation as well as “foreign” characters (e.g. Ã, ã). I have also created a small table for myself where I store other, stranger anomalies, such as things that look like spaces but are actually some indescribable character.

Each time I find one, I save it so I can go through all of them “one last time” after the last person has touched the data.

I don’t know the cause of this strange data. It could be that we are each using different keyboard language settings, operating systems, or different dictionaries as default in our spreadsheet programs.

No matter though. I’m confident I can clean that stuff up.

My biggest limitation is not knowing what’s actually correct or incorrect. But on the other hand, my layman’s eyes see things others may miss, so together we’ll ferret out the weirdness.
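The stray characters described above can be found, and some of them repaired, with a short Python sketch. Characters such as "Ã" often appear when UTF-8 text has been read as Latin-1; that is an assumption about the cause here, and the repair only applies when the round trip succeeds, leaving everything else for manual review. Legitimate non-ASCII characters such as Greek letters are also reported, so the output is a prompt for inspection rather than a list of errors.

```python
import unicodedata


def odd_characters(text):
    """Return (position, character name) for every non-printable-ASCII character."""
    found = []
    for i, ch in enumerate(text):
        if ch.isascii() and ch.isprintable():
            continue
        # e.g. "NO-BREAK SPACE" for a character that looks like a space
        found.append((i, unicodedata.name(ch, f"U+{ord(ch):04X}")))
    return found


def undo_mojibake(text):
    """Try reversing UTF-8 text that was decoded as Latin-1 ('Ã©' -> 'é')."""
    try:
        return text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text  # not this kind of damage; leave it for manual review


print(odd_characters("alpha\u00a0pinene"))
print(undo_mojibake("caf\u00c3\u00a9"))
```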


@petermr
Collaborator Author

petermr commented Jul 18, 2019

A very quick eyeball of InChIs

Of the 1000 names, approximately 700 were translated by PubChem and 400 by OPSIN (though there is still a punctuation problem, and this number should increase).
There are 300 which have InChIs from both, and I have spotted only 3-4 which are grossly different, mainly because OPSIN doesn't have the right names. For example, terpinen-4-ol is a derivative of terpinene, but OPSIN doesn't have this trivial name and translates it as ter[pinene]-4-ol, i.e. 3 pinenes stitched together. But generally OPSIN agrees with PubChem ca. 99%, which is great. @vinitamehlawat we can report this figure.
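As a quick check on the ~99% figure: 3 to 4 grossly different pairs out of about 300 names with InChIs from both sources gives

$$\frac{300-4}{300} \approx 98.7\% \qquad \frac{300-3}{300} = 99.0\%$$

so "ca. 99%" is a fair summary of the overlap.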

@petermr
Collaborator Author

petermr commented Jul 18, 2019 via email

@ambarishK
Collaborator

ambarishK commented Jul 19, 2019

Table for name correction

I will start adding records after today's meeting. I will also draft all the kinds of name inconsistency, with examples.

@ambarishK
Collaborator

ambarishK commented Jul 19, 2019

Sir

I prepared a fresh sheet for name cleaning.

It contains exactly 7162 unique compound records.

The sheet we discussed today is unchanged: https://github.com/gilienv/EssOilDB/blob/master/tables/chemistry/EssOilDBOPSINPubChemInChIs_A.csv

It has 7169 unique compound entries.

It is better to continue with the sheet discussed today.

I looked into the difference of 7 records. It may be caused by 7 repeated compound names.

Documentation for generating sheet is at
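The 7-record gap could be checked by counting repeated names. A minimal Python sketch; the normalise argument is an assumption (stripping whitespace by default, optionally also lower-casing), and different normalisations will give different counts.

```python
from collections import Counter


def duplicate_names(names, normalise=str.strip):
    """Return {name: count} for names occurring more than once after normalising."""
    counts = Counter(normalise(n) for n in names)
    return {name: c for name, c in counts.items() if c > 1}


print(duplicate_names(["borneol", "borneol ", "camphor"]))  # {'borneol': 2}
```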

@petermr
Collaborator Author

petermr commented Jul 19, 2019 via email

@ambarishK
Collaborator

ambarishK commented Jul 19, 2019

Sir

Please check the sheet EssOilDBOPSINPubChemInChIsANewFinal.csv.

It contains exactly the same identifiers as the first (finalised) sheet, EssOilDBOPSINPubChemInChIs_A.csv.

I am removing the sheets EssOilDBOPSINPubChemInChIsANew.csv and EssOilDBOPSINPubChemInChIsANew.tsv.

@ambarishK
Collaborator

ambarishK commented Jul 22, 2019

We are clearly going to have to do manual correction of chemical names.
Common problems include:

  • misspelling
    e.g. 1,8-cineol
  • spaces included, "alpha - pinene"
    e.g. 1,2,3,4-Tetrahydro-1,5,7-trimethyl naphthalene
  • spaces omitted, "ethylacetate"
    e.g. (e)-sesquilavandulylacetate
  • hyphens omitted/included
    e.g. 1,8 cineole
  • quotes or brackets (strange, unbalanced...)
    e.g. (2,4)-nonadienal
  • multiple locants
    e.g. the EssOilDB entry is "bergamotol acetate", but a PubChem search shows Trans-.alpha.-Bergamatol Acetate OR (Z)-.Alpha.-Bergamotol Acetate OR Cis-alpha-Bergamotol Acetate.
  • missing locants
    e.g. borneole
  • other misspelled compound names
    e.g. 1,4-cadinadienea
  • improper Roman numerals
    e.g. humulene epoxide iii; it should be humulene epoxide III.
  • extra characters included
    e.g. the EssOilDB entry hexadecanoic0acid; it should be hexadecanoic acid.
  • isomeric notation written in lower case instead of upper case
    e.g. the EssOilDB entry is (2e)-octen-1-ol; it should be E-2-octen-1-ol.

To handle this properly we should have at least two columns (raw data, curated data).
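Some of these problem types can be flagged automatically before manual correction. This Python sketch uses hypothetical keyword names and deliberately simple regular expressions; it covers only five of the categories above and will miss many cases (misspellings and missing locants, for example, need a dictionary or a lookup service).

```python
import re

# Hypothetical keyword -> pattern table for five of the problem types above.
CHECKS = {
    "spacedHyphen": re.compile(r"\s-\s"),                        # "alpha - pinene"
    "bracketedLocants": re.compile(r"^\(\d+(?:,\d+)+\)-"),       # "(2,4)-nonadienal"
    "digitInWord": re.compile(r"[a-z]\d[a-z]", re.IGNORECASE),   # "hexadecanoic0acid"
    "lowerRoman": re.compile(r"\s(?:i{1,3}|iv|vi{0,3}|ix|x)$"),  # "humulene epoxide iii"
    "lowerStereo": re.compile(r"\(\d*[ezrs]\)"),                 # "(2e)-octen-1-ol"
}


def classify(name):
    """Return the keywords of every check that the name triggers."""
    return [key for key, pattern in CHECKS.items() if pattern.search(name)]


for name in ["alpha - pinene", "(2,4)-nonadienal", "hexadecanoic0acid",
             "humulene epoxide iii", "(2e)-octen-1-ol", "borneol"]:
    print(name, classify(name))
```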

@petermr
Collaborator Author

petermr commented Jul 22, 2019 via email

@ambarishK
Collaborator

ambarishK commented Jul 23, 2019

Please go through the name-cleaning sheet I updated: Copy of EssOilDBOPSINPubChemInChIs_A.csv.

There is an additional column for the IUPAC name; I have filled it in for the first 50 records.

I added a short description of today's file to the documentation page.

@ambarishK
Collaborator

ambarishK commented Jul 23, 2019

Sir,

Compound identifiers are now C1, C2, C3, ..., which correspond to the previous identifiers 1C, 2C, 3C, ... respectively.
I have updated the sheet with the new compound_identifier values.
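The renumbering is a simple string mapping; a minimal Python sketch (the function name is illustrative).

```python
import re


def new_compound_id(old_id):
    """Map an old-style identifier such as '123C' to the new form 'C123'."""
    match = re.fullmatch(r"(\d+)C", old_id.strip())
    if not match:
        raise ValueError(f"not an old-style identifier: {old_id!r}")
    return f"C{match.group(1)}"


print(new_compound_id("123C"))  # C123
```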

@petermr
Collaborator Author

petermr commented Jul 23, 2019 via email

@ambarishK
Copy link
Collaborator

Sir,
I have listed the WIKIDATA 'Q' IDs for all compounds on the poster. Please go through the page.

@petermr
Collaborator Author

petermr commented Jul 24, 2019 via email

@ambarishK
Collaborator

Dear Sir

I found a picture in my phone's camera roll taken at harvest time (this March, when I visited my home). It shows Lantana camara shrubs spread across the bottom. If convenient, it could be included in the poster.

@petermr
Collaborator Author

petermr commented Aug 27, 2019 via email

@ambarishK
Collaborator

Yes sir.

@petermr
Collaborator Author

petermr commented Aug 27, 2019 via email

@petermr
Collaborator Author

petermr commented Aug 28, 2019

I have manually converted the TSV file to HTML to include images and links to Wikidata. It's static.
At present GitHub does NOT render HTML; you have to download it, although it might be possible to view it as HTML using these suggestions:
https://stackoverflow.com/questions/8446218/how-to-see-an-html-page-on-github-as-a-normal-rendered-html-page-to-see-preview
Do not do anything to the current file.

resolveCompTable20190828.html
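For future tables, the TSV-to-HTML step could be scripted rather than edited by hand. A minimal Python sketch, assuming the Q IDs sit in a column named wikidata (a hypothetical name); images are not handled.

```python
import csv
import html
import io
import re


def tsv_to_html(tsv_text, qid_column="wikidata"):
    """Render a TSV table as a static HTML table, linking Wikidata Q IDs."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    headers = reader.fieldnames or []
    lines = ["<table>",
             "<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in headers) + "</tr>"]
    for row in reader:
        cells = []
        for h in headers:
            text = html.escape(row.get(h) or "")  # missing cells become empty
            if h == qid_column and re.fullmatch(r"Q\d+", text):
                text = f'<a href="https://www.wikidata.org/wiki/{text}">{text}</a>'
            cells.append(f"<td>{text}</td>")
        lines.append("<tr>" + "".join(cells) + "</tr>")
    lines.append("</table>")
    return "\n".join(lines)


print(tsv_to_html("name\twikidata\nbenzene\tQ2270\n"))
```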

@petermr
Collaborator Author

petermr commented Aug 28, 2019

I am VERY pleased with the resolution of compounds in the resolveCompTable20190828.html table. Well done.
As far as I can see there are only about 5-10 false positives (e.g. SA3, NI); I'll create a list of them to be removed.
@gilienv and @mannyrules please also check.

@ambarishK
Collaborator

ambarishK commented Aug 28, 2019

Sir, I have also prepared a table and added it to the repository just now: compoundStructureDiagram.html. However, I could not work out how to embed images from the repository folder; the current file uses image paths on my laptop, i.e. "E:/2D_images/*.png".

There are only 7 false positives: ni, sa2, sa3, sh1, sh2, sh3 and sh4. There is also one repeat; duplicates can be removed using the cid column.
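Removing the false positives and the duplicate could be done in one pass. A minimal Python sketch, assuming each row is a dict with name and cid keys (hypothetical column names); rows without a CID are kept.

```python
# The seven false positives listed above, compared case-insensitively.
FALSE_POSITIVES = {"ni", "sa2", "sa3", "sh1", "sh2", "sh3", "sh4"}


def clean_rows(rows, name_key="name", cid_key="cid"):
    """Drop false positives, then keep only the first row for each CID."""
    seen = set()
    kept = []
    for row in rows:
        if row[name_key].strip().lower() in FALSE_POSITIVES:
            continue
        cid = row.get(cid_key)
        if cid:
            if cid in seen:
                continue  # repeat of a compound already kept
            seen.add(cid)
        kept.append(row)
    return kept
```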

@ambarishK
Collaborator

Sir, the table header for WIKIDATA has been placed at the far end of the table. It should come before the cid column.

@petermr
Collaborator Author

petermr commented Aug 28, 2019 via email

@petermr
Collaborator Author

petermr commented Aug 28, 2019 via email

@ambarishK
Collaborator

ambarishK commented Sep 6, 2019

Sir, the synonym tables for compounds and plants are below.
compoundSynonymTable

column description

  • EID - EssoilDB unique ID for compound name.
  • SYNONYMS - compound synonym.

plantSynonymTable

column description

  • pid - EssoilDB unique ID for plant name.
  • synonyms - plant synonym.

Plant synonyms are from details.csv table.

@ambarishK
Collaborator

Sir, all compound synonyms are from PubChem and plant synonyms are from details.csv. The plant synonyms were extracted with the R package taxize from the "col" (Catalogue of Life) database.

@petermr
Collaborator Author

petermr commented Sep 6, 2019 via email

@ambarishK
Collaborator

OK sir.

@petermr
Collaborator Author

petermr commented Sep 6, 2019 via email

@petermr
Collaborator Author

petermr commented Sep 6, 2019 via email

@ambarishK
Collaborator

Sir, what is the pid (property ID) for compound synonyms?

@petermr
Collaborator Author

petermr commented Sep 6, 2019 via email

@ambarishK
Collaborator

ambarishK commented Sep 7, 2019

Yes sir, I will assign synonym IDs to both plant synonyms and compound synonyms. For example:

Compound synonym ID | EID
CS789 | C123
CS790 | C123

Similarly, I will assign plant IDs following the same syntax:

EssoilDB plant ID | synonym ID
EP123 | EPS9876
EP123 | EPS9877

Earlier, I meant to ask how to get compound synonyms from Wikidata. In the SPARQL query I will have to pass the Wikidata property for compound synonyms, just as P662 is the Wikidata property for PubChem CID.
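A sketch of such a query, built in Python. It assumes the alternative names wanted are the item's Wikidata label and aliases (aliases are exposed as skos:altLabel in the Wikidata Query Service), matched through P662 as above; the function name and the English-only filter are illustrative choices.

```python
def alias_query(cid, lang="en"):
    """SPARQL returning the label and aliases of the item with PubChem CID `cid`."""
    cid = int(cid)          # CIDs are integers; this also keeps the query well formed
    if not lang.isalpha():  # e.g. "en"; rejects anything that could break the query
        raise ValueError(f"unexpected language code: {lang!r}")
    return (
        "SELECT ?item ?name WHERE {\n"
        f'  ?item wdt:P662 "{cid}" .\n'
        "  { ?item rdfs:label ?name . } UNION { ?item skos:altLabel ?name . }\n"
        f'  FILTER(LANG(?name) = "{lang}")\n'
        "}"
    )


print(alias_query(241))
```

The resulting query could be pasted into https://query.wikidata.org to check the output before automating it.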

@ambarishK
Collaborator

ambarishK commented Sep 8, 2019

Sir, please go through the synonym tables with the added synonym IDs.
plant synonym table

Column description of the plant synonym table:

  • EPSID - synonym ID assigned to the plant synonym name.
  • EPID - unique ID assigned to the plant name (EssoilDB ID).
  • synonyms - plant synonym name.

For example:

EPSID | EPID | synonyms
EPS1 | EP1 | Abies alba apennina
EPS2 | EP1 | Abies alba pardei
EPS3 | EP1 | Abies alba podolica
EPS4 | EP1 | Abies argentea
EPS26 | EP2 | Abies apollinis
EPS27 | EP2 | Abies borisii-regis pungenti-pilosa
EPS28 | EP2 | Abies cilicica borisii-regis
EPS29 | EP3 | Abies alba cephalonica
EPS30 | EP3 | Abies apollinis
EPS31 | EP3 | Abies apollinis
EPS32 | EP3 | Abies apollinis panachaica

compound synonym table

Column description of the compound synonym table:

  • CSID - compound synonym ID.
  • EID - unique ID assigned to the compound (EssoilDB ID).
  • SYNONYM - compound synonym name.

CSID | EID | SYNONYM
CS1 | C214 | acetate
CS2 | C214 | Acetate Ion
CS3 | C214 | Acetic acid, ion(1-)
CS4 | C214 | Acetate ions
CS5 | C214 | 71-50-1
CS431 | C215 | acetic aicd
CS432 | C215 | acetic-acid
CS433 | C215 | Glacial acetate
CS1160 | C2776 | ethanal
CS1161 | C2776 | acetic aldehyde
CS1162 | C2776 | ethyl aldehyde
CS1163 | C2776 | 75-07-0

@petermr
Collaborator Author

petermr commented Sep 8, 2019 via email

@petermr
Collaborator Author

petermr commented Sep 8, 2019 via email

@ambarishK
Collaborator

ambarishK commented Sep 9, 2019

Yes sir.
Should we apply regexes to the compound name synonyms?

I am going through details.csv. Its synonyms are not consistent with the synonym names available in db="col". For example:

pid | normalized_name | synonyms
1159 | Ocimum americanum | Dracocephalum sibiricum,Glechoma   sibirica,Moldavica elata,Nepeta macrantha
1160 | Ocimum basilicum | NA
1161 | Ocimum basilicum | Nepeta purpurea
1162 | Ocimum basilicum | Nicotiana alba,Nicotiana alipes,Nicotiana attenuata,Nicotiana   capensis,Nicotiana caudata,Nicotiana chinensis,Nicotiana crispula,Nicotiana   florida,Nicotiana frutescens,Nicotiana fruticosa,Nicotiana gigantea,Nicotiana   gracilipes,Nicotiana guatemalensis,Nicotiana havanensis,Nicotiana   lancifolia,Nicotiana lancifolia,Nicotiana latissima,Nicotiana   lehmanni,Nicotiana lingua,Nicotiana loxensis,Nicotiana macrophylla,Nicotiana   marylandica,Nicotiana mexicana,Nicotiana pallescens,Nicotiana   petiolata,Nicotiana pilosa,Nicotiana serotina,Nicotiana tabaca,Nicotiana   turcica,Nicotiana verdon,Nicotiana ybarrensis,Tabacum latissimum,Tabacum   nicotianum,Tabacum ovatofolium
1163 | Ocimum basilicum | Erobathos damascenum,Melanthium damascenum,Nigella bourgaei,Nigella   coerulea,Nigella damascena africana,Nigella damascena minor,Nigella damascena   minor,Nigella damascena oligogyna,Nigella elegans,Nigella involucrata,Nigella   multifida,Nigella romana,Nigella taurica
1164 | Ocimum basilicum | NA
1165 | Ocimum basilicum? | Gymnadenia nigra longibracteata,Gymnigritella brachystachya,Gymnigritella   megastachya,Habenaria nigra,Nigritella angustifolia,Nigritella angustifolia   longibracteata,Nigritella brachystachya,Nigritella fragrans,Nigritella   hybrida,Nigritella megastachya,Nigritella nigra,Nigritella nigra   longibracteata,Nigritella suaveolens,Nigritella suaveolens   nigroconopsea,Orchis atropurpurea,Orchis moritziana,Orchis nigra,Orchis nigra   flore-rosea,Orchis reichenbachii,Orchis variegata,Satyrium nigrum,Sieberia   nigra
1166 | Ocimum basilicum | Baeckea linearis
1167 | Ocimum basilicum | Ocimum adscendens,Ocimum diffusum,Ocimum glaucum,Ocimum menthoides,Ocimum   viscosum,Orthosiphon diffusus,Orthosiphon hispidus,Orthosiphon   tomentosus,Orthosiphon tristis,Plectranthus viscosus
1168 | Ocimum ciliatum | Becium obovatum glabrior,Ocimum album,Ocimum brachiatum,Ocimum   canum,Ocimum canum integrifolium,Ocimum dichotomum,Ocimum dinteri,Ocimum   fluminense,Ocimum fruticulosum,Ocimum hispidulum,Ocimum incanescens,Ocimum   serpyllifolium glabrius,Ocimum stamineum,Ocimum thymoides,Becium obovatum   glabrior,Ocimum album,Ocimum brachiatum,Ocimum canum,Ocimum canum   integrifolium,Ocimum dichotomum,Ocimum dinteri,Ocimum fluminense,Ocimum   fruticulosum,Ocimum hispidulum,Ocimum incanescens,Ocimum serpyllifolium   glabrius,Ocimum stamineum,Ocimum thymoides
1169 | Ocimum gratissimum | Ocimum album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum   album,Ocimum basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum   citrodorum,Ocimum cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum   integerrimum,Ocimum lanceolatum,Ocimum laxum,Ocimum majus,Ocimum   medium,Ocimum minus,Ocimum nigrum,Ocimum odorum,Ocimum scabrum,Ocimum   simile,Ocimum thyrsiflorum,Ocimum urticifolium,Plectranthus barrelieri,Ocimum   album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum album,Ocimum   basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum citrodorum,Ocimum   cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum integerrimum,Ocimum   lanceolatum,Ocimum laxum,Ocimum majus,Ocimum medium,Ocimum minus,Ocimum   nigrum,Ocimum odorum,Ocimum scabrum,Ocimum simile,Ocimum thyrsiflorum,Ocimum   urticifolium,Plectranthus barrelieri
1170 | Ocimum gratissimum | Ocimum album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum   album,Ocimum basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum   citrodorum,Ocimum cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum   integerrimum,Ocimum lanceolatum,Ocimum laxum,Ocimum majus,Ocimum   medium,Ocimum minus,Ocimum nigrum,Ocimum odorum,Ocimum scabrum,Ocimum   simile,Ocimum thyrsiflorum,Ocimum urticifolium,Plectranthus barrelieri,Ocimum   album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum album,Ocimum   basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum citrodorum,Ocimum   cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum integerrimum,Ocimum   lanceolatum,Ocimum laxum,Ocimum majus,Ocimum medium,Ocimum minus,Ocimum   nigrum,Ocimum odorum,Ocimum scabrum,Ocimum simile,Ocimum thyrsiflorum,Ocimum   urticifolium,Plectranthus barrelieri
1171 | Ocimum gratissimum | Ocimum album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum   album,Ocimum basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum   citrodorum,Ocimum cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum   integerrimum,Ocimum lanceolatum,Ocimum laxum,Ocimum majus,Ocimum   medium,Ocimum minus,Ocimum nigrum,Ocimum odorum,Ocimum scabrum,Ocimum   simile,Ocimum thyrsiflorum,Ocimum urticifolium,Plectranthus barrelieri,Ocimum   album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum album,Ocimum   basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum citrodorum,Ocimum   cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum integerrimum,Ocimum   lanceolatum,Ocimum laxum,Ocimum majus,Ocimum medium,Ocimum minus,Ocimum   nigrum,Ocimum odorum,Ocimum scabrum,Ocimum simile,Ocimum thyrsiflorum,Ocimum   urticifolium,Plectranthus barrelieri
1172 | Ocimum gratissimum | Ocimum album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum   album,Ocimum basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum   citrodorum,Ocimum cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum   integerrimum,Ocimum lanceolatum,Ocimum laxum,Ocimum majus,Ocimum   medium,Ocimum minus,Ocimum nigrum,Ocimum odorum,Ocimum scabrum,Ocimum   simile,Ocimum thyrsiflorum,Ocimum urticifolium,Plectranthus barrelieri,Ocimum   album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum album,Ocimum   basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum citrodorum,Ocimum   cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum integerrimum,Ocimum   lanceolatum,Ocimum laxum,Ocimum majus,Ocimum medium,Ocimum minus,Ocimum   nigrum,Ocimum odorum,Ocimum scabrum,Ocimum simile,Ocimum thyrsiflorum,Ocimum   urticifolium,Plectranthus barrelieri
1173 | Ocimum kilimandscharicum | Ocimum album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum   album,Ocimum basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum   citrodorum,Ocimum cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum   integerrimum,Ocimum lanceolatum,Ocimum laxum,Ocimum majus,Ocimum   medium,Ocimum minus,Ocimum nigrum,Ocimum odorum,Ocimum scabrum,Ocimum   simile,Ocimum thyrsiflorum,Ocimum urticifolium,Plectranthus barrelieri,Ocimum   album,Ocimum anisatum,Ocimum barrelieri,Ocimum basilicum album,Ocimum   basilicum bullatum,Ocimum basilicum densiflorum,Ocimum basilicum   difforme,Ocimum basilicum glabratum,Ocimum basilicum majus,Ocimum basilicum   pelvifolium,Ocimum basilicum purpurascens,Ocimum basilicum   thyrsiflorum,Ocimum basilicum violaceum,Ocimum basilicum violocrispum,Ocimum   basilicum viridicrispum,Ocimum basilicum vulgare,Ocimum bullatum,Ocimum   caryophyllatum,Ocimum chevalieri,Ocimum ciliare,Ocimum ciliatum,Ocimum citrodorum,Ocimum   cochleatum,Ocimum dentatum,Ocimum hispidum,Ocimum integerrimum,Ocimum   lanceolatum,Ocimum laxum,Ocimum majus,Ocimum medium,Ocimum minus,Ocimum   nigrum,Ocimum odorum,Ocimum scabrum,Ocimum simile,Ocimum thyrsiflorum,Ocimum   urticifolium,Plectranthus barrelieri

I have extracted the plant synonym names. The synonym column can be augmented from them after mapping the normalized names to the names available in EssoilDB1.0.

Column description of the table is as follows.

  • plantNameFound - EssoilDB1.0 plant names whose synonyms are available in db="col".

  • synonyms - plant name synonyms.

All synonyms have been extracted using R taxize package.

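R code snippet:

    library(taxize)
    # Names to look up ("Millet" is a common name, so it may return no synonyms)
    plant_names <- c("Ocimum sanctum", "Oryza sativa", "Millet")
    # db = "col" (Catalogue of Life) as used here in 2019; newer taxize releases may no longer offer it
    syn <- synonyms(plant_names, db = "col")
    syn <- synonyms_df(syn)  # combine the per-name results into one data frame
    View(syn)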

@petermr
Collaborator Author

petermr commented Sep 9, 2019 via email

@ambarishK
Collaborator

Yes sir.

@petermr
Collaborator Author

petermr commented Sep 9, 2019 via email

@ambarishK
Collaborator

ambarishK commented Sep 9, 2019

Yes sir.

Sir, there is no compound synonym information in EssoilDB1.0. The header of the chemical compound sheet is as follows:


Paper_code | Compound_name | CAS | Qnt | Plant_parts | Technique | Classification | Activity | Plant_type | Exp_Condition

The columns directly related to the chemical compounds are CAS, Classification and Activity.

@ambarishK
Collaborator

ambarishK commented Sep 10, 2019

Sir, please check the compound synonym names extracted from EssoilDB1.0.

uniqueCompoundSynonym20190910.tsv

Each synonym is reported as one record per row.

Column description:

  • CSID - compound synonym ID.
  • EID - unique identifier assigned to each compound.
  • synonym - compound synonym name.

Total number of records: 2812

Example:

CSID | EID | synonym
CS1 | C214 | acetate
CS2 | C215 | acetic acid
CS3 | C2776 | acetaldehyde
CS4 | C170 | 3-hydroxy-2-butanone
CS5 | C170 | 3-hydroxybutan-2-one
CS6 | C170 | acetoin
CS7 | C2780 | acetone
CS8 | C298 | benzaldehyde
CS9 | C3196 | benzene
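Flattening per-compound synonym lists into one record per row with sequential CSIDs can be sketched in Python as follows. The input shape ({EID: [synonyms]}) and the case-insensitive de-duplication within each compound are assumptions.

```python
def synonym_rows(compound_synonyms, start=1):
    """Flatten {EID: [synonyms]} into (CSID, EID, synonym) rows, one per synonym.

    Synonyms are de-duplicated case-insensitively within each compound.
    """
    rows, n = [], start
    for eid, synonyms in compound_synonyms.items():
        seen = set()
        for syn in synonyms:
            key = syn.strip().lower()
            if not key or key in seen:
                continue  # skip blanks and repeats such as "acetoin"/"Acetoin"
            seen.add(key)
            rows.append((f"CS{n}", eid, syn.strip()))
            n += 1
    return rows


for row in synonym_rows({"C170": ["3-hydroxy-2-butanone", "acetoin", "Acetoin"],
                         "C2780": ["acetone"]}):
    print(" | ".join(row))
```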

@braaa11

braaa11 commented Oct 30, 2023

Ggg

@braaa11

braaa11 commented Oct 30, 2023

Error CS0433
