Requirements for Organization Dictionary #3

Open
ShweataNHegde opened this issue Dec 2, 2020 · 7 comments

@ShweataNHegde
Collaborator

The Current Version of the Dictionary

The organization dictionary needs updates. Here is the list of requirements:

  1. We would want a DictionaryEditor to reference "country" as a related item. Something similar to this:
<entry  description="South Korean multinational conglomerate" name="Samsung" term="Samsung" wikidataURL="http://www.wikidata.org/entity/Q20716" wikipediaURL="https://en.wikipedia.org/wiki/Samsung" wikidataID="Q20716">
  <synonym>Samsung chaebol</synonym>
  <synonym>Samsung Group</synonym>
  <related role="country" wikidataID="Q884">South Korea</related>
  <related role="crossrefid" wikidataID="">100004358</related>
</entry>
  2. There are duplicate entries in the dictionary. For example, if an organization has two or more CrossRef IDs, each of them gets a separate entry in the dictionary. We would like a Python tool that goes through the dictionary, finds entries with the same Wikidata ID, and merges them into one.
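
A minimal sketch of such a merge tool, using only the standard library. It assumes the entries are direct children of a root element (called `dictionary` here, which is an assumption), keeps the first entry seen for each `wikidataID`, and copies across any child elements the later duplicates add. The helper names and the second CrossRef ID in the sample are hypothetical placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the root element name and the second CrossRef ID
# ("0000000000") are placeholders, not values from the real dictionary.
SAMPLE = """<dictionary title="organization">
  <entry name="Samsung" term="Samsung" wikidataID="Q20716">
    <synonym>Samsung Group</synonym>
    <related role="crossrefid" wikidataID="">100004358</related>
  </entry>
  <entry name="Samsung" term="Samsung" wikidataID="Q20716">
    <synonym>Samsung Group</synonym>
    <related role="crossrefid" wikidataID="">0000000000</related>
  </entry>
</dictionary>"""


def child_key(child):
    """Identity of a child element: tag, attributes, and trimmed text."""
    return (child.tag, tuple(sorted(child.attrib.items())), (child.text or "").strip())


def merge_duplicate_entries(root):
    """Merge <entry> elements that share a wikidataID into the first one seen."""
    first_seen = {}
    for entry in list(root.findall("entry")):
        qid = entry.get("wikidataID")
        if not qid:
            continue  # entries without a Wikidata ID are left untouched
        keeper = first_seen.setdefault(qid, entry)
        if keeper is entry:
            continue  # first occurrence of this ID
        seen = {child_key(c) for c in keeper}
        for child in list(entry):
            if child_key(child) not in seen:
                keeper.append(child)  # carry over synonyms/related items that are new
                seen.add(child_key(child))
        root.remove(entry)  # drop the now-redundant duplicate
    return root


root = merge_duplicate_entries(ET.fromstring(SAMPLE))
entries = root.findall("entry")
crossref_ids = [r.text for r in entries[0].findall("related[@role='crossrefid']")]
```

After the merge, the single Samsung entry carries both CrossRef IDs, and the shared synonym appears only once.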
@petermr
Owner

petermr commented Dec 3, 2020

Query to add to the dictionary:

# Organization
SELECT ?OrganizationLabel ?Country ?CountryLabel ?instanceofLabel ?Organization ?crossrefid

WHERE {
  ?Organization wdt:P3153 ?crossrefid .
  OPTIONAL { ?Organization wdt:P31 ?instanceof . }
  ?Organization wdt:P17 ?Country .

  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20000000

@petermr
Owner

petermr commented Dec 3, 2020

This query also has a RESTful URL form, with URL encoding:


(Broken across lines for readability; do not use this form.)

https://query.wikidata.org/#%23%20Organization%0ASELECT%20%3FOrganizationLabel%20%3F
Country%20%3FCountryLabel%20%3FinstanceofLabel%20%20%3FOrganization%20%3F
crossrefid%20%0A%0AWHERE%20%7B%0A%20%20%3FOrganization%20wdt%3AP3153%20%3F
crossrefid%20.%0AOPTIONAL%20%20%7B%3FOrganization%20wdt%3AP31%20%3F
instanceof%20.%7D%0A%20%20%3FOrganization%20wdt%3AP17%20%3F
Country%20.%0A%20%20%0A%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3A
serviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0ALIMIT%2020000000


Suggestion: we can add this URL to the dictionary:
https://query.wikidata.org/#%23%20Organization%0ASELECT%20%3FOrganizationLabel%20%3FCountry%20%3FCountryLabel%20%3FinstanceofLabel%20%20%3FOrganization%20%3Fcrossrefid%20%0A%0AWHERE%20%7B%0A%20%20%3FOrganization%20wdt%3AP3153%20%3Fcrossrefid%20.%0AOPTIONAL%20%20%7B%3FOrganization%20wdt%3AP31%20%3Finstanceof%20.%7D%0A%20%20%3FOrganization%20wdt%3AP17%20%3FCountry%20.%0A%20%20%0A%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0ALIMIT%2020000000
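
As a rough sketch, the encoded URL can be generated from the query text rather than pasted by hand. The `#` form opens the query in the Wikidata Query Service editor, while the `/sparql?query=` endpoint of the same service returns results to a program. The function names are hypothetical, and the query whitespace here is tidied, so the output differs from the link above only in encoded spaces and newlines.

```python
from urllib.parse import quote

QUERY = """# Organization
SELECT ?OrganizationLabel ?Country ?CountryLabel ?instanceofLabel ?Organization ?crossrefid
WHERE {
  ?Organization wdt:P3153 ?crossrefid .
  OPTIONAL { ?Organization wdt:P31 ?instanceof . }
  ?Organization wdt:P17 ?Country .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20000000
"""


def gui_url(sparql):
    """Link that opens the query in the Wikidata Query Service editor."""
    # safe="" so that ':', '?', '{' and similar characters are percent-encoded too
    return "https://query.wikidata.org/#" + quote(sparql, safe="")


def endpoint_url(sparql):
    """Link to the SPARQL endpoint, asking for JSON results."""
    return "https://query.wikidata.org/sparql?format=json&query=" + quote(sparql, safe="")


print(gui_url(QUERY))
print(endpoint_url(QUERY))
```

Storing the query text in the dictionary and deriving the URL when needed would avoid depending on a hand-copied encoded string.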

@petermr
Owner

petermr commented Dec 3, 2020

We can also use a shortened query link:

https://w.wiki/p3k

We don't know how persistent this short link will be.

@petermr
Owner

petermr commented Dec 3, 2020

We should also add the Wikidata (WD) property ID for the related items:

<entry  description="South Korean multinational conglomerate" name="Samsung" term="Samsung" wikidataURL="http://www.wikidata.org/entity/Q20716" wikipediaURL="https://en.wikipedia.org/wiki/Samsung" wikidataID="Q20716">
  <synonym>Samsung chaebol</synonym>
  <synonym>Samsung Group</synonym>
  <related roleWikidataID="P17" role="country" wikidataID="Q884">South Korea</related>
  <related roleWikidataID="P3153" role="crossrefid" wikidataID="">100004358</related>
</entry>

@petermr
Owner

petermr commented Dec 3, 2020

This extends to animal hosts for zoonosis.

(Mockup; please change to the correct values.)

<entry  description="COVID-19" name="COVID-19" term="COVID-19" wikidataURL="http://www.wikidata.org/entity/Q00000" wikipediaURL="https://en.wikipedia.org/wiki/COVID0000" wikidataID="Q00000">
  <synonym>COVID-Sars2-Cov</synonym>
  <related roleWikidataID="P2975" role="host" wikidataID="Q2000632">Rhinolophus Bat</related>
</entry>

Start of the query:

# Organization
SELECT ?organism ?organismLabel ?host ?hostLabel 

WHERE {
  ?organism wdt:P2975 ?host .
  
 
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 20000000

And in REST (URL-encoded) form:

https://query.wikidata.org/#%23%20Organization%0ASELECT%20%3Forganism%20%3ForganismLabel%20%3Fhost%20%3FhostLabel%20%0A%0AWHERE%20%7B%0A%20%20%3Forganism%20wdt%3AP2975%20%3Fhost%20.%0A%20%20%0A%20%0A%20%20SERVICE%20wikibase%3Alabel%20%7B%20bd%3AserviceParam%20wikibase%3Alanguage%20%22en%22.%20%7D%0A%7D%0ALIMIT%2020000000
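
A sketch of how rows returned by such a query might be turned into `<entry>` elements with `<related>` children. It assumes the results arrive in the standard SPARQL JSON layout (a list of bindings, each variable carrying a `value`). The function names, the `dictionary` root element, and the sample row are assumptions; the sample reuses the mockup values from this comment, which still need correcting.

```python
import xml.etree.ElementTree as ET

# Placeholder row in the SPARQL JSON results layout; the IDs and labels are the
# mockup values from this thread, not checked Wikidata data.
BINDINGS = [
    {
        "organism": {"type": "uri", "value": "http://www.wikidata.org/entity/Q00000"},
        "organismLabel": {"type": "literal", "value": "COVID-19"},
        "host": {"type": "uri", "value": "http://www.wikidata.org/entity/Q2000632"},
        "hostLabel": {"type": "literal", "value": "Rhinolophus Bat"},
    },
]


def qid(uri):
    """Last path segment of an entity URI, e.g. '.../entity/Q884' -> 'Q884'."""
    return uri.rsplit("/", 1)[-1]


def entries_from_bindings(bindings, subject, related, role, role_pid):
    """Build one <entry> per subject, with one <related> child per result row."""
    root = ET.Element("dictionary")
    by_qid = {}
    for row in bindings:
        subject_uri = row[subject]["value"]
        subject_qid = qid(subject_uri)
        entry = by_qid.get(subject_qid)
        if entry is None:  # first row for this subject: create its entry
            label = row[subject + "Label"]["value"]
            entry = ET.SubElement(root, "entry", {
                "name": label,
                "term": label,
                "wikidataURL": subject_uri,
                "wikidataID": subject_qid,
            })
            by_qid[subject_qid] = entry
        rel = ET.SubElement(entry, "related", {
            "roleWikidataID": role_pid,
            "role": role,
            "wikidataID": qid(row[related]["value"]),
        })
        rel.text = row[related + "Label"]["value"]
    return root


root = entries_from_bindings(BINDINGS, "organism", "host", "host", "P2975")
xml_text = ET.tostring(root, encoding="unicode")
```

Grouping rows by the subject's Wikidata ID while building the entries means an organism with several hosts gets one entry with several `<related>` children, rather than one entry per row.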

@petermr
Owner

petermr commented Dec 3, 2020

Searching for related items in dictionaries:

<entry  description="South Korean multinational conglomerate" name="Samsung" term="Samsung" wikidataURL="http://www.wikidata.org/entity/Q20716" wikipediaURL="https://en.wikipedia.org/wiki/Samsung" wikidataID="Q20716">
  <synonym>Samsung chaebol</synonym>
  <synonym>Samsung Group</synonym>
  <related roleWikidataID="P17" role="country" wikidataID="Q884">South Korea</related>
  <related roleWikidataID="P3153" role="crossrefid" wikidataID="">100004358</related>
</entry>

"All organizations in South Korea"
We will use XPath.

"All entry elements that have a related child with roleWikidataID of P17 and wikidataID of Q884"

XPath:

/*/entry[related[@roleWikidataID='P17' and @wikidataID='Q884']]

Python's ElementTree supports only a LIMITED subset of XPath.
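
Because ElementTree's XPath subset has no nested predicates and no `and`, the expression above cannot be passed to it as written. One possible workaround, sketched here, is to chain two attribute predicates on `related` and test each entry in Python. The function name, the `dictionary` root element, and the second sample entry are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; "Example Org" is an invented entry used only for contrast.
SAMPLE = """<dictionary title="organization">
  <entry name="Samsung" term="Samsung" wikidataID="Q20716">
    <related roleWikidataID="P17" role="country" wikidataID="Q884">South Korea</related>
    <related roleWikidataID="P3153" role="crossrefid" wikidataID="">100004358</related>
  </entry>
  <entry name="Example Org" term="Example Org" wikidataID="">
    <related roleWikidataID="P17" role="country" wikidataID="Q30">United States of America</related>
  </entry>
</dictionary>"""


def entries_with_related(root, role_pid, target_qid):
    """Entries having a <related> child with the given property and target IDs."""
    # Two chained predicates stand in for the XPath 'and' that ElementTree lacks.
    path = f"related[@roleWikidataID='{role_pid}'][@wikidataID='{target_qid}']"
    # Compare with None explicitly: an Element without children is falsy.
    return [entry for entry in root.iter("entry") if entry.find(path) is not None]


root = ET.fromstring(SAMPLE)
names = [e.get("name") for e in entries_with_related(root, "P17", "Q884")]
print(names)
```

If a third-party dependency is acceptable, lxml's `xpath()` method implements XPath 1.0 and should accept the original expression unchanged.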

@petermr
Owner

petermr commented Jan 21, 2021

Requirements arising from old dictionaries (automation):

  • merge entries with the same wikidataID
  • detect and eliminate scholarly articles, books, etc.
  • add language-specific Wikipedia pages from the wikidataID
  • (SH) post-SPARQL filtering, or query refinement
  • translate attributes into Wikidata properties where possible (crossrefid => _p3153_crossrefid)
  • remove unwanted terms (by term value or wikidataID)
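
Two of these requirements are small enough to sketch directly: deriving the property-prefixed name (as in `crossrefid => _p3153_crossrefid`) and dropping unwanted entries by term or wikidataID. The mapping table, function names, sample entries, and case-insensitive term matching are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from role names to Wikidata property IDs (both appear in this thread).
ROLE_TO_PROPERTY = {"crossrefid": "P3153", "country": "P17"}


def property_name(role):
    """'crossrefid' -> '_p3153_crossrefid'; unknown roles are returned unchanged."""
    pid = ROLE_TO_PROPERTY.get(role)
    return f"_{pid.lower()}_{role}" if pid else role


def remove_unwanted(root, unwanted_terms=(), unwanted_ids=()):
    """Remove entries whose term or wikidataID is listed; return how many were removed."""
    terms = {t.lower() for t in unwanted_terms}  # case-insensitive match is an assumption
    ids = set(unwanted_ids)
    removed = 0
    for entry in list(root.findall("entry")):
        if entry.get("term", "").lower() in terms or entry.get("wikidataID") in ids:
            root.remove(entry)
            removed += 1
    return removed


# Hypothetical sample; the second entry and its ID are placeholders.
root = ET.fromstring(
    '<dictionary>'
    '<entry term="Samsung" wikidataID="Q20716"/>'
    '<entry term="placeholder article" wikidataID="Q00000"/>'
    '</dictionary>'
)
removed_count = remove_unwanted(root, unwanted_ids=["Q00000"])
remaining_terms = [e.get("term") for e in root.findall("entry")]
```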
