researcher_identifiers.Rmd

---
title: "Researcher Identifiers"
author: "Paul Oldham"
date: "28/06/2018"
output: html_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = TRUE, message = FALSE)
```

The use of researcher identifiers as part of research permit systems opens up the possibility of automating the retrieval of a researchers publications. This could form the basis for the creation of an electronic national repository of data on biodiversity research, or research in general.

Research identifiers are intended to overcome the problem of name ambiguity. That is the problem of distinguishing one John Smith or Jose Garcia from a researcher with the same name. The use of unique identifiers for a researcher goes some way to solving this problem.

A number of researcher identifiers are in operation from commercial and non-commercial sources. A review of identifiers is provided [here](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5451540/). Some of these are commercial, for example Clarivate Analytics offers [Researcher ID](https://clarivate.com/products/researcherid/) and Scopus has something similar. However, they are only of use to subscribers to these commercial databases. Google Scholar offers an ID but there is no Google Scholar API making it of no use for ABS monitoring. In practice, this means that the most practical solution is the ORCID identifier system.

### ORCID

ORCID is a researcher identifier system run on a non-profit basis. To date 4.9 million ORCID identifiers have been issued and take up of the system has been growing rapidly. 

ORCID is a unique numeric identifier that is linked to a public profile [https://orcid.org/0000-0002-1013-4390](https://orcid.org/0000-0002-1013-4390). The basic idea is that the ORCID identifier will distinguish between people of the same name. For example Paul Oldham with ORCID ID [https://orcid.org/0000-0002-1013-4390](https://orcid.org/0000-0002-1013-4390) shares the same basic name with Paul Oldham with ORCID ID [https://orcid.org/0000-0002-0628-5540](https://orcid.org/0000-0002-0628-5540). When searching a database the records for both Paul Oldhams would appear to belong to the same individual, when in practice they are separate persons. The ORCID ID distinguishes between them. 

An ORCID profile belongs to the researcher and consists of public and private elements. The researcher who holds the ORCID chooses what elements of the profile are public or private. 

### ORCID and ABS Monitoring

ORCID is important for ABS monitoring for three main reasons.

1. Organisations that are members of ORCID can allow users to log in to their systems using their ORCID credentials. This makes it easy for the user to use their ORCID id to manage sign in for funding organisations, publishers and potentially research permit applications. A growing number of funding organisations and publishers now offer ORCID sign in or require ORCID identifiers.

2. ORCID can be used to automate the retrieval of information from a users profile and their publications. This can save a researcher time in entering basic information. 

3. ORCID can be used to automate retrieval of publications and other records. The ORCID system also automatically updates profiles with publication records linked to the user using services such as Crossref. This data can then form part of a publicly accessible thematic or national repository of publications about a country. 

### The ORCID API

A fully [REST API](https://orcid.org/organizations/integrators/API) is available for ORCID and is divided into a [Public API](https://support.orcid.org/knowledgebase/articles/343182) and a [Members API](https://orcid.org/content/register-client-application-0) including a [Sandbox for testing](https://orcid.org/content/register-client-application-sandbox). A [Google Group](https://groups.google.com/forum/#!forum/orcid-api-users) is available for users to ask questions. 

For the full documentation see the [Github site](https://github.com/ORCID/ORCID-Source/blob/master/orcid-model/src/main/resources/record_2.0/README.md#read-sections)

### ORCID Workflows

The easiest way to get started is to use one of the [sample workflows](https://members.orcid.org/api/workflow)

Two workflows are particularly relevant for permit systems

1. [Funding Submission Systems](https://members.orcid.org/api/workflow/funding-systems) The closest to research permit systems

2. [Repository Systems](https://members.orcid.org/api/workflow/repository)

These Workflows basically consist of 

a) Adding an ORCID button to your site
b) Authentication (using three legged OAuth)
c) Gaining approval from the ID holder to read/write data to their profiles. For example, funding organisations may wish to write the details of an awarded grant to the profile. Note this step is handled during authentication by asking for approval for your app. 
d) Display the ID of the researcher within the system (so they know they are logged in)
e) Collect information from the ORCID profile (affiliation, publications etc)
f) Synchronize with ORCID for updates
g) Add information to ORCID profiles where appropriate.

### ORCID with Python

ORCID provides [ python-orcid](https://github.com/ORCID/python-orcid) as an API wrapper. Check that it is up to date when using. For memebrs a python script is also available for [public data sync](https://github.com/ORCID/public-data-sync)

### ORCID with R

ROpenSci has produced the [rorcid package](https://github.com/ropensci/rorcid) which is on CRAN. Note that when you first try to use the package a browser window will open up and ask you to authenticate. Close the browser after signing in.

```{r eval=FALSE}
install.packages("rorcid")
```

```{r eval=FALSE}
library(rorcid)
oldham <- works("0000-0002-1013-4390")
```

```{r eval=FALSE, echo=FALSE}
save(oldham, file = "oldham.rda")
```

```{r echo=FALSE}
load("oldham.rda")
```

The works function converts the list of data frames that is returned from ORCID into a single data frame to make your life easier. Note that there are a lot of columns and so you will normally want to select relevant columns. Also not that the ORCID ID used as input does not appear in the return itself (consistently that is) so you will probably want to add a column (tibble::add_column()).

```{r}
oldham %>%
  tibble::add_column(orcid_id = "0000-0002-1013-4390") %>%
  head() %>% # show ten
  knitr::kable()
```


### Biographical details (where present)

In some cases the bio is empty and will return NULL in the content field. In other cases it will return text in the content slot.

```{r eval=FALSE}
fictitious_bio <- orcid_bio("0000-0002-1825-0097")
```

```{r echo=FALSE, eval=FALSE}
save(fictitious_bio, file = "fictitious_bio.rda")
```

```{r echo = FALSE}
load("fictitious_bio.rda")
```

```{r}
fictitious_bio$`0000-0002-1825-0097`$content
```

### Employment details

```{r eval=FALSE}
carberry <- orcid_employments("0000-0002-1825-0097")
```

```{r eval=FALSE, echo=FALSE}
save(carberry, file = "carberry.rda")
```

```{r echo=FALSE}
load("carberry.rda")
```

```{r}
carberry$`0000-0002-1825-0097`$`employment-summary` %>%
  select(`department-name`, `role-title`, `organization.name`, `organization.address.city`, `organization.address.country`) %>% 
  knitr::kable()
```

### Education Details

```{r eval=FALSE}
oldham_education <- orcid_educations("0000-0002-1013-4390")
```

```{r eval=FALSE, echo=FALSE}
save(oldham_education, file = "oldham_education.rda")
```

```{r echo=FALSE}
load("oldham_education.rda")
```


```{r}
oldham_education$`0000-0002-1013-4390`$`education-summary` %>% 
  select(`department-name`, `role-title`, `organization.name`)
```


### Email Details

```{r eval=FALSE}
carberry_email <- orcid_email("0000-0002-1825-0097")
```

```{r echo=FALSE, eval=FALSE}
save(carberry_email, file = "carberry_email.rda")
```

```{r echo=FALSE}
load("carberry_email.rda")
```

```{r}
carberry_email$`0000-0002-1825-0097`$email
```