Service that suggests article of interest to the reader based on their history or based on other readers with similar interests’ current reads. Also, when articles published in arXiv and by publisher both become available this service advices and scores their similarity for possible merging. These services are mostly for use by ADS.
$ virtualenv python
$ source python/bin/activate
$ pip install -r requirements.txt
$ pip install -r dev-requirements.txt
$ vim local_config.py # edit, edit
On your desktop run:
$ py.test
To get recommendation you do a POST request to the endpoint readhist:
https://api.adsabs.harvard.edu/v1/oracle/readhist
within the POST, payload can have the following optinal prameters function
, sort
, num_docs
, cutoff_days
, and top_n_reads
.
- Possible values for parameter
function
are:similar
,trending
,useful
, andreviews
. Default issimilar
. - Possible values for parameter
sort
are any sort options, the default isentry_date
. num_docs
is number of recommendation documents. Default is 5.cutoff_days
is number days going back from today for including published papers. Default is 5.top_n_reads
is number of articles read to consider. Default is 10.
For example payload of,
{"function": "similar", "sort": "entry_date", "num_docs": 5, "cutoff_days": 5, "top_n_reads": 10}
produces the query and returns:
{"bibcodes": "...", "query": "(similar(topn(10, reader:<reader>, entry_date desc)) entdate:[NOW-5DAYS TO *])"}
reader
is a 16-digit anonymous user id that gets extracted from the session id, hence for curl, for this endpoint, use your web API token.
Hence, to call oracle service /readhist
do
curl -H "Authorization: Bearer <your web API token>" -H "Content-Type: application/json" -X POST -d '{"function": "similar", "sort": "entry_date", "num_docs": 5, "cutoff_days": 5, "top_n_reads": 10}' https://api.adsabs.harvard.edu/v1/oracle/readhist
curl -H "Authorization: Bearer <your web API token>" -X GET https://api.adsabs.harvard.edu/v1/oracle/<function>/<reader>
note that reader
needs to be known to use the GET, which is extracted from the session id of the POST and returned in the response.
To find match for a publisher paper knowing metadata for arXiv paper and vice versa you do a POST request to the endpoint matchdoc
https://api.adsabs.harvard.edu/v1/oracle/matchdoc
within the POST, payload should have the following required prameters abstract
or title
, with author
and year
, and if there is a doi
, it can be included.
abstract
is the abstract of the paper to find a match for.title
is the title of the paper.author
is the list of authors, comma separated.year
is the published year of the paper.
To call oracle service /matchdoc
do
curl -H "Authorization: Bearer <your API token>" -H "Content-Type: application/json" -X POST -d '{"abstract": "<abstract text>", "title": "<title text>", "author": <"comma separated author list">, "year": "<year>", "doi": "doi"}' https://api.adsabs.harvard.edu/v1/oracle/matchdoc
and the API then responds in JSON with query that was executed and the list of bibcodes with their corresponding confidence, label(matched) (ie, if the system considers the two bibcodes to be a match, 1, or not a match, 0) and similarity scores between the abstract, title, author, and year of the two bibcodes.
{"query": "...", "match": [{"source_bibcode": "...", "matched_bibcode": "...", "confidence": 0.9140091, "matched": 1, "scores": {"abstract": 0.78, "title": 0.93, "author": 1, "year": 1}}]}
curl -H "Authorization: Bearer <your API token>" -X PUT https://api.adsabs.harvard.edu/v1/oracle/add -d @dataLinksRecordList.json -H "Content-Type: application/json"
where docMatchRecordList.json contains data in the format of protobuf structure DocMatchRecordList. Please see https://github.com/adsabs/ADSPipelineMsg/blob/master/specs/docmatch.proto for specification.
Note that add endpoint is being called from docmatching script, https://github.com/adsabs/docmatch_scripts/blob/master/to_oracle.py, where a tab delimited text file with 2 bibcode columns and an optional confidence score column in inputted, the data structure is then created and submitted to this endpoint.
curl -H "Authorization: Bearer <your API token>" -X DELETE https://api.adsabs.harvard.edu/v1/oracle/delete -d @docMatchRecordList.json -H "Content-Type: application/json"
To query matched documents database, you do a POST request to the endpoint query:
curl -H "Authorization: Bearer <your API token>" -X POST https://api.adsabs.harvard.edu/v1/oracle/query
and the API then responds in JSON with list of matched bibcodes with their corresponding confidence value.
{"params": {"rows": 2000, "start": 0, "date_cutoff": "1972-01-01 00:00:00+00:00"}, "results": [["2021arXiv210312030S", "2021CSF...15311505S", 0.9922985], ...]}
Note that the parameters are optional and if omitted then the first 2000 records are returned. To specify a range of records include start
and rows
:
curl -H "Authorization: Bearer <your API token>" -H "Content-Type: application/json" -X POST -d '{"rows":2, "start":0}' https://api.adsabs.harvard.edu/v1/oracle/query
Also query can be filtered by recent number of days:
curl -H "Authorization: Bearer <your API token>" -H "Content-Type: application/json" -X POST -d '{"rows":2, "start":0, "days":3}' https://api.adsabs.harvard.edu/v1/oracle/query
returning
{"params": {"rows": 2, "start": 0, "days": 3, "date_cutoff": "2022-08-25 19:06:15.359667+00:00"}, "results": [...]}
Also Note that there is a limit of number of records per call, 2000, if rows is set to a larger value, still only 2000 records are returned.
curl -H "Authorization: Bearer <your API token>" -X GET https://api.adsabs.harvard.edu/v1/oracle/cleanup"
curl -H "Authorization: Bearer <your API token>" -X GET https://api.adsabs.harvard.edu/v1/oracle/list_tmps"
curl -H "Authorization: Bearer <your API token>" -X GET https://api.adsabs.harvard.edu/v1/oracle/list_multis"
Golnaz