New crawl of RCAAP citations using RCAAP API v.2 #1275

Closed
dcgomes opened this issue May 12, 2022 · 1 comment
Labels: Component-DataAcquisition-CitationSaver (Google Form, Python, API requests to external sources), Impact-Medium, Type-Enhancement

Comments


dcgomes commented May 12, 2022

We have been periodically crawling the citations contained in the open-access publications of the RCAAP network of OA repositories by parsing their exported sitemaps.

The sitemaps of the repositories hosted at FCCN (SARI service) are exported daily at 6h00.
We have no control over the update frequency of the other repositories.
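
For reference, the sitemap parsing step of the current approach could be sketched as follows. This is a minimal illustration, assuming the repositories export standard sitemaps.org XML; the function name `sitemap_urls` and the example URLs are hypothetical and not taken from the actual crawler.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> URL listed in a sitemap document (hypothetical helper)."""
    root = ET.fromstring(sitemap_xml)
    return [
        loc.text.strip()
        for loc in root.iterfind(".//sm:loc", SITEMAP_NS)
        if loc.text
    ]


# Made-up sitemap content, for illustration only.
EXAMPLE = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://repository.example/handle/1/1</loc></url>"
    "<url><loc>https://repository.example/handle/1/2</loc></url>"
    "</urlset>"
)

if __name__ == "__main__":
    for url in sitemap_urls(EXAMPLE):
        print(url)
```

A sitemap only lists the URLs a repository chooses to export, so the freshness of the result depends on how often each repository regenerates the file.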

The RCAAP API v.2, to be released by August 2022, will make it possible to automatically query all the Portuguese open-access repositories, even those hosted outside FCCN (e.g. RepositoriUM).

RCAAP API v.2 will export links to PDFs and allow us to filter by the “published” date, so that we can select only the records of articles published since the last crawl of the RCAAP citations.
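
The incremental selection described above could look roughly like the sketch below. Since RCAAP API v.2 is not released yet, its response format is unknown; the record fields `published` (ISO date) and `pdf_url`, and the function name `new_pdf_urls`, are assumptions made purely for illustration.

```python
from datetime import date


def new_pdf_urls(records: list[dict], last_crawl: date) -> list[str]:
    """Return PDF links of records published on or after the last crawl date.

    Assumption: each record is a dict with a 'published' ISO date string and
    an optional 'pdf_url'. The real API v.2 schema may differ.
    """
    urls = []
    for record in records:
        published = date.fromisoformat(record["published"])
        if published >= last_crawl and record.get("pdf_url"):
            urls.append(record["pdf_url"])
    return urls


# Made-up records, for illustration only.
SAMPLE_RECORDS = [
    {"published": "2022-03-01", "pdf_url": "https://repository.example/a.pdf"},
    {"published": "2022-06-15", "pdf_url": "https://repository.example/b.pdf"},
    {"published": "2022-07-01"},  # no PDF link exported
]

if __name__ == "__main__":
    print(new_pdf_urls(SAMPLE_RECORDS, date(2022, 5, 12)))
```

If the API supports filtering by date on the server side, as the description suggests, this client-side check would only be needed as a safeguard.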

The URLs obtained from the RCAAP API should be ingested into the workflow of the new CitationSaver service (see also Issue #1147).

@dcgomes dcgomes added Impact-Medium Type-Enhancement Component-DataAcquisition-CitationSaver Google Form, Python, API requests to external sources labels May 12, 2022
@dcgomes dcgomes added this to the Godhelpus milestone May 12, 2022
@PedroG1515 PedroG1515 modified the milestones: Godhelpus, Helios Oct 25, 2022

PedroG1515 commented Oct 12, 2023

Done.
Check the documentation: APISeedsGetter
Collection: EAWP43
