Feature Request: Funding Information #652
Comments
Funding is indeed a frequent request! As you indicate, the new training data for header has been labelled with When the funding information is not in the header, the "funding" declaration is labelled for the moment in the segmentation model as an independent |
It would be good to be able to extract funding information.
Unfortunately, the bioRxiv XML itself doesn't contain specific annotations for it, i.e. currently funding statements are mostly just generic back sections.
We would also like to extract things like RRIDs.
Funding-related information seems to be mostly contained near the acknowledgements, e.g.:
- 183988v1 (10.1101/183988), GROBID 0.6.1 XML
- 404632v1 (10.1101/404632), GROBID 0.6.1 XML
- 462929v1 (10.1101/462929), GROBID 0.6.1 XML
GROBID seems to generally group it under acknowledgements, as separate sub-sections. For the third example it failed to extract the text though.
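As a stopgap until dedicated annotations exist, the funding sub-sections could be picked out of the GROBID TEI with a heading heuristic. The following is only a minimal sketch under several assumptions: that the acknowledgement block appears as `<div type="acknowledgement">` inside `<back>` with one nested `<div>` per sub-section, that a keyword match on the `<head>` is good enough, and that a loose regular expression is acceptable for RRIDs. The function name, the keyword list, the RRID pattern, and the sample document (including the grant number and RRID in it) are made up for illustration and are not part of GROBID.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical keyword heuristic for sub-section headings (not a GROBID feature).
FUNDING_HEAD = re.compile(r"\b(funding|grants?|financial support)\b", re.IGNORECASE)

# Loose RRID pattern (an assumption), e.g. "RRID:AB_0000001" or "RRID: SCR_0000001".
RRID = re.compile(r"RRID:\s*([A-Za-z]+_[A-Za-z0-9_-]+)")


def find_funding_sections(tei_xml: str) -> List[str]:
    """Return the text of acknowledgement sub-sections whose heading looks funding-related."""
    root = ET.fromstring(tei_xml)
    texts = []
    # Assumed layout: <back><div type="acknowledgement"><div><head/><p/></div>...</div></back>
    for ack in root.iterfind(".//tei:back//tei:div[@type='acknowledgement']", TEI_NS):
        for sub in ack.iterfind("tei:div", TEI_NS):
            head = sub.find("tei:head", TEI_NS)
            if head is not None and FUNDING_HEAD.search("".join(head.itertext())):
                paragraphs = ["".join(p.itertext()).strip() for p in sub.iterfind("tei:p", TEI_NS)]
                texts.append(" ".join(paragraphs))
    return texts


# Made-up sample document, only to exercise the sketch.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <back>
      <div type="acknowledgement">
        <div><head>Acknowledgements</head><p>We thank our colleagues.</p></div>
        <div><head>Funding</head><p>Supported by grant ABC-123. Antibody: RRID:AB_0000001.</p></div>
      </div>
    </back>
  </text>
</TEI>"""

if __name__ == "__main__":
    for text in find_funding_sections(SAMPLE_TEI):
        print(text)
        print(RRID.findall(text))
```

A heading heuristic like this would miss cases where the funding statement is embedded in the acknowledgement paragraph itself or where no text was extracted at all, which is why proper annotations in the model would still be preferable.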
Neither the bioRxiv XML nor GROBID seems to have specific annotations for funding.
The GROBID training data for the header model does, however, contain examples for funding, e.g. 12._10.1.1.56.103.training.header.tei.xml