Use GROBID for extraction of metadata from PDFs #6158
Comments
Currently, GROBID returns only TEI XML, not BibTeX. I can try to patch the server accordingly (refs kermitt2/grobid#532 (comment)). I am curious in which cases GROBID is better than JabRef's custom implementation. The custom implementation worked fine for me for IEEE and Springer LNCS. We still need to add more test cases, though.
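Until the server can emit BibTeX directly, the TEI XML could be mapped to BibTeX-style fields on the client side. The following is a minimal sketch of that idea, not JabRef's or GROBID's actual code. It assumes the header TEI places the title under `titleStmt/title` and the authors under `analytic/author/persName`; the function name `tei_header_to_fields` is hypothetical.

```python
import xml.etree.ElementTree as ET

# TEI namespace used by TEI P5 documents.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_header_to_fields(tei_xml: str) -> dict:
    """Map a TEI header to a small dict of BibTeX-style fields.

    Assumed layout (verify against real GROBID output):
      teiHeader/fileDesc/titleStmt/title           -> title
      .../biblStruct/analytic/author/persName      -> author
    """
    root = ET.fromstring(tei_xml)
    fields = {}

    title = root.find(".//tei:titleStmt/tei:title", TEI_NS)
    if title is not None and title.text:
        fields["title"] = title.text.strip()

    authors = []
    for pers in root.iterfind(".//tei:analytic/tei:author/tei:persName", TEI_NS):
        forenames = [
            f.text.strip() for f in pers.findall("tei:forename", TEI_NS) if f.text
        ]
        surname = pers.findtext("tei:surname", default="", namespaces=TEI_NS).strip()
        if surname:
            # BibTeX convention: "Surname, Forenames"
            name = surname + (", " + " ".join(forenames) if forenames else "")
            authors.append(name)
    if authors:
        # BibTeX separates multiple authors with " and "
        fields["author"] = " and ".join(authors)

    return fields
```

Fields missing from the TEI are simply left out of the result, so a caller can merge it with metadata from other sources.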
It would be nice if you could change the server accordingly. GROBID is the de facto standard for metadata extraction from PDFs (and is used by ResearchGate, Mendeley, etc.). Our implementation was really naive and only works for a few publishers.
This issue has been inactive for half a year. Since JabRef is constantly evolving this issue may not be relevant any longer and it will be closed in two weeks if no further activity occurs. As part of an effort to ensure that the JabRef team is focusing on important and valid issues, we would like to ask if you could update the issue if it still persists. This could be in the following form:
Thank you for your contribution! |
Mendeley produces junk. I haven't finished cleaning up the junk Mendeley gave me 10 years ago. Using the system that Mendeley uses is a really bad idea. It never gets it right.
JabRef now uses several sources for extracting metadata from PDFs (XMP, embedded BibTeX, DOI, GROBID) and allows comparing them. Thank you for reporting this issue. We think that it is already fixed in our development version, and consequently the change will be included in the next release. We would like to ask you to use a development build from https://builds.jabref.org/main and report back if it works for you. Please remember to make a backup of your library before trying out this version.
Fixed by #2838
Now that we have the GROBID server up and running, we can also use it to extract bibliographic metadata from PDFs.
https://grobid.readthedocs.io/en/latest/Grobid-service/
/api/processHeaderDocument
Old PR (using CERMINE instead of GROBID): #2474
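As a rough illustration of the request involved, the sketch below builds (but does not send) a POST to `/api/processHeaderDocument`. It rests on assumptions that should be checked against the GROBID service documentation linked above: that the PDF is uploaded as a multipart form field named `input`, and that newer GROBID versions return BibTeX instead of TEI XML when the request carries `Accept: application/x-bibtex`. The function name `build_header_request`, the file name `paper.pdf`, and the base URL in the usage note are hypothetical.

```python
import urllib.request
import uuid


def build_header_request(base_url: str, pdf_bytes: bytes, want_bibtex: bool = False):
    """Build a multipart POST for GROBID's header extraction endpoint.

    Assumptions (not confirmed in this thread): the form field is called
    "input", and "Accept: application/x-bibtex" selects BibTeX output.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        b'Content-Disposition: form-data; name="input"; filename="paper.pdf"\r\n',
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if want_bibtex:
        headers["Accept"] = "application/x-bibtex"
    url = base_url.rstrip("/") + "/api/processHeaderDocument"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

With a GROBID server running (for example at a hypothetical `http://localhost:8070`), the request could be sent with `urllib.request.urlopen(build_header_request(url, pdf_bytes))` and the response body read as text.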