Commit

Merge pull request #1046 from kermitt2/funders-and-funding
Funders and funding
kermitt2 authored Aug 28, 2023
2 parents 707030a + c384ff1 commit 25caaaf
Showing 1,393 changed files with 476,710 additions and 1,596 deletions.
3 changes: 2 additions & 1 deletion build.gradle
@@ -462,7 +462,8 @@ project(":grobid-trainer") {
"train_segmentation" : "org.grobid.trainer.SegmentationTrainer",
"train_reference_segmentation": "org.grobid.trainer.ReferenceSegmenterTrainer",
"train_ebook_model" : "org.grobid.trainer.EbookTrainer",
- "train_patent_citation" : "org.grobid.trainer.PatentParserTrainer"
+ "train_patent_citation" : "org.grobid.trainer.PatentParserTrainer",
+ "train_funding_acknowledgement" : "org.grobid.trainer.FundingAcknowledgementTrainer"
]

def libraries = ""
2 changes: 0 additions & 2 deletions doc/Benchmarking-biorxiv.md
@@ -263,7 +263,6 @@ Evaluation on 2000 random PDF files out of 2000 PDF (ratio 1.0).
|--- |--- |--- |--- |--- |
| availability_stmt | 0 | 0 | 0 | 0 |
| figure_title | 4.24 | 2.01 | 2.72 | 22978 |
- | funding_stmt | 0 | 0 | 0 | 0 |
| reference_citation | 71.04 | 71.33 | 71.18 | 147470 |
| reference_figure | 70.59 | 67.74 | 69.13 | 47984 |
| reference_table | 48.12 | 83.06 | 60.94 | 5957 |
@@ -282,7 +281,6 @@ Evaluation on 2000 random PDF files out of 2000 PDF (ratio 1.0).
|--- |--- |--- |--- |--- |
| availability_stmt | 0 | 0 | 0 | 0 |
| figure_title | 69.47 | 32.89 | 44.65 | 22978 |
- | funding_stmt | 0 | 0 | 0 | 0 |
| reference_citation | 83.03 | 83.37 | 83.2 | 147470 |
| reference_figure | 71.21 | 68.34 | 69.75 | 47984 |
| reference_table | 48.57 | 83.83 | 61.51 | 5957 |
2 changes: 2 additions & 0 deletions doc/Consolidation.md
@@ -8,6 +8,8 @@ Consolidation has two main interests:

* The consolidation service matches the extracted bibliographical references with known publications and complements the parsed bibliographical references with various metadata, in particular the DOI, making it possible to create a citation graph and to link the extracted references to external services.

+ The consolidation includes the CrossRef Funder Registry for enriching the extracted funder information.
+
GROBID supports two consolidation services:

* [CrossRef REST API](https://github.com/CrossRef/rest-api-doc) (default)
11 changes: 6 additions & 5 deletions doc/Grobid-service.md
@@ -111,24 +111,24 @@ Still to demonstrate [PDF.js] annotation possibilities, by default bibliographica

We describe below the resources provided for each HTTP verb to use the GROBID web services. All URLs described below are relative paths; the root URL is `http://<server instance name>/<root context>`

- The consolidation parameters (`consolidateHeader` and `consolidateCitations`) indicate if GROBID should try to complete the extracted metadata with an additional external call to [CrossRef API](https://github.com/CrossRef/rest-api-doc). The CrossRef look-up is realized based on the reliable subset of extracted metadata which are supported by this API. Each consolidation parameter is a string which can have three values:
+ The consolidation parameters (`consolidateHeader`, `consolidateCitations`, `consolidateFunders`) indicate whether GROBID should try to complete the extracted metadata with an additional external call to the [CrossRef API](https://github.com/CrossRef/rest-api-doc) or [biblio-glutton](https://github.com/kermitt2/biblio-glutton). The CrossRef and biblio-glutton look-ups are based on the reliable subset of extracted metadata supported by these APIs. Each consolidation parameter is a string which can have three values:

* `0`, means no consolidation at all is performed: all the metadata will come from the source PDF
- * `1`, means consolidation against CrossRef and update of metadata: when we have a DOI match, the publisher metadata are combined with the metadata extracted from the PDF, possibly correcting them
- * `2`, means consolidation against CrossRef and, if matching, addition of the DOI only
+ * `1`, means consolidation against CrossRef/biblio-glutton and update of metadata: when we have a DOI match, the publisher metadata are combined with the metadata extracted from the PDF, possibly correcting them
+ * `2`, means consolidation against CrossRef/biblio-glutton and, if matching, addition of the DOI only
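
The three values listed above can be modelled as a small enum when building a request on the client side. The following is only a minimal sketch: the class `ConsolidationParams`, the enum `Mode` and the helper `formFields` are hypothetical names introduced for illustration and are not part of GROBID. Only the parameter names and the values `0`, `1` and `2` come from the documentation in this diff. The extra value `3` that the tables mention for `consolidateHeader` is deliberately left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConsolidationParams {

    /** The three documented values of a consolidation parameter (hypothetical enum). */
    public enum Mode {
        NONE("0"),      // no consolidation, all metadata come from the PDF
        FULL("1"),      // consolidate and update the extracted metadata
        DOI_ONLY("2");  // consolidate and, if matching, add the DOI only

        public final String value;

        Mode(String value) {
            this.value = value;
        }

        /** Maps a raw parameter string back to a mode, rejecting anything else. */
        public static Mode parse(String raw) {
            for (Mode m : values()) {
                if (m.value.equals(raw)) {
                    return m;
                }
            }
            throw new IllegalArgumentException("Unsupported consolidation value: " + raw);
        }
    }

    /** Builds the form fields for the three consolidation parameters, in a stable order. */
    public static Map<String, String> formFields(Mode header, Mode citations, Mode funders) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("consolidateHeader", header.value);
        fields.put("consolidateCitations", citations.value);
        fields.put("consolidateFunders", funders.value);
        return fields;
    }

    public static void main(String[] args) {
        // prints {consolidateHeader=1, consolidateCitations=0, consolidateFunders=2}
        System.out.println(formFields(Mode.FULL, Mode.NONE, Mode.DOI_ONLY));
    }
}
```

The resulting map would still have to be encoded as `multipart/form-data` fields alongside the `input` PDF; that part is omitted here.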

### PDF to TEI conversion services

#### /api/processHeaderDocument

Extract the header of the input PDF document, normalize it and convert it into a TEI XML or [BibTeX] format.

- `consolidateHeader` is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), or `2` (consolidate the citation and inject DOI only).
+ `consolidateHeader` is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), or `2` (consolidate the header metadata and inject DOI only).

| method | request type | response type | parameters | requirement | description |
|--- |--- |--- |--- |--- |--- |
| POST, PUT | `multipart/form-data` | `application/xml` | `input` | required | PDF file to be processed |
- | | | | `consolidateHeader` | optional | consolidateHeader is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), `2` (consolidate the citation and inject DOI only), or `3` (consolidate using only extracted DOI - if extracted) . |
+ | | | | `consolidateHeader` | optional | consolidateHeader is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), `2` (consolidate the header and inject DOI only), or `3` (consolidate using only extracted DOI - if extracted) . |
| | | | `includeRawAffiliations` | optional | `includeRawAffiliations` is a boolean value, `0` (default, do not include raw affiliation string in the result) or `1` (include raw affiliation string in the result). |

Use `Accept: application/x-bibtex` to retrieve BibTeX format instead of TEI (note: the TEI XML format is much richer, it should be preferred if there is no particular reason to use BibTeX).
@@ -166,6 +166,7 @@ Convert the complete input document into TEI XML format (header, body and biblio
| POST, PUT | `multipart/form-data` | `application/xml` | `input` | required | PDF file to be processed |
| | | | `consolidateHeader` | optional | `consolidateHeader` is a string of value `0` (no consolidation), `1` (consolidate and inject all extra metadata, default value), `2` (consolidate the citation and inject DOI only), or `3` (consolidate using only extracted DOI - if extracted). |
| | | | `consolidateCitations` | optional | `consolidateCitations` is a string of value `0` (no consolidation, default value) or `1` (consolidate and inject all extra metadata), or `2` (consolidate the citation and inject DOI only). |
+ | | | | `consolidateFunders` | optional | `consolidateFunders` is a string of value `0` (no consolidation, default value) or `1` (consolidate and inject all extra metadata), or `2` (consolidate the funder and inject DOI only). |
| | | | `includeRawCitations` | optional | `includeRawCitations` is a boolean value, `0` (default, do not include raw reference string in the result) or `1` (include raw reference string in the result). |
| | | | `includeRawAffiliations` | optional | `includeRawAffiliations` is a boolean value, `0` (default, do not include raw affiliation string in the result) or `1` (include raw affiliation string in the result). |
| | | | `teiCoordinates` | optional | list of element names for which coordinates in the PDF document have to be added, see [Coordinates of structures in the original PDF](Coordinates-in-PDF.md) for more details |
8 changes: 8 additions & 0 deletions grobid-core/src/main/java/org/grobid/core/GrobidModels.java
@@ -48,6 +48,9 @@ public enum GrobidModels implements GrobidModel {
ASTRO("astro"),
SOFTWARE("software"),
DATASEER("dataseer"),
+ //ACKNOWLEDGEMENT("acknowledgement"),
+ FUNDING_ACKNOWLEDGEMENT("funding-acknowledgement"),
+ INFRASTRUCTURE("infrastructure"),
DUMMY("none");

//I cannot declare it before
@@ -98,6 +101,11 @@ public String getFolderName() {
}

public String getModelPath() {
+ if (modelPath == null) {
+ File path = GrobidProperties.getModelPath(this);
+ if (path != null)
+ modelPath = path.getAbsolutePath();
+ }
return modelPath;
}
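
The lines added to `getModelPath()` resolve the path lazily on first access and cache it afterwards. A minimal, self-contained sketch of that pattern follows. `LazyModelPath`, its `Supplier<File>` resolver and the `lookups` counter are hypothetical stand-ins: the resolver plays the role of `GrobidProperties.getModelPath(this)`, and the counter exists only to make the caching visible.

```java
import java.io.File;
import java.util.function.Supplier;

public class LazyModelPath {

    private String modelPath;               // cached absolute path, null until resolved
    private final Supplier<File> resolver;  // stand-in for GrobidProperties.getModelPath(this)
    private int lookups = 0;                // counts resolver calls, for illustration only

    public LazyModelPath(Supplier<File> resolver) {
        this.resolver = resolver;
    }

    public String getModelPath() {
        if (modelPath == null) {
            lookups++;
            File path = resolver.get();
            if (path != null) {
                modelPath = path.getAbsolutePath();
            }
        }
        return modelPath;
    }

    public int getLookups() {
        return lookups;
    }

    public static void main(String[] args) {
        LazyModelPath model = new LazyModelPath(() -> new File("models/funding-acknowledgement"));
        model.getModelPath();
        model.getModelPath();
        System.out.println(model.getLookups()); // prints 1: the second call hits the cache
    }
}
```

Two properties of this pattern are worth keeping in mind. A resolver that returns `null` is queried again on every call, since nothing is cached. The check-then-assign sequence is also not synchronized, so concurrent callers may resolve the path more than once.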

23 changes: 22 additions & 1 deletion grobid-core/src/main/java/org/grobid/core/data/Affiliation.java
@@ -2,6 +2,8 @@

import org.grobid.core.utilities.TextUtilities;
import org.grobid.core.lexicon.Lexicon;
+ import org.grobid.core.layout.LayoutToken;
+ import org.grobid.core.utilities.OffsetPosition;

import java.util.ArrayList;
import java.util.List;
@@ -33,9 +35,14 @@ public class Affiliation {

private boolean failAffiliation = true; // tag for unresolved affiliation attachment

+ private List<LayoutToken> layoutTokens = null;
+
// an identifier for the affiliation independent from the marker, present in the TEI result
private String key = null;

+ // default indo-european delimiters, should be moved to language specific analysers
+ public static String delimiters = " \n\t" + TextUtilities.fullPunctuations + "。、,・";
+
public Affiliation() {
}

@@ -56,6 +63,7 @@ public Affiliation(org.grobid.core.data.Affiliation aff) {
addrLine = aff.getAddrLine();
affiliationString = aff.getAffiliationString();
rawAffiliationString = aff.getRawAffiliationString();
+ layoutTokens = aff.getLayoutTokens();
}

public String getAcronym() {
@@ -300,6 +308,20 @@ public void setKey(String key) {
this.key = key;
}

+ public List<LayoutToken> getLayoutTokens() {
+ return this.layoutTokens;
+ }
+
+ public void setLayoutTokens(List<LayoutToken> tokens) {
+ this.layoutTokens = tokens;
+ }
+
+ public void appendLayoutTokens(List<LayoutToken> tokens) {
+ if (this.layoutTokens == null)
+ layoutTokens = new ArrayList<>();
+ this.layoutTokens.addAll(tokens);
+ }
+
public void clean() {
if (departments != null) {
List<String> newDepartments = new ArrayList<String>();
@@ -650,6 +672,5 @@ public String toString() {
", failAffiliation=" + failAffiliation +
'}';
}
-

}
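
The token accessors added to `Affiliation` combine a null-safe append with a copy constructor that assigns the list reference directly. The sketch below reproduces that behaviour in isolation; `TokenHolder` is a hypothetical class and plain `String` values stand in for `LayoutToken`, so this shows the pattern only, not the GROBID class itself.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenHolder {

    private List<String> layoutTokens = null; // String stands in for LayoutToken

    public TokenHolder() {
    }

    /** Copy constructor that, like the diff, reuses the source's list reference. */
    public TokenHolder(TokenHolder other) {
        layoutTokens = other.getLayoutTokens();
    }

    public List<String> getLayoutTokens() {
        return layoutTokens;
    }

    /** Creates the list on first use, then appends all given tokens. */
    public void appendLayoutTokens(List<String> tokens) {
        if (layoutTokens == null) {
            layoutTokens = new ArrayList<>();
        }
        layoutTokens.addAll(tokens);
    }

    public static void main(String[] args) {
        TokenHolder original = new TokenHolder();
        original.appendLayoutTokens(List.of("National", "Science", "Foundation"));

        TokenHolder copy = new TokenHolder(original);
        copy.appendLayoutTokens(List.of("(NSF)"));

        // prints 4: the copy shares the list, so the append is visible in the original
        System.out.println(original.getLayoutTokens().size());
    }
}
```

If independent copies were wanted, the constructor could wrap the source list in a new `ArrayList` when it is non-null. Whether sharing is intended in GROBID is not stated in this diff.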