Add missing sentence segmentation in funding and acknowledgement #1106

Merged: lfoppiano merged 35 commits into master from bugfix/sent-seg-ack-fund on Jun 9, 2024

@lfoppiano lfoppiano commented Apr 28, 2024

This PR fixes the way the funding-acknowledgment parser handles an already formatted statement, so that its existing elements (references, sentences) are preserved. See #1090.

This PR fixes the following problems:

  • sentence segmentation is lost for funding and acknowledgment statements
  • reference markers are lost after the funding-acknowledgment parser is applied

The initial solution proposed in the #1090 discussion was to re-apply sentence segmentation on the transformed TEI structure. This was not applicable: the sentence coordinates cannot be regenerated at that stage, because the TEI-XML elements no longer carry layout-token information.
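To illustrate why the layout tokens matter, here is a minimal Python sketch. It is not GROBID code: `LayoutToken`, `sentence_spans`, `merge_boxes` and `sentence_boxes` are hypothetical names, the sentence splitter is deliberately naive, and tokens are assumed to carry a page number and a bounding box. Sentence coordinates are derived by mapping each sentence's character span back to the tokens it covers, which is only possible while the tokens are still available.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class LayoutToken:
    """Hypothetical stand-in for a token that still knows where it sits on the page."""
    text: str
    page: int
    x: float
    y: float
    width: float
    height: float


def sentence_spans(text):
    """Return (start, end) character offsets, splitting naively after '.', '!' or '?'."""
    spans, start = [], 0
    for match in re.finditer(r"[.!?](?=\s|$)", text):
        spans.append((start, match.end()))
        start = match.end()
        # Skip the whitespace between two sentences.
        while start < len(text) and text[start].isspace():
            start += 1
    if start < len(text):
        spans.append((start, len(text)))
    return spans


def merge_boxes(tokens):
    """Merge token boxes into one (page, x, y, width, height) box per text line."""
    lines = {}
    for token in tokens:
        lines.setdefault((token.page, token.y), []).append(token)
    boxes = []
    for (page, y), line in lines.items():
        left = min(t.x for t in line)
        right = max(t.x + t.width for t in line)
        boxes.append((page, left, y, right - left, max(t.height for t in line)))
    return boxes


def sentence_boxes(tokens):
    """Pair each sentence with the merged boxes of the tokens it covers."""
    text = "".join(t.text for t in tokens)
    offsets, pos = [], 0
    for token in tokens:
        offsets.append((pos, pos + len(token.text)))
        pos += len(token.text)
    result = []
    for start, end in sentence_spans(text):
        covered = [
            t for t, (s, e) in zip(tokens, offsets)
            if s >= start and e <= end and t.text.strip()
        ]
        result.append((text[start:end], merge_boxes(covered)))
    return result


# Made-up tokens for "Thanks all. Funded by X." spread over two lines.
EXAMPLE_TOKENS = [
    LayoutToken("Thanks", 1, 10.0, 100.0, 30.0, 10.0),
    LayoutToken(" ", 1, 40.0, 100.0, 5.0, 10.0),
    LayoutToken("all", 1, 45.0, 100.0, 15.0, 10.0),
    LayoutToken(".", 1, 60.0, 100.0, 3.0, 10.0),
    LayoutToken(" ", 1, 63.0, 100.0, 5.0, 10.0),
    LayoutToken("Funded", 1, 68.0, 100.0, 32.0, 10.0),
    LayoutToken(" ", 1, 100.0, 100.0, 5.0, 10.0),
    LayoutToken("by", 1, 10.0, 112.0, 10.0, 10.0),
    LayoutToken(" ", 1, 20.0, 112.0, 5.0, 10.0),
    LayoutToken("X", 1, 25.0, 112.0, 6.0, 10.0),
    LayoutToken(".", 1, 31.0, 112.0, 3.0, 10.0),
]

for sentence, boxes in sentence_boxes(EXAMPLE_TOKENS):
    print(sentence, boxes)
```

Once the statement has been serialized to plain TEI-XML, only the text survives, so `sentence_boxes` would have nothing to map the character spans back to.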

NOTE: this should be merged after #1096

coveralls commented Apr 28, 2024

Coverage Status

coverage: 40.787% (+0.6%) from 40.236%
when pulling bbca7dd on bugfix/sent-seg-ack-fund
into cb7118d on master.

Repository owner deleted a comment from github-actions bot May 1, 2024
@lfoppiano lfoppiano marked this pull request as ready for review May 5, 2024 06:10
@lfoppiano (Collaborator, Author) commented

I ran all documents from PLOS, PMC and bioRxiv through this and manually checked the differences for a sample from each corpus. I also manually checked the merging of the coordinates for certain documents that had caused problems in the past. Everything looks good.

@lfoppiano lfoppiano added this to the 0.8.1 milestone May 21, 2024
@lfoppiano lfoppiano merged commit 694f0ed into master Jun 9, 2024
10 checks passed
@lfoppiano lfoppiano deleted the bugfix/sent-seg-ack-fund branch June 9, 2024 21:25