Additional Short DOI fatal exception cases: java.lang.IllegalArgumentException: <string> is not a valid DOI/Short #7127

Closed
1 task done
koobs opened this issue Nov 27, 2020 · 3 comments · Fixed by #7191

Comments

@koobs

koobs commented Nov 27, 2020

JabRef 5.2--2020-11-26--f1a2fa7
Windows 10 10.0 amd64 
Java 14.0.2

Summary

JabRef throws a fatal exception when importing files that contain strings unrelated to DOIs, such as 10:51 (a timestamp) and 10/B(C)/15 (an arbitrary designation/ID).

The ShortDOI parsing subsystem was improved in #6920 to fix failure cases, but there appear to be additional strings (probably an arbitrarily large number) that still produce fatal exceptions.

Given that arbitrary documents can contain arbitrary strings, I suspect fixed pattern matching is not sustainable in the long term, particularly if a failing case remains a fatal exception from which the user must recover manually (i.e. identify the document containing the string and exclude it from import).

I propose changing the behaviour to fall through (not fail). If it is desirable to keep the failure semantics, files/entries could be marked with a note or status indicating that parsing produced a null result, though I'm not sure that is particularly valuable.
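
As a rough illustration of the fall-through idea, here is a minimal Java sketch. Everything in it is an assumption made for illustration: the class name, both regular expressions, and the `parseStrict` helper are hypothetical stand-ins and are not JabRef's actual `DOI` code. It only shows the shape of the change: a candidate that fails validation is skipped and the scan continues, so the caller receives an empty `Optional` rather than an exception.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DoiFallThroughSketch {

    // Hypothetical, deliberately loose candidate pattern; NOT JabRef's actual regex.
    private static final Pattern CANDIDATE = Pattern.compile("\\b10[:/.][^\\s]+");

    // Hypothetical strict check standing in for the validation done in a constructor.
    private static final Pattern STRICT = Pattern.compile("10\\.\\d{4,9}/\\S+");

    /** Mimics a constructor that rejects invalid input by throwing. */
    static String parseStrict(String candidate) {
        if (!STRICT.matcher(candidate).matches()) {
            throw new IllegalArgumentException(candidate + " is not a valid DOI/Short DOI.");
        }
        return candidate;
    }

    /** Fall-through variant: invalid candidates are skipped instead of aborting the scan. */
    static Optional<String> findInText(String text) {
        Matcher matcher = CANDIDATE.matcher(text);
        while (matcher.find()) {
            try {
                return Optional.of(parseStrict(matcher.group()));
            } catch (IllegalArgumentException e) {
                // Not a DOI after all: ignore this candidate and keep scanning.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Both strings from this report are rejected without an exception reaching the caller.
        System.out.println(findInText("Meeting at 10:51, ref 10/B(C)/15")); // Optional.empty
        System.out.println(findInText("See 10.1000/182 for details"));      // Optional[10.1000/182]
    }
}
```

With this shape, an importer could treat an empty result as "no DOI found" and carry on with the remaining files.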

Steps to reproduce the behavior

  1. Prepare local PDF files with contents that contain strings that produce exceptions (see below)
  2. Create New library
  3. Run Tools -> Search for unlinked local files
  4. Browse to folder containing local files -> Scan -> Import

Log Files

Log File
java.lang.IllegalArgumentException: 10/B(C)/15 is not a valid DOI/Short DOI.
	at org.jabref@5.2.298/org.jabref.model.entry.identifier.DOI.<init>(Unknown Source)
	at org.jabref@5.2.298/org.jabref.model.entry.identifier.DOI.findInText(Unknown Source)
	at org.jabref@5.2.298/org.jabref.logic.importer.fileformat.PdfContentImporter.importDatabase(Unknown Source)
Log File
java.lang.IllegalArgumentException: 10:51 is not a valid DOI/Short DOI.
	at org.jabref@5.2.298/org.jabref.model.entry.identifier.DOI.<init>(Unknown Source)
	at org.jabref@5.2.298/org.jabref.model.entry.identifier.DOI.findInText(Unknown Source)
	at org.jabref@5.2.298/org.jabref.logic.importer.fileformat.PdfContentImporter.importDatabase(Unknown Source)
@PremKolar
Contributor

Hi,
I wrote the short-DOI improvement you mentioned.
The strings 10/B(C)/15 and 10:51 by themselves should no longer be interpreted as short DOIs. I will look into it and try to find out what's wrong.

@PremKolar
Contributor

I have detected the problem. Line 78 in DOI.java:

+ "(?:\\s/?)" // /10/ab12 or 10/ab12 (but not eg "2020/10/ab12")

I will create a pull request...
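
To make the quoted fragment easier to reason about, here is a small Java sketch of what `(?:\\s/?)` matches when placed in front of a simplified short-DOI body. Only the fragment itself comes from line 78 as quoted above; the `10/[a-z0-9]+` body and the class and method names are assumptions for illustration and do not reproduce the real pattern in DOI.java.

```java
import java.util.regex.Pattern;

class ShortDoiPrefixSketch {

    // "(?:\\s/?)" is the quoted fragment: one whitespace character, then an optional slash.
    // The capturing group after it is a hypothetical, simplified short-DOI body.
    private static final Pattern SKETCH = Pattern.compile("(?:\\s/?)(10/[a-z0-9]+)");

    static boolean found(String text) {
        return SKETCH.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(found("see 10/ab12"));   // true: whitespace, then 10/ab12
        System.out.println(found("see /10/ab12"));  // true: whitespace, slash, then 10/ab12
        System.out.println(found("2020/10/ab12"));  // false: no whitespace before /10/
        System.out.println(found("10/ab12"));       // false: this sketch needs a leading whitespace
    }
}
```

The last case shows that, in isolation, the fragment requires a preceding whitespace character; how the real pattern handles the start of the text is not visible from the quoted line.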

PremKolar added a commit to PremKolar/jabref that referenced this issue Dec 14, 2020
@PremKolar PremKolar mentioned this issue Dec 14, 2020
@koobs
Author

koobs commented Dec 15, 2020

Thank you @PremKolar, I'll test the latest master with #7191 applied to confirm the fix.
