Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOI matching in duplicate check #6897

Merged
merged 14 commits into from
Sep 29, 2020
Merged

DOI matching in duplicate check #6897

merged 14 commits into from
Sep 29, 2020

Conversation

Siedlerchr
Copy link
Member

@Siedlerchr Siedlerchr commented Sep 9, 2020

Fixes #6707

Follow up from #6756

  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

KunAndrew and others added 6 commits August 8, 2020 16:36
Add zero weights for otes and comments.
small improvements
Moving cleanup's logic to DOI constructor.Add results of generate-authors.sh.
Moving cleanup's logic to DOI constructor.Add results of generate-authors.sh.
(revert AUTHORS)
 into KunAndrew-fix-for-issue-6707

* 'fix-for-issue-6707' of https://github.com/KunAndrew/jabref:
  issue 6707 Moving cleanup's logic to DOI constructor.Add results of generate-authors.sh. (revert AUTHORS)
  issue 6707 Moving cleanup's logic to DOI constructor.Add results of generate-authors.sh.
  issue 6707 small improvements
  issue 6707 Add zero weights for otes and comments.
  Add DOI matching with not full coincidence.
@koppor
Copy link
Member

koppor commented Sep 9, 2020

The JabRef maintainers will add the following name to the AUTHORS file. In case you want to use a different one, please comment here and adjust your name in your git configuration for future commits.

Andrew
Andrew Kuncevich

@Siedlerchr
Copy link
Member Author

@KunAndrew Your problems probably came from the fact that your repo's master branch is behind the naster branch of jabref.

  1. on the commandlne check that git remote -v has listed sth like upstream git@github.com:JabRef/jabref.git (can also the https link)
    If not then do:git remote add upstream git@github.com:JabRef/jabref.git(if you use ssh otherwise use the https link)
  2. git checkout master
  3. git merge upstream/master
    Then you master branch should be even with JabRef's upstream branch, so creating a new branch and creating a PR won't poroduce any conflicts

@Siedlerchr Siedlerchr added the status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers label Sep 9, 2020
@@ -108,12 +108,12 @@ public DOI(String doi) {
Matcher matcher = EXACT_DOI_PATT.matcher(trimmedDoi);
if (matcher.find()) {
// match only group \1
this.doi = matcher.group(1);
this.doi = matcher.group(1).replaceAll("[^\\w,/,:,-,.,-]|[_]", "");
Copy link
Member Author

@Siedlerchr Siedlerchr Sep 10, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@KunAndrew I think you need to adjust the DOI regex pattern:
One CrossRef test fails, because of the missing underscore:
The DOI cannot be resolved wihtout underscore
org.opentest4j.AssertionFailedError: expected: <10.1007/11538394_20> but was: <10.1007/1153839420>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a comment added which characters are removed and why. Without any comment, this is a "magic number" which should be extraced to a constant.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed this addtional regex because it was not really making sense. The DOI class already trims DOIs and also removes other encoding components.

@tobiasdiez tobiasdiez added status: changes required Pull requests that are not yet complete and removed status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers labels Sep 10, 2020
Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small code comments

@@ -108,12 +108,12 @@ public DOI(String doi) {
Matcher matcher = EXACT_DOI_PATT.matcher(trimmedDoi);
if (matcher.find()) {
// match only group \1
this.doi = matcher.group(1);
this.doi = matcher.group(1).replaceAll("[^\\w,/,:,-,.,-]|[_]", "");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There should be a comment added which characters are removed and why. Without any comment, this is a "magic number" which should be extraced to a constant.

src/main/java/org/jabref/model/entry/identifier/DOI.java Outdated Show resolved Hide resolved
Siedlerchr and others added 5 commits September 26, 2020 13:45
Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>
Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>
Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>
…-issue-6707

* upstream/master: (35 commits)
  Fix a fetcher test for the ShortDOIService (#6945)
  Fixes Shared Database: Changes filtering in CoarseChangeFilter to attribute property (#6868)
  Changed default value of "search and store files relative to bibtex file" to true (#6928)
  Replace comment by just a failure (#6943)
  Fix: in entry types editor selected field is not removed after first click  (#6941)
  Fix remove actions for entry types in the editor (#6933)
  Always use Java 15 (#6929)
  Update DevDocs: workaround for issues with local openjfx maven libraries (#6931)
  Fixes bugs in the `regex` cite key pattern modifier (#6893)
  Add missing author
  Readability for citation key patterns (#6706)
  Add new author
  Reset to master and add default case to switch (#6847)
  Bump mockito-core from 3.5.10 to 3.5.11 (#6924)
  Bump byte-buddy-parent from 1.10.14 to 1.10.15 (#6923)
  Bump org.beryx.jlink from 2.21.4 to 2.22.0 (#6925)
  Bump xmpbox from 2.0.20 to 2.0.21 (#6926)
  Bump pascalgn/automerge-action from v0.9.0 to v0.10.0 (#6927)
  Improve parsing of short DOIs (#6920)
  Bump junit-vintage-engine from 5.6.2 to 5.7.0 (#6910)
  ...
remove additional regex we already are trimming and normalizing DOIs
@Siedlerchr Siedlerchr added status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers and removed status: changes required Pull requests that are not yet complete labels Sep 26, 2020
@koppor koppor merged commit 8a242c9 into master Sep 29, 2020
@koppor koppor deleted the KunAndrew-fix-for-issue-6707 branch September 29, 2020 06:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Improvement of Duplicate Detection
5 participants