Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add plain citation parsing with LLM #11831

Merged
merged 14 commits into from
Oct 6, 2024

Conversation

InAnYan
Copy link
Collaborator

@InAnYan InAnYan commented Sep 25, 2024

Closes #11825
Closes #11805
Implements #11825 (comment).

Updated dialog for online parsing:
image

Updated dialog for offline parsing:
image

New setting:
image

Mandatory checks

  • Change in CHANGELOG.md described in a way that is understandable for the average user (if applicable)
    - [ ] Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked developer's documentation: Is the information available and up to date? If not, I outlined it in this pull request.
  • Checked documentation: Is the information available and up to date? If not, I created an issue at https://github.com/JabRef/user-documentation/issues or, even better, I submitted a pull request to the documentation repository.

@koppor
Copy link
Member

koppor commented Sep 25, 2024

"Parser choice" is nice.

  • always display that dropdown.
  • Remove "offline" / "online" in title
  • Add "offline" to dropdown
  • Adapt text ("Parser choice")
  • Add "Grobid" to Parser choice if enabled (GROBID enable din preference)
  • Add "LLM" to Parser choice if enabled (AI enabled in preferences)
  • Remove Dialog asking to enable Grobid

@Siedlerchr
Copy link
Member

I would propose a tab layout (e.g. like in OO style selection dialog) between Grobid/OpenAI

@koppor
Copy link
Member

koppor commented Sep 26, 2024

I would propose a tab layout (e.g. like in OO style selection dialog) between Grobid/OpenAI

The selection of the service is done very raRely. Thus. I Like the approach. The selection should persist across sessions.

@ThiloteE
Copy link
Member

Haven't tested the PR yet, but the screenshots show a divergence between the header of the window and the warning.

"Plain references parser (online)" vs "plain citation parsing"

It should not differ.

@ThiloteE
Copy link
Member

  • Also, I am not sure, if this preference should be in "Web search"
  • And if we use LLM's it can also be "online", right?

@koppor
Copy link
Member

koppor commented Sep 27, 2024

  • Also, I am not sure, if this preference should be in "Web search"

What do you mean by "this"?

Current state:

  • User can enable Grobid
  • Grobid preference is in "Web search"
    image
  • User can enable AI
  • This functionality makes use of Grobid and/or AI.
  • AI is used at other places.
  • Grobid functionality is also used at other places (e.g., importing a PDF into JabRef)
* And if we use LLM's it can also be "online", right?

Grobid is also online.

Therefore, I proposed to remove both "online" and "offline" and move that decision to the dropdown.

@ThiloteE
Copy link
Member

With "this" I meant the new preference that was showing in #11831 (comment) under web-search.

@ThiloteE
Copy link
Member

AI (LLMs) can be both online and offline.

@koppor
Copy link
Member

koppor commented Sep 29, 2024

AI (LLMs) can be both online and offline.

Therefore "online" and "offline" are requested to be removed. See #11831 (comment)

@koppor
Copy link
Member

koppor commented Sep 29, 2024

With "this" I meant the new preference that was showing in #11831 (comment) under web-search.

Since we don't have a tab "Import" and I don't think, this is so important, I propose it too keep it in the tab. However (!), please move it to the "Remote services" part. There, we already have Grobid-related stuff.

image

@InAnYan
Copy link
Collaborator Author

InAnYan commented Oct 3, 2024

image

This is not GUI.

I also propose to change the name to RuleBasedBibtexExtractor.

This aligns very well with the history of natural lanuguage processing (NLP):

  • Rule-based methods (old BibtexExtractor).
  • Statistical methods (like GROBID, however this is not right, they can also use neural networks).
  • Neural-network methods (like LLM).

@InAnYan
Copy link
Collaborator Author

InAnYan commented Oct 3, 2024

I also propose to change all names to "Plain Citation Parsing". Synonyms are:

  • "BibTex extractor" - it's too general
  • "Extract BibTex from plain text" - "plain text" - really? I think you meant a citation, not just an arbitrary text.
  • "Plain references parser" - another name. This is good actually, because it correlates to "References" section in papers. But I still propose to name it "citations", because that section could also be named "Bibliography". Citation is a very specific term and it depicts better what JabRef does.

@InAnYan
Copy link
Collaborator Author

InAnYan commented Oct 3, 2024

WAIT

All of this time offline parsing did not support parsing of several entries (separated by two blank lines)? WTF

@InAnYan
Copy link
Collaborator Author

InAnYan commented Oct 3, 2024

image

WTF? "JabRef server" - I thought it connected to some kind of GROBID server

UPDATE: Yes, it really connects to JabRef server. But I still renamed that to be more general

@InAnYan
Copy link
Collaborator Author

InAnYan commented Oct 3, 2024

 Add "Grobid" to Parser choice if enabled (GROBID enable din preference)
Add "LLM" to Parser choice if enabled (AI enabled in preferences)

Ah, this is not that easy

@Siedlerchr
Copy link
Member

We had an instance of grobid running on azure but currently I cannot deploy it again due to some weird errors from docker

…dd-llm-citation-parsing

# Conflicts:
#	src/test/java/org/jabref/logic/importer/WebFetchersTest.java
@InAnYan
Copy link
Collaborator Author

InAnYan commented Oct 3, 2024

Did some crazy refactoring.

Hope you like it

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small nitpicks.

And

  • When opening the dialog, it should check whether the current value is still available. Example: GROBID enabled, dialog opened, GROBID selected, then preferences: GROBID disabled. Opening dialog again: GROBID should not be selected. (Note that GROBID is not in the dropdown, which is OK. However, it is selected)
    • I think, there should be a preference disabled-dependency: grafik. GROBID disabled and default GROBID: try AI. If AI not enabled, set default to rule-based. - similar to AI: If AI get disabled -> if default AI then check if GROBID enabled. Then set GROBID - else set local.
  • Link org.jabref.logic.importer.plaincitation.RuleBasedPlainCitationParser and org.jabref.logic.importer.fileformat.BibliographyFromPdfImporter#parseReference with a TODO (at both sides, with {@link}. They are semantically equal, but the code is different. Someone should unify the functionality.
    • Similar: Link org.jabref.logic.importer.plaincitation.SeveralPlainCitationParser and org.jabref.logic.importer.fileformat.BibliographyFromPdfImporter#getIntermediateData

Note: for reviewing this PR, refactoring miner is better.

E.g., instead of thinking a file was deleted, there is detection of major refactorings

grafik

src/main/resources/l10n/JabRef_en.properties Outdated Show resolved Hide resolved
src/main/resources/l10n/JabRef_en.properties Outdated Show resolved Hide resolved
src/main/resources/l10n/JabRef_en.properties Outdated Show resolved Hide resolved
@InAnYan
Copy link
Collaborator Author

InAnYan commented Oct 6, 2024

@koppor, I thought it will be easy to filter out parser choices

But we can't (currently, or without a hack) link preferences from different tabs.

At first, I used preferences.getAiPreferences().useAiProperty().addListener(...) (or not useAi, some other name), but when we edit preferences, we don't set them in CliPreferences...

@InAnYan
Copy link
Collaborator Author

InAnYan commented Oct 6, 2024

Okay, GROBID is on the same page, but AI is not.

Is a hack worth it?

@InAnYan InAnYan requested a review from koppor October 6, 2024 11:07
@koppor
Copy link
Member

koppor commented Oct 6, 2024

Okay, GROBID is on the same page, but AI is not.

Is a hack worth it?

I don't fully understand ^^. - I just guess: It is OK for me if the dropdown for default in the web search contains an entry which is available at another setting...

@koppor
Copy link
Member

koppor commented Oct 6, 2024

@koppor, I thought it will be easy to filter out parser choices
But we can't (currently, or without a hack) link preferences from different tabs.

It should be possible to get the AIPreferences passed.

At first, I used preferences.getAiPreferences().useAiProperty().addListener(...)

Sounds good!

(or not useAi, some other name), but when we edit preferences, we don't set them in CliPreferences...

I don't understand. Its a hiearchy - and the class for GuiPreferences is a superset of the CliPreferences

grafik

Copy link
Member

@koppor koppor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very small improvement. I think, this is NOT a dirty hack, but a good solution.

…Model.java

Co-authored-by: Oliver Kopp <kopp.dev@gmail.com>
@koppor koppor enabled auto-merge October 6, 2024 11:37
@koppor koppor added this pull request to the merge queue Oct 6, 2024
Merged via the queue into JabRef:main with commit daf6736 Oct 6, 2024
23 of 24 checks passed
@koppor koppor deleted the add-llm-citation-parsing branch October 6, 2024 11:51
@ThiloteE ThiloteE added the AI Related to AI Chat/Summarization label Oct 30, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
AI Related to AI Chat/Summarization
Projects
None yet
4 participants