Store Citation Relations in LRU Cache #10980

Merged
merged 5 commits into from
Mar 5, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -41,6 +41,7 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv
- We made the command "Push to TexShop" more robust to allow cite commands with a character before the first slash. [forum#2699](https://discourse.jabref.org/t/push-to-texshop-mac/2699/17?u=siedlerchr)
- We only show the notification "Saving library..." if the library contains more than 2000 entries. [#9803](https://github.com/JabRef/jabref/issues/9803)
- We enhanced the dialog for adding new fields in the content selector with a selection box containing a list of standard fields. [#10912](https://github.com/JabRef/jabref/pull/10912)
- We store the citation relations in an LRU cache to avoid bloating the memory and out-of-memory exceptions. [#10958](https://github.com/JabRef/jabref/issues/10958)
- Keywords filed are now displayed as tags. [#10910](https://github.com/JabRef/jabref/pull/10910)

### Fixed
@@ -1,16 +1,18 @@
 package org.jabref.gui.entryeditor.citationrelationtab;

 import java.util.Collections;
-import java.util.HashMap;
 import java.util.List;
 import java.util.Map;

 import org.jabref.model.entry.BibEntry;
 import org.jabref.model.entry.identifier.DOI;

+import org.eclipse.jgit.util.LRUMap;
+
 public class BibEntryRelationsCache {
-    private static final Map<String, List<BibEntry>> CITATIONS_MAP = new HashMap<>();
-    private static final Map<String, List<BibEntry>> REFERENCES_MAP = new HashMap<>();
+    private static final Integer MAX_CACHED_ENTRIES = 100;
+    private static final Map<String, List<BibEntry>> CITATIONS_MAP = new LRUMap<>(MAX_CACHED_ENTRIES, MAX_CACHED_ENTRIES);
+    private static final Map<String, List<BibEntry>> REFERENCES_MAP = new LRUMap<>(MAX_CACHED_ENTRIES, MAX_CACHED_ENTRIES);

     public List<BibEntry> getCitations(BibEntry entry) {
         return CITATIONS_MAP.getOrDefault(entry.getDOI().map(DOI::getDOI).orElse(""), Collections.emptyList());

Member

What happens for entries that are unlikely to be viewed but are requested anyway? I think the hash map then returns an empty list, so we lose information.

Shouldn't the references be recalculated then (instead of returning Collections.emptyList())? But how? (I am not that deep into the code.)

Contributor Author

If an entry is requested but is no longer in the cache, the references and citations are recalculated.
This behaviour is the same as before; the only change is that the number of stored references and citations is now limited.
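
The bounded-cache behaviour described here can be illustrated with a minimal sketch. It assumes a plain `java.util.LinkedHashMap` in access order instead of the `org.eclipse.jgit.util.LRUMap` used in the PR, and the class name, the `getOrFetch` method, and the `fetcher` callback are hypothetical names chosen for illustration, not JabRef code. It only shows that an evicted key is fetched again on the next request rather than lost.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a bounded LRU cache; not the JabRef implementation.
class LruRelationsCacheSketch {
    private final Map<String, List<String>> cache;

    LruRelationsCacheSketch(int maxEntries) {
        // accessOrder = true moves an entry to the end each time it is read or written,
        // so the first entry in iteration order is always the least recently used one.
        this.cache = new LinkedHashMap<String, List<String>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, List<String>> eldest) {
                // Called after each put; returning true drops the least recently used entry.
                return size() > maxEntries;
            }
        };
    }

    // Returns the cached relations, or recalculates them through the fetcher on a miss.
    List<String> getOrFetch(String doi, Function<String, List<String>> fetcher) {
        List<String> cached = cache.get(doi);
        if (cached != null) {
            return cached;
        }
        List<String> fetched = fetcher.apply(doi);
        cache.put(doi, fetched);
        return fetched;
    }

    // containsKey does not count as an access, so it does not change the eviction order.
    boolean isCached(String doi) {
        return cache.containsKey(doi);
    }

    public static void main(String[] args) {
        LruRelationsCacheSketch cache = new LruRelationsCacheSketch(2);
        Function<String, List<String>> fetcher = doi -> List.of("relations for " + doi);
        cache.getOrFetch("10.1000/a", fetcher);
        cache.getOrFetch("10.1000/b", fetcher);
        cache.getOrFetch("10.1000/c", fetcher); // evicts 10.1000/a, the least recently used key
        System.out.println(cache.isCached("10.1000/a"));
    }
}
```

Each miss costs one call to the fetcher, which in the real feature would presumably be a remote request.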

Contributor Author

If you still think we need to use an MVStore, then I'd need some more info on how and why.
At the moment I'm having trouble seeing the problem and the benefits of MVStore.

Member

> In case the entry is requested but no longer in cache, the references and citations are recalculated.

I needed to go into the code and understand it for myself. It would have been nice to guide me to org.jabref.gui.entryeditor.citationrelationtab.BibEntryRelationsRepository#needToRefreshCitations.

I implemented a test case showing that it works: #10983

Member

> If you still think we need to use an MVStore, then I'd need some more info on how and why. At the moment I'm having trouble seeing the problem and the benefits of MVStore.

Do you know about "rate limits" and blocking users for too many requests?

Quoting https://libguides.ucalgary.ca/c.php?g=732144&p=5260798

> The API allows up to 100 requests per 5 minutes. To access a higher rate limit, complete the form to request authentication for your project.

That means that, for a large library, I cannot step through the references, because I could hit the rate limit. I know this is rare, but it could happen in the following setting: in a corporate environment, all requests go through a proxy. Thus, the rate limit applies not per person but to the SUM of all persons. If 100 researchers work in parallel with JabRef, each researcher gets ONE request per 5 minutes. And TWO requests are needed per entry: one for the citing and one for the cited-by relations.

Let's investigate MVStore. MVStore is a library that stores the values of a hash map on disk rather than in memory, so it takes less memory than a full in-memory hash map. MVStore routes each request for a map entry through to disk. See https://www.h2database.com/html/mvstore.html for details.


We can merge as is, but we should work on MVStore fast. Otherwise, companies with a corporate proxy (and there are many companies using one) will not be able to use that feature of JabRef any more.
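
The rate-limit arithmetic above can be worked through in a few lines. This is only a sketch under the numbers quoted in this comment (100 requests per 5-minute window, 100 users behind one proxy, 2 requests per entry); the class and method names are hypothetical.

```java
// Hypothetical sketch of the shared-quota arithmetic; not part of JabRef.
class RateLimitBudgetSketch {

    // Entries one user can look up per rate-limit window when the quota is shared evenly.
    static double entriesPerUserPerWindow(int requestsPerWindow, int usersBehindProxy, int requestsPerEntry) {
        return (double) requestsPerWindow / usersBehindProxy / requestsPerEntry;
    }

    public static void main(String[] args) {
        // Single user: 100 requests / 1 user / 2 requests per entry
        System.out.println(entriesPerUserPerWindow(100, 1, 2));
        // Corporate proxy shared by 100 users: 100 requests / 100 users / 2 requests per entry
        System.out.println(entriesPerUserPerWindow(100, 100, 2));
    }
}
```

With a shared proxy the result is half an entry per 5-minute window, that is, roughly one entry every 10 minutes per researcher, which is why every cache miss matters in that setting.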

Member

I tried out the feature, and I usually get HTTP 429 and cannot see any citations...

Contributor Author

> > If you still think we need to use an MVStore, then I'd need some more info on how and why. At the moment I'm having trouble seeing the problem and the benefits of MVStore.
>
> Do you know about "rate limits" and blocking users for too many requests?
>
> Quoting https://libguides.ucalgary.ca/c.php?g=732144&p=5260798
>
> > The API allows up to 100 requests per 5 minutes. To access a higher rate limit, complete the form to request authentication for your project.
>
> That means that, for a large library, I cannot step through the references, because I could hit the rate limit. I know this is rare, but it could happen in the following setting: in a corporate environment, all requests go through a proxy. Thus, the rate limit applies not per person but to the SUM of all persons. If 100 researchers work in parallel with JabRef, each researcher gets ONE request per 5 minutes. And TWO requests are needed per entry: one for the citing and one for the cited-by relations.
>
> Let's investigate MVStore. MVStore is a library that stores the values of a hash map on disk rather than in memory, so it takes less memory than a full in-memory hash map. MVStore routes each request for a map entry through to disk. See https://www.h2database.com/html/mvstore.html for details.
>
> We can merge as is, but we should work on MVStore fast. Otherwise, companies with a corporate proxy (and there are many companies using one) will not be able to use that feature of JabRef any more.

Ok, I see your point now.
I was not aware that we had such a limited number of requests and that companies work with a singular request pool.
Then it might be best to wait until this is done properly, as the LRU cache would only help a small number of users but could greatly limit company users' experience.
What do you think @koppor?

Member

@cardionaut Thank you for coming back on this.

Proposal:

  1. Merge as is. I think 100 entries is a good choice.
  2. Work on MVStore in a follow-up PR 😅. Estimate: 100 lines of code, but scattered around JabRef; NativeDesktop needs to be touched, etc. The most difficult part will be closing the MVStore. Since DOIs are globally unique, one can close the MVStore when JabRef shuts down. This makes it "easier" (in comparison to "Add FileMonitor for LaTeX citations" #10937, where some closing logic was necessary for each tab). Nevertheless, this could turn into back-and-forth development (meaning: code reviews requesting significant changes could come up). I hope you can invest the time and energy in this, @cardionaut. That feature would really help to make the citation relations usable, because the information for each DOI is stored independently of each library and is presented as soon as it is available. (Follow-up requirement: refresh the DOI information if one week has passed since the last fetch. Maybe this can be baked into the HashMap designed for the MVStore.) Implementation hint: do NOT do it like org.jabref.logic.journals.JournalAbbreviationRepository, because there is no direct access to the MVStore there; instead, new hash maps are created.
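
The follow-up requirement in item 2 (refresh the DOI information once a week has passed since the last fetch) could be sketched as a simple timestamp check. This assumes the store keeps a last-fetched `Instant` per DOI; the class name, the `MAX_AGE` constant, and the `needsRefresh` method are hypothetical and not part of JabRef or MVStore.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical sketch of a staleness check for cached citation relations.
class CachedRelationsFreshnessSketch {
    // Assumption taken from the comment above: entries older than one week are refetched.
    static final Duration MAX_AGE = Duration.ofDays(7);

    // True if the entry was never fetched or if at least MAX_AGE has passed since the last fetch.
    static boolean needsRefresh(Instant lastFetched, Instant now) {
        if (lastFetched == null) {
            return true;
        }
        return Duration.between(lastFetched, now).compareTo(MAX_AGE) >= 0;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        System.out.println(needsRefresh(now.minus(Duration.ofDays(8)), now));
        System.out.println(needsRefresh(now.minus(Duration.ofDays(1)), now));
    }
}
```

Passing `now` as a parameter instead of calling `Instant.now()` inside the method keeps the check easy to test.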

Contributor Author
@cardionaut, Mar 5, 2024

@koppor Great, sounds like a plan.
I cannot make any promises on how quickly I can get this done, but I will try my best.
I work full-time and am still quite new to Java.
I'll set up a draft PR as soon as I have made some progress.
