Store LaTeX-free fields in BibEntry #2102
LGTM. Although the converter is probably not perfect and we might run into new problems caused by that fact.
The functionality in the converter has not changed, it is just a rewording of I still think the converter should be written from scratch, but that should really be a separate PR.
Some nitpicking 😉 and a question:
Why does the "GrammarBasedSearchRule" not use this?
@@ -57,6 +58,16 @@
*/
private final Map<String, Set<String>> fieldsAsWords = new HashMap<>();

/*
* Map that stores latex free versions of fields. Is populated as a cash
cache
Ok, will change
private final Map<String, String> latexFreeFields = new ConcurrentHashMap<>();

/*
* Used to cleanse field values for internal LaTeX-free storage
clean
What is wrong with cleanse? My dictionary says this is a proper term :-)
@@ -180,7 +191,7 @@ public String getId() {

/**
* Sets the cite key AKA citation key AKA BibTeX key.
*
* <p>
remove?
@@ -191,7 +202,7 @@ public void setCiteKey(String newCiteKey) {

/**
* Returns the cite key AKA citation key AKA BibTeX key, or null if it is not set.
*
* <p>
remove?
Sure, weird autoformatting...
Regarding
Implementation alternatives for the LaTeX-to-Unicode conversions are discussed at #1252.
I join the nitpicking party. LGTM!
@@ -57,6 +58,16 @@
*/
private final Map<String, Set<String>> fieldsAsWords = new HashMap<>();

/*
* Cache that stores latex free versions of fields. Is populated as a cash
Second sentence doesn't make sense?!
right
@@ -479,6 +491,8 @@ public void setField(Map<String, String> fields) {

fields.remove(fieldName);
fieldsAsWords.remove(fieldName);
Maybe also move fieldsAsWords.remove(fieldName); to the invalidate cache method?
Good point! Done!
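For readers following the thread, the bundling agreed on here could look roughly like the sketch below. It is not the code from this PR: `EntrySketch`, `invalidateFieldCache`, and the brace-stripping conversion are hypothetical stand-ins, and only the two map fields mirror the diff above. The point is that every write path calls one method that drops all derived values for a field.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, simplified stand-in for BibEntry; not the real JabRef class.
class EntrySketch {
    private final Map<String, String> fields = new HashMap<>();
    // Derived values, both computed lazily from `fields`.
    private final Map<String, Set<String>> fieldsAsWords = new HashMap<>();
    private final Map<String, String> latexFreeFields = new ConcurrentHashMap<>();

    void setField(String name, String value) {
        fields.put(name, value);
        invalidateFieldCache(name);
    }

    void clearField(String name) {
        fields.remove(name);
        invalidateFieldCache(name);
    }

    // The single place that drops every derived value for a field.
    private void invalidateFieldCache(String name) {
        fieldsAsWords.remove(name);
        latexFreeFields.remove(name);
    }

    Set<String> getFieldAsWords(String name) {
        String value = fields.get(name);
        if (value == null) {
            return Collections.emptySet();
        }
        return fieldsAsWords.computeIfAbsent(name,
                key -> new HashSet<>(Arrays.asList(value.split("\\s+"))));
    }

    String getLatexFreeField(String name) {
        String value = fields.get(name);
        if (value == null) {
            return null;
        }
        // Placeholder conversion (assumption): only strips braces.
        return latexFreeFields.computeIfAbsent(name,
                key -> value.replace("{", "").replace("}", ""));
    }
}
```

With a single invalidation method, a cache added later only needs one extra line there instead of a change in every setter.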
@@ -57,6 +58,16 @@
*/
private final Map<String, Set<String>> fieldsAsWords = new HashMap<>();

/*
/** instead of /*
Good point. It's strange that IntelliJ does not distinguish between JavaDoc and inline documentation in its color.
With the Darcula theme it does.
/*
* Used to cleanse field values for internal LaTeX-free storage
*/
/** instead of /*
As a few people have looked at the code and only criticized minor aspects (which I fixed), I'll merge this now. The Codacy build seems to be broken somehow, but then it is also not that critical.
* Add VM args for string deduplication
* Add cache of LaTeX-free fields
* Intern Strings for fields
* Exchange calls of replaceAll with compiled regular expressions
* Transform string.replace() with compiled regular expressions
* Performance improvements in RegexBasedSearchRule
* Add changelog entry
* Javadoc fixes
* Use LaTeX-free fields in GrammarBasedSearchRule
* More JavaDoc cleanup
* You can always improve JavaDoc
* Bundle cache invalidation
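As an illustration of the "compiled regular expressions" commits: `String.replaceAll` compiles its regex on every call, so hoisting the pattern into a constant avoids repeated compilation in hot paths such as search. The class below is a hypothetical example, not code from this PR, and the whitespace pattern is chosen only for demonstration.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of JabRef.
class WhitespaceNormalizer {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Same result as input.replaceAll("\\s+", " "), without recompiling per call.
    static String normalize(String input) {
        return WHITESPACE.matcher(input).replaceAll(" ");
    }
}
```

The saving per call is small, so this mainly pays off when the same replacement runs once per field per entry during a search.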
@@ -750,4 +750,9 @@ return true;</string>
</mediaSets>
<buildIds buildAll="true" />
<buildOptions verbose="false" faster="false" disableSigning="false" disableJreBundling="false" debug="false" />
<jvmArguments>
Note that if these settings go missing, it may be caused by a plain save of the config using the Install4J GUI. I'm trying to add it as "VM options file" at 53ffd02, which seems to work.
Based on the discussion in #1993 and the work of @bartsch-dev in #2091, this PR aims to improve the speed of search by storing LaTeX-free versions of field values in a cache. It aims at a balance between better performance and an acceptable memory footprint. Basically, all ideas from #1993 are implemented: a cached store of LaTeX-free fields which is computed on demand, String interning, and regex performance improvements in LatexToUnicode.
Here are measurements with the new branch:
The memory footprint of a database with 6500 entries is ~ 912 MB on my machine
And on current master:
The memory footprint of the same 6500 entry database is ~ 888 MB.
All in all, this looks like a hefty performance improvement at little memory cost.
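To make the String interning part of the description concrete, here is a minimal sketch, assuming field names and values are interned when they are stored. `InternedFields` is a hypothetical class, not the PR's implementation. `String.intern()` returns one canonical instance per distinct string, so values repeated across thousands of entries (field names, years, journal names) can share a single object.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration; not the BibEntry code from this PR.
class InternedFields {
    private final Map<String, String> fields = new HashMap<>();

    // Store canonical instances so equal strings share memory across entries.
    void put(String name, String value) {
        fields.put(name.intern(), value.intern());
    }

    String get(String name) {
        return fields.get(name);
    }
}
```

Whether interning values pays off depends on how repetitive the data is; for mostly unique values such as titles it adds lookup work without saving memory.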