Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store LaTeX-free fields in BibEntry #2102

Merged
merged 12 commits into from
Oct 3, 2016
Merged

Store LaTeX-free fields in BibEntry #2102

merged 12 commits into from
Oct 3, 2016

Conversation

lenhard
Copy link
Member

@lenhard lenhard commented Sep 30, 2016

Based on the discussion in #1993 and the work of @bartsch-dev in #2091, this PR aims to improve the speed of search by storing LaTeX-free versions of field values in a cache. It aims at a balance between better performance and an acceptable memory footprint. Basically, all ideas from #1993 are implemented: a cached store of LaTeX-free fields which is computed on demand, String internalization, and regex performance improvements in LatexToUnicode.

Here are measurements with the new branch:

Benchmark                             Mode  Cnt         Score        Error  Unit
Benchmarks.latexToUnicodeConversion  thrpt   20    119823.078   2063.425  ops/s
Benchmarks.parallelSearch            thrpt   20       735.890    113.685  ops/s
Benchmarks.search                    thrpt   20       397.807     28.557  ops/s

The memory footprint of a database with 6500 entries is ~ 912 MB on my machine

And on current master:

Benchmark                             Mode  Cnt         Score        Error  Unit
Benchmarks.latexToUnicodeConversion  thrpt   20     94367.718   2570.784  ops/s
Benchmarks.parallelSearch            thrpt   20        58.204       3.285  ops/s
Benchmarks.search                    thrpt   20        30.531       0.952  ops/s

The memory footprint of the same 6500 entry database is ~ 888 MB.

All in all, this looks like a hefty performance improvement at little memory cost.

  • Change in CHANGELOG.md described
  • Manually tested changed features in running JabRef

@lenhard lenhard added search bug Confirmed bugs or reports that are very likely to be bugs status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers labels Sep 30, 2016
@lenhard lenhard mentioned this pull request Sep 30, 2016
@stefan-kolb
Copy link
Member

LGTM. Although the converter is probably not perfect and we might run into new problems caused by that fact.

@lenhard
Copy link
Member Author

lenhard commented Sep 30, 2016

The functionality in the converter has not changed, it is just a rewording of String.replace calls into regular expressions that are compiled once instead of every time. Thus, there should be no new problems that weren't broken before anyway. You can never be sure, though...

I still think the converter should be written from scratch, but that should really be a separate PR.

Copy link
Member

@matthiasgeiger matthiasgeiger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nitpicking 😉 and a question:

Why does the "GrammarBasedSearchRule" not use this?

@@ -57,6 +58,16 @@
*/
private final Map<String, Set<String>> fieldsAsWords = new HashMap<>();

/*
* Map that stores latex free versions of fields. Is populated as a cash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cache

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, will change

private final Map<String, String> latexFreeFields = new ConcurrentHashMap<>();

/*
* Used to cleanse field values for internal LaTeX-free storage
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

clean

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is wrong with cleanse? My dictionary says this is a proper term :-)

@@ -180,7 +191,7 @@ public String getId() {

/**
* Sets the cite key AKA citation key AKA BibTeX key.
*
* <p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

@@ -191,7 +202,7 @@ public void setCiteKey(String newCiteKey) {

/**
* Returns the cite key AKA citation key AKA BibTeX key, or null if it is not set.
*
* <p>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, weird autoformatting...

@lenhard
Copy link
Member Author

lenhard commented Sep 30, 2016

Regarding GrammarBasedSearchRule: I am not 100% sure, but it looks like it on the one hand builds upon ContainBasedSearchRule. But it also has some self-searching code, which I modified to use the LaTeX-free version. This was probably a bug.

@koppor
Copy link
Member

koppor commented Sep 30, 2016

Implementation alternatives for the LaTeX-to-Unicode conversions are discussed at #1252

Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I join the nitpicking party. LGTM!

@@ -57,6 +58,16 @@
*/
private final Map<String, Set<String>> fieldsAsWords = new HashMap<>();

/*
* Cache that stores latex free versions of fields. Is populated as a cash
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second sentence doesn't make sense?!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right

@@ -479,6 +491,8 @@ public void setField(Map<String, String> fields) {

fields.remove(fieldName);
fieldsAsWords.remove(fieldName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe also move fieldsAsWords.remove(fieldName); to the invalidate cache method?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! Done!

@@ -57,6 +58,16 @@
*/
private final Map<String, Set<String>> fieldsAsWords = new HashMap<>();

/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/**instead of /*

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. It's strange that Intellij does not distinguish between JavaDoc and Inline documentation in its color.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With the Darcula theme it does.


/*
* Used to cleanse field values for internal LaTeX-free storage
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/**instead of /*

@lenhard
Copy link
Member Author

lenhard commented Oct 3, 2016

As a few people have looked at the code and only criticized minor aspects (which I fixed), I'll merge this now. The codacy build seems to be broken somehow, but then it is also not that critical.

@lenhard lenhard merged commit 2e5dd7e into master Oct 3, 2016
@tobiasdiez tobiasdiez deleted the latex-free-fields branch October 3, 2016 08:43
zesaro pushed a commit to zesaro/jabref that referenced this pull request Nov 22, 2016
* Add VM args for string deduplication

* Add cache of LaTeX-free fields

* Intern Strings for fields

* Exchange calls of replaceAll with compiled regular expressions

* Transform string.replace() with compiled regular expressions

* Performance improvements in RegexBasedSearchRule

* Add changelog entry

* Javadoc fixes

* Use LaTeX-free fields in GrammarBasedSearchRule

* More JavaDoc cleanup

* You can always improve JavaDoc

* Bundle cache invalidation
@@ -750,4 +750,9 @@ return true;</string>
</mediaSets>
<buildIds buildAll="true" />
<buildOptions verbose="false" faster="false" disableSigning="false" disableJreBundling="false" debug="false" />
<jvmArguments>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that if these settings go missing, it may caused by a plain save of the config using the Install4J GUI. I'm trying to add it as "VM options file" at 53ffd02, which seems to work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Confirmed bugs or reports that are very likely to be bugs search status: ready-for-review Pull Requests that are ready to be reviewed by the maintainers
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants