-
-
Notifications
You must be signed in to change notification settings - Fork 2.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Switch to latex2unicode lib instead of own handling #2465
Comments
Or: #1252 |
I set up a working implementation using latex2unicode in the latex2unicode branch. Here is the diff: https://github.com/JabRef/jabref/compare/latex2unicode I had some intial setup problems, but the author helped me to resolve that in tomtung/latex2unicode#1 The library is really easy to use and would be great in terms of code complexity. It also fixes a few conversion problems we are having. However, here are the results from the benchmark: On current master:
Using latex2unicode
This is a drop in op/s by two orders of magnitude. It is also noticeable when starting up JabRef. I hate to say this since I really like the library, but it looks as if we cannot use it due to performance. |
Did you tried the benchmark without the normalizer and the replacement of tildes? Maybe these post-and preconversations are a problem.
|
I did not try the benchmark without normalizer and replacement of tildes. Could you maybe check out the branch and do a benchmark? Maybe this is just a weird configuration issue on my system (although I don't think so). The single regular expression with the tildes will not make a difference. We have several of those in our current converter. The normalization cannot be avoided, otherwise the output of latex2unicode will not be usable for JabRef. This is discussed in the issue linked above. I guess (that is really just a guess) that the performance drop comes from the fact that latex2unicode is implemented in Scala and we are using it in Java. No matter how close Scala is integrated with Java, the friction inevitably will cause an overhead. I aggree that a structure similar to the implementation of latex2unicode would be desirable, but we have to acknowledge that this corresponds to a rewrite. This should not be done purely for the sake of beauty, but when fixing one of the issues linked above. |
@tomtung wants to do some optimization in latex2unicode, see: tomtung/latex2unicode#1 (comment) We will try a new version of the library when it is available. |
So @tomtung did some optimization and I did a new performance benchmark, the results are as follows:
This is a huge boost from the previous version, although it is not up to the level that we have with our own conversion. Nevertheless, I would strongly suggest to go for it. I did not notice a significantly longer delay in opening JabRef, even when opening Aegits gigantic file. What I see as critical is that the library gives us a much more complete Latex to unicode conversion that fixes a number of bugs which we currently have listed in our issue tracker. And it even has potential for much more than that. For instance, we can now correctly convert italics (which cannot easily be displayed, so there is not much we can do with it right now, but anyway). The code can be viewed as part of #2532. @JabRef/developers What do you think? |
I would vote for a go. The performance seems to be acceptable
Am 09.02.2017 10:05 vorm. schrieb "Jörg Lenhard" <notifications@github.com>:
… So @tomtung <https://github.com/tomtung> did some optimization and I did
a new performance benchmark, the results are as follows:
Benchmark Mode Cnt Score Error Units
Benchmarks.latexToUnicodeConversion thrpt 20 72417.919 ▒ 1906.792 ops/s
Benchmarks.parallelSearch thrpt 20 570.149 ▒ 52.298 ops/s
Benchmarks.parse thrpt 20 19.891 ▒ 2.068 ops/s
Benchmarks.search thrpt 20 287.523 ▒ 46.959 ops/s
This is a huge boost from the previous version, although it is not up to
the level that we have with our own conversion. Nevertheless, I would
strongly suggest to go for it. I did not notice a significantly longer
delay in opening JabRef, even when opening Aegits gigantic file.
What I see as critical is that the library gives us a much more complete
Latex to unicode conversion that fixes a number of bugs which we currently
have listed in our issue tracker. And it even has potential for much more
than that. For instance, we can now correctly convert italics (which cannot
easily be displayed, so there is not much we can do with it right now, but
anyway). The code can be viewed as part of #2532
<#2532>.
@JabRef/developers <https://github.com/orgs/JabRef/teams/developers> What
do you think?
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#2465 (comment)>, or mute
the thread
<https://github.com/notifications/unsubscribe-auth/AATi5KMwniZlFmHWkaMa7hPmI4nFJed2ks5ratbWgaJpZM4LjFKd>
.
|
#2532 is merged, so this can be closed. |
This would solve some serious issues:
Idea: Use https://github.com/tomtung/latex2unicode
Provides a simple extendable grammar.
We then would only need a conversion to html
Refs #2063 #207 #1252
The text was updated successfully, but these errors were encountered: