GSoC 2022 Apache Lucene Search
Oliver Kopp edited this page Mar 26, 2022
This page summarizes the main points of the GSoC 2022 project "Apache Lucene Search" (https://github.com/JabRef/www.jabref.org/blob/main/GSoC2022.md#apache-lucene-search).
- Description: JabRef offers an extensive search function that is based on a custom search syntax. The goal is to replace this custom search syntax and grammar with Apache Lucene's search syntax. The new search should offer the same functionality as the existing one.
- Skills required: Java, JavaFX (experience with Lucene is a plus)
- Expected outcome: A functioning search that supports the same functionality as the old search. More information can be found in PR #8206.
- Possible mentors: @koppor, @Siedlerchr, @calixtus
- Project size: 175h (medium)
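To make the syntax change concrete, the following is a toy sketch (not JabRef code) that rewrites a small subset of the old syntax into Lucene's query syntax. It assumes old queries use the simple `field = term` form joined by `and`/`or` (e.g. `author = smith and title = electrical`). The class `LegacyQueryTranslator` and its method are hypothetical, and exact matching, regular expressions, negation and quoted phrases are deliberately left out.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.stream.Collectors;

/**
 * Hypothetical toy translator for an assumed subset of the old JabRef syntax:
 *   field = term, joined by and / or   ->   field:term, joined by AND / OR
 * Exact match (==), regex, negation and quoted phrases are NOT handled.
 */
final class LegacyQueryTranslator {

    private LegacyQueryTranslator() {}

    static String toLucene(String legacy) {
        // Remove spaces around '=' so each "field = term" becomes one token.
        String compact = legacy.trim().replaceAll("\\s*=\\s*", "=");
        return Arrays.stream(compact.split("\\s+"))
                .map(LegacyQueryTranslator::translateToken)
                .collect(Collectors.joining(" "));
    }

    private static String translateToken(String token) {
        String lower = token.toLowerCase(Locale.ROOT);
        if (lower.equals("and") || lower.equals("or")) {
            // The classic Lucene query parser expects boolean operators in upper case.
            return lower.toUpperCase(Locale.ROOT);
        }
        int eq = token.indexOf('=');
        if (eq > 0) {
            // Field-qualified term: "author=smith" -> "author:smith".
            return token.substring(0, eq) + ":" + token.substring(eq + 1);
        }
        return token; // free-text term, left unchanged
    }
}
```

For example, `toLucene("author = smith and title = electrical")` yields `author:smith AND title:electrical`. A fuller migration could map the old grammar's parse tree to Lucene queries instead of rewriting strings token by token.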
Currently,
- the search is slow for large libraries (>10k entries)
- the search syntax is specific to JabRef rather than a widely used one (Lucene's syntax is more common)
- there are many open issues in the search itself. All issues relevant to this project can be found using the following query: https://github.com/JabRef/jabref/issues?q=is%3Aopen+label%3A%22project%3A+GSoC%22+label%3A%22search%22
The main goal of the project is to offer a fast and reliable search based on Lucene. Factors for that include support for non-ASCII characters: for instance, if a user searches for "Breitenbucher", "Breitenbücher" should also be matched. Be aware that Breitenbücher can be encoded in the library as a) Breitenbücher (UTF-8), b) Breitenb"ucher, or maybe c) Breitenbuecher (see JabRef/jabref#6815 for details). The latter is not that easy and could be out of scope for GSoC.
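A minimal sketch of the matching idea using only the JDK: both the stored field value and the search term are reduced to a plain lower-case ASCII key before comparison. The class `SearchTermFolding` is hypothetical. In a Lucene-based implementation the same effect would normally come from the analyzer chain (Lucene ships an `ASCIIFoldingFilter` for accent folding, though not for LaTeX escapes). The sketch covers variants a) and b), plus the `{\"u}` LaTeX form as an assumption about how BibTeX files commonly encode umlauts. Variant c) (Breitenbuecher) is intentionally not handled.

```java
import java.text.Normalizer;
import java.util.Locale;
import java.util.regex.Pattern;

/** Hypothetical helper: folds a term so that "Breitenbucher" matches "Breitenbücher" and LaTeX-encoded variants. */
final class SearchTermFolding {

    // Matches {\"u}, \"u and "u (German babel shorthand) and captures the base letter.
    // Simplification: any '"' directly before a letter is dropped, which is fine for a search key.
    private static final Pattern LATEX_UMLAUT = Pattern.compile("\\{?\\\\?\"([A-Za-z])\\}?");

    // Combining marks left over after canonical decomposition (e.g. U+0308 COMBINING DIAERESIS).
    private static final Pattern COMBINING_MARKS = Pattern.compile("\\p{M}+");

    private SearchTermFolding() {}

    static String fold(String term) {
        // Step 1: replace LaTeX/babel umlaut escapes by their base letter.
        String s = LATEX_UMLAUT.matcher(term).replaceAll("$1");
        // Step 2: decompose precomposed characters, e.g. the UTF-8 umlaut into base letter + diaeresis.
        s = Normalizer.normalize(s, Normalizer.Form.NFD);
        // Step 3: drop the combining marks, leaving the base letters.
        s = COMBINING_MARKS.matcher(s).replaceAll("");
        // Step 4: case-insensitive comparison key.
        return s.toLowerCase(Locale.ROOT);
    }
}
```

With this sketch, `fold("Breitenbucher")`, `fold("Breitenbücher")` and `fold("Breitenb\"ucher")` all yield `breitenbucher`, while `fold("Breitenbuecher")` does not, matching the scope note above.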
Thus, the steps are:
- Make a concept for using the Lucene search index as the index for bib entries. This could involve updating the Lucene index after a bib entry is added or changed. One also has to consider the cases where a) the index is not available at startup and b) the bib file was changed outside of JabRef (timestamp-based check?).
  - Sub-step: Dive into the existing implementation, starting at https://github.com/JabRef/jabref/pull/8206. The work can build on that code.
- Implement the concept so that JabRef's search is based on the search index.
- Check whether the issues listed at https://github.com/JabRef/jabref/issues?q=is%3Aopen+label%3A%22project%3A+GSoC%22+label%3A%22search%22 are fixed. If an issue is not fixed, work on fixing it.
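The two cases named in the concept step, a missing index at startup and a bib file changed outside JabRef, can be told apart with a timestamp comparison. The sketch below is one possible shape under stated assumptions: `IndexFreshness` is a hypothetical class, and the index's build time is taken from the modification time of a marker file that the indexer would touch after every (re)build. When the decision is `UP_TO_DATE`, entry additions and changes could then be applied incrementally, for example via Lucene's `IndexWriter.updateDocument`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;
import java.util.Optional;

/** Hypothetical helper: decides whether the search index must be rebuilt from scratch. */
final class IndexFreshness {

    enum Decision { REBUILD_MISSING, REBUILD_STALE, UP_TO_DATE }

    private IndexFreshness() {}

    /**
     * Pure decision logic, kept separate from file I/O so it is easy to test.
     * indexBuiltAt is empty when no index exists yet.
     */
    static Decision decide(FileTime bibModifiedAt, Optional<FileTime> indexBuiltAt) {
        if (indexBuiltAt.isEmpty()) {
            return Decision.REBUILD_MISSING; // case a: no index available at startup
        }
        if (bibModifiedAt.compareTo(indexBuiltAt.get()) > 0) {
            return Decision.REBUILD_STALE;   // case b: bib file is newer than the index
        }
        return Decision.UP_TO_DATE;          // incremental updates suffice
    }

    /** File-system variant; assumes a marker file whose mtime records the last (re)build. */
    static Decision decide(Path bibFile, Path indexMarker) throws IOException {
        FileTime bibTime = Files.getLastModifiedTime(bibFile);
        Optional<FileTime> indexTime = Files.exists(indexMarker)
                ? Optional.of(Files.getLastModifiedTime(indexMarker))
                : Optional.empty();
        return decide(bibTime, indexTime);
    }
}
```

One caveat of this design: JabRef's own saves also bump the bib file's modification time, so the marker would have to be touched after each save as well, or the next startup would trigger an unnecessary rebuild. Coarse file-system timestamp resolution is a further reason to treat the check as a heuristic.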
Just for information:
- The discussion started at https://github.com/JabRef/jabref/issues/1975.
- There are more general issues with search (see https://github.com/JabRef/jabref/labels/search); not all of them can be covered by GSoC.