Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/implement complex queries #7350

Merged
merged 33 commits into from
Jan 26, 2021
Merged

Conversation

DominikVoigt
Copy link
Contributor

@DominikVoigt DominikVoigt commented Jan 15, 2021

This PR introduces QueryTransformers for creating structured and fielded queries in the cross-library query language and transforming them into the library-specific query language.

Here a set of Transformers are included, and new transformers can easily be added by extending the AbstractQueryTransformer.

Transformers are integrated into the default performSearch method of the SearchBasedFetcher and all child classes.
Additionally, this PR addresses some bugs in the fetcher tests.

  • Change in CHANGELOG.md described (if applicable)
  • Tests created for changes (if applicable)
  • Manually tested changed features in running JabRef (always required)
  • Screenshots added in PR description (for UI changes)
  • Checked documentation: Is the information available and up to date? If not created an issue at https://github.com/JabRef/user-documentation/issues or, even better, submitted a pull request to the documentation repository.

DominikVoigt and others added 26 commits November 24, 2020 21:35
…guage.

Signed-off-by: Dominik Voigt <dominik.ingo.voigt@gmail.com>
…/jabref into feature/implement-complex-queries
Fix IEEE tests.
Extend IEEE capabilities.
Create workarounds for some IEEE bugs
Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, you are now creating huge PRs am laufenden Fließband ;-)

After skimming, I've a few questions about the interface. It feels a bit more complicated than it should ideally be (but than again, that's always the case with these fetcher). Why does the following simplified call chain doesn't work:

  • search(String query) parsing the user's query and calling search(parsedQuery) (This method is mainly there to make testwriting simpler; the parsing should happen also in the GUI to give direct feeback when the query syntax is wrong)
  • search(LuceneQuery query) calling getUrl to obtain the url, then fetch result, parse it, cleanup, return result.
  • getUrl(LuceneQuery query) generating the url to call from the query. This could use a transformer to convert the query into a string query which is then added to the url via addParameter("query", transformedQuery);. For REST-based API one should prefer to use something like addParameter("author", query.getAuthor()); instead.

*/
URL getURLForQuery(String query) throws URISyntaxException, MalformedURLException, FetcherException;
URL getURLForQuery(String transformedQuery, AbstractQueryTransformer transformer) throws URISyntaxException, MalformedURLException, FetcherException;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please explain why the getURL* methods need to have the transformer as a second argument? It seems that each fetcher has it's own transformer, so why not use a simple class variable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main reason for this is that some transformers require state that is used during query construction (See IEEE) and/or postprocessing (See ArXiv).
I now use a class variable and a reset method. :)


@Override
public AbstractQueryTransformer getQueryTransformer() {
return new ZbMathQueryTransformer();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put these transformer classes as private inner classes. I think it's an implementation detail how the fetcher transforms the user-query to a query used in the url.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I'm not sure how this would work with the inheritance of the transformer from the abstract implementation?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That shouldn't be a problem. You just move the current code into the fetcher class.

@DominikVoigt
Copy link
Contributor Author

Wow, you are now creating huge PRs am laufenden Fließband ;-)

After skimming, I've a few questions about the interface. It feels a bit more complicated than it should ideally be (but than again, that's always the case with these fetcher). Why does the following simplified call chain doesn't work:

  • search(String query) parsing the user's query and calling search(parsedQuery) (This method is mainly there to make testwriting simpler; the parsing should happen also in the GUI to give direct feeback when the query syntax is wrong)
  • search(LuceneQuery query) calling getUrl to obtain the url, then fetch result, parse it, cleanup, return result.
  • getUrl(LuceneQuery query) generating the url to call from the query. This could use a transformer to convert the query into a string query which is then added to the url via addParameter("query", transformedQuery);. For REST-based API one should prefer to use something like addParameter("author", query.getAuthor()); instead.

The reason for this is that I wanted to keep the current fetcher method structure.
I don't quite see the advantage of your call chain.
Maybe I don't quite get how your proposed call chain works, could you iterate your concept a bit more in-depth? :)

@tobiasdiez
Copy link
Member

tobiasdiez commented Jan 15, 2021

Take for example the following (new) code:

/**
* This method is used to send complex queries using fielded search.
*
* @param transformedQuery the search query defining all fielded search parameters
* @return a list of {@link BibEntry}, which are matched by the query (may be empty)
*/
List<BibEntry> performSearchForTransformedQuery(String transformedQuery) throws FetcherException;
/**
* Looks for hits which are matched by the given free-text query.
*
* @param searchQuery query string that can be parsed into a complex search query
* @return a list of {@link BibEntry}, which are matched by the query (may be empty)
*/
default List<BibEntry> performSearch(String searchQuery) throws JabRefException {
resetTransformer();
AbstractQueryTransformer transformer = getQueryTransformer();
if (searchQuery.isBlank()) {
return Collections.emptyList();
}
Optional<String> transformedQuery;
try {
transformedQuery = transformer.parseQueryStringIntoComplexQuery(searchQuery);
} catch (JabRefException e) {
throw new FetcherException("Error occured during query transformation", e);
}
// Otherwise just use query as a default term
return this.performSearchForTransformedQuery(transformedQuery.orElse(""));
}

This has the assumption that each fetcher has an API that supports searching via an extended query url parameter, e.g. query="au: Olly". However, there are cases where the API supports passing these extra search fields as extra url parameters, e.g. author=Olly. Moreover, I'm not a big fan of using string-based queries when we have more advanced objects (LuceneQuery) that can be used.

So my proposal would be the following rewrite of the above code

    /**
     * This method is used to send complex queries using fielded search.
     *
     * @param transformedQuery the search query defining all fielded search parameters
     * @return a list of {@link BibEntry}, which are matched by the query (may be empty)
     */
    List<BibEntry> performSearch(FetcherQuery query) throws FetcherException;
    // In a Parser-Based fetcher this calls getUrlForQuery(query)

    /**
     * Looks for hits which are matched by the given free-text query.
     *
     * @param searchQuery query string that can be parsed into a complex search query
     * @return a list of {@link BibEntry}, which are matched by the query (may be empty)
     */
    default List<BibEntry> performSearch(String query) throws JabRefException {
        return this.performSearch(FetcherQuery.parse(query));
    }

//// Then in a fetcher

   Url getUrlForQuery(FetcherQuery query) {
        urlBuilder.addParamater("query", new xyzTransformer().transform(query))

       // or
      urlBuilder.addParamater("author", query.getAuthor())

   } 

For flexibility, I would introduce a FetcherQuery class here, deriving from LuceneQuery that could provide a few convienentt helper methods.

@DominikVoigt
Copy link
Contributor Author

Okay, I adapted the code accordingly!
The only thing I did not do was to add a FetcherQuery, as it does not seem necessary?
What do you think of my adaption? :)

Copy link
Member

@tobiasdiez tobiasdiez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I like it a lot!

The idea behind the FetcherQuery was to provide access to common query parameters that are context-specific (like start/end year, title, authors).

@DominikVoigt DominikVoigt force-pushed the feature/implement-complex-queries branch from 1330546 to d40f640 Compare January 26, 2021 22:01
@koppor koppor merged commit 034cf8c into master Jan 26, 2021
@koppor koppor deleted the feature/implement-complex-queries branch January 26, 2021 22:43
Siedlerchr added a commit that referenced this pull request Feb 1, 2021
* upstream/master:
  Bump archunit-junit5-api from 0.15.0 to 0.16.0 (#7407)
  Bump classgraph from 4.8.98 to 4.8.102 (#7401)
  Bump archunit-junit5-engine from 0.15.0 to 0.16.0 (#7402)
  Bump mariadb-java-client from 2.7.1 to 2.7.2 (#7406)
  Bump org.beryx.jlink from 2.23.2 to 2.23.3 (#7400)
  Bump checkstyle from 8.39 to 8.40 (#7404)
  Ignore codecov status for automerge
  Fixes issue of Changing font size makes font size field too small (#7398)
  fix "Alt + keyboard shortcuts do not work" (#7379)
  Fixed invisible file path in the dark theme (#7396)
  Fix File Filter and some layout issues (#7385)
  Feature/implement complex queries (#7350)
  Change format for study definition to yaml (#7126)
  Fix handling of URL in file field (#7347)
  Fix expansion of bracketed expressions in RegExpBasedFileFinder (#7338)
@koppor koppor mentioned this pull request Jun 8, 2021
5 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants