Fix #10498 Create Fetcher and Transformer for ScholarArchive #10549

Merged 13 commits on Nov 6, 2023
src/main/java/org/jabref/logic/importer/fetcher/ScholarArchiveFetcher.java
@@ -0,0 +1,167 @@
package org.jabref.logic.importer.fetcher;

import kong.unirest.json.JSONArray;
import kong.unirest.json.JSONException;
import kong.unirest.json.JSONObject;
import org.apache.http.client.utils.URIBuilder;
import org.apache.lucene.queryparser.flexible.core.nodes.QueryNode;
import org.jabref.logic.importer.FetcherException;

Checkstyle (reviewdog), line 8: Wrong order for 'org.jabref.logic.importer.FetcherException' import. (ImportOrderCheck)
import org.jabref.logic.importer.PagedSearchBasedParserFetcher;
import org.jabref.logic.importer.ParseException;
import org.jabref.logic.importer.Parser;
import org.jabref.logic.importer.fetcher.transformers.ScholarArchiveQueryTransformer;
import org.jabref.logic.util.OS;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.types.EntryType;
import org.jabref.model.entry.types.StandardEntryType;
import org.slf4j.Logger;

Checkstyle (reviewdog), line 18: 'org.slf4j.Logger' should be separated from previous imports. (ImportOrderCheck)
import org.slf4j.LoggerFactory;

import java.io.BufferedReader;

Checkstyle (reviewdog), line 21: Wrong order for 'java.io.BufferedReader' import. (ImportOrderCheck)
import java.io.InputStreamReader;
import java.net.MalformedURLException;
import java.net.URISyntaxException;
import java.net.URL;
import java.time.LocalDate;

Checkstyle (reviewdog), line 26: Unused import - java.time.LocalDate. (UnusedImportsCheck)
import java.time.format.DateTimeFormatter;

Checkstyle (reviewdog), line 27: Unused import - java.time.format.DateTimeFormatter. (UnusedImportsCheck)
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

Checkstyle (reviewdog), line 31: Unused import - java.util.stream.IntStream. (UnusedImportsCheck)

public class ScholarArchiveFetcher implements PagedSearchBasedParserFetcher {

    // Define a constant for the fetcher name.
[Review comment, Member] Remove that comment. The next line states the same.

    public static final String FETCHER_NAME = "ScholarArchive";

    // Initialize the logger for this class.
[Review comment, Member] Remove that comment. The next line states the same.

    private static final Logger LOGGER = LoggerFactory.getLogger(ScholarArchiveFetcher.class);

    // Define the API URL for ScholarArchive.
    private static final String API_URL = "https://scholar.archive.org/search";

    @Override
    /**
Checkstyle (reviewdog), line 45: Javadoc comment is placed in the wrong location. (InvalidJavadocPositionCheck)
     * Gets the query URL.
     *
     * @param luceneQuery the search query
     * @param pageNumber the zero-based page number
     * @return the URL of the search request
     */
    public URL getURLForQuery(QueryNode luceneQuery, int pageNumber) throws URISyntaxException, MalformedURLException, FetcherException {
        URIBuilder uriBuilder = new URIBuilder(API_URL);

        // Add search query parameter to the URL.
        uriBuilder.addParameter("q", new ScholarArchiveQueryTransformer().transformLuceneQuery(luceneQuery).orElse(""));

        // Add the result offset (page size times zero-based page number) and the page size to the URL.
        uriBuilder.addParameter("from", String.valueOf(getPageSize() * pageNumber));
        uriBuilder.addParameter("size", String.valueOf(getPageSize()));

        // Specify the response format as JSON.
        uriBuilder.addParameter("format", "json");

        // Build the URL.
        return uriBuilder.build().toURL();
    }

    @Override
    public Parser getParser() {
        return inputStream -> {
            // Read the API response into a string.
            String response = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(OS.NEWLINE));

            // Parse the JSON response into a list of BibEntry objects.
            JSONObject jsonObject = new JSONObject(response);
            List<BibEntry> entries = new ArrayList<>();
            if (jsonObject.has("results")) {
                JSONArray results = jsonObject.getJSONArray("results");
                for (int i = 0; i < results.length(); i++) {
                    JSONObject jsonEntry = results.getJSONObject(i);
                    BibEntry entry = parseJSONtoBibtex(jsonEntry);
                    entries.add(entry);
                }
            }

            return entries;
        };
    }


    @Override
Checkstyle (reviewdog), line 92: 'METHOD_DEF' has more than 1 empty lines before. (EmptyLineSeparatorCheck)
    public String getName() {
        return FETCHER_NAME;
    }

    private BibEntry parseJSONtoBibtex(JSONObject jsonEntry) throws ParseException {
        try {
            BibEntry entry = new BibEntry();
            EntryType entryType = StandardEntryType.InCollection;
            JSONObject biblio = jsonEntry.optJSONObject("biblio");
            JSONObject abstracts = jsonEntry.optJSONObject("abstracts");

            // optJSONObject returns null when the key is absent; report that as a parse error
            // instead of letting a NullPointerException escape.
            if (biblio == null) {
                throw new ParseException("ScholarArchive API JSON format has changed");
            }

            // publication type
            String type = biblio.optString("release_type");
            entry.setField(StandardField.TYPE, type);
            if (type.toLowerCase().contains("book")) {
                entryType = StandardEntryType.Book;
            } else if (type.toLowerCase().contains("article")) {
                entryType = StandardEntryType.Article;
            }
            entry.setType(entryType);

            entry.setField(StandardField.TITLE, biblio.optString("title"));
            entry.setField(StandardField.JOURNAL, biblio.optString("container_name"));
            entry.setField(StandardField.DOI, biblio.optString("doi"));
            entry.setField(StandardField.ISSUE, biblio.optString("issue"));
            entry.setField(StandardField.LANGUAGE, biblio.optString("lang_code"));
            entry.setField(StandardField.PUBLISHER, biblio.optString("publisher"));

            entry.setField(StandardField.YEAR, String.valueOf(biblio.optInt("release_year")));
            entry.setField(StandardField.VOLUME, String.valueOf(biblio.optInt("volume_int")));
            // "abstracts" may be absent, in which case optJSONObject returned null.
            if (abstracts != null) {
                entry.setField(StandardField.ABSTRACT, abstracts.optString("body"));
            }

            // Date (optString avoids an exception and an unchecked cast when the key is absent)
            entry.setField(StandardField.DATE, biblio.optString("date"));

            // Authors
            if (biblio.has("contrib_names")) {
                // Read the array from "biblio", the same object the has() check inspects.
                JSONArray authors = biblio.getJSONArray("contrib_names");
                List<String> authorList = new ArrayList<>();
                for (int i = 0; i < authors.length(); i++) {
                    authorList.add(authors.getString(i));
                }
                entry.setField(StandardField.AUTHOR, String.join(" and ", authorList));
            } else {
                LOGGER.info("No author found.");
            }

            // ISSN
            if (biblio.has("issns")) {
                // Read the array from "biblio", the same object the has() check inspects.
                JSONArray issns = biblio.getJSONArray("issns");
                List<String> issnList = new ArrayList<>();
                for (int i = 0; i < issns.length(); i++) {
                    issnList.add(issns.getString(i));
                }
                entry.setField(StandardField.ISSN, String.join(" and ", issnList));
            } else {
                LOGGER.info("No ISSN found.");
            }

            return entry;
        } catch (JSONException exception) {
            throw new ParseException("ScholarArchive API JSON format has changed", exception);
        }
    }

}
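
The paging arithmetic in getURLForQuery can be sketched with the plain JDK, as a rough illustration only. The class name ScholarArchiveUrlSketch and the helper buildQueryUrl are hypothetical and not part of this PR. The sketch assumes the page size is passed in explicitly rather than read from getPageSize(), and it uses URLEncoder as a stand-in for URIBuilder, so its escaping follows form-encoding rules and may differ from what the fetcher produces.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for the fetcher's URL construction; not the PR's code.
final class ScholarArchiveUrlSketch {

    // Same endpoint as API_URL in the fetcher.
    private static final String API_URL = "https://scholar.archive.org/search";

    // Builds the search URL for a zero-based page number.
    static String buildQueryUrl(String query, int pageNumber, int pageSize) {
        // "from" is a result offset, so page n starts at n * pageSize.
        int from = pageSize * pageNumber;
        return API_URL
                + "?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&from=" + from
                + "&size=" + pageSize
                + "&format=json";
    }
}
```

With a page size of 20, page 0 starts at offset 0 and page 2 at offset 40, so consecutive pages do not overlap.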
src/main/java/org/jabref/logic/importer/fetcher/transformers/ScholarArchiveQueryTransformer.java
@@ -0,0 +1,71 @@
package org.jabref.logic.importer.fetcher.transformers;

import java.util.Optional;

Checkstyle (reviewdog), line 3: Unused import - java.util.Optional. (UnusedImportsCheck)

// This class extends the AbstractQueryTransformer to provide specific implementations
// for transforming standard queries into ones suitable for the Scholar Archive's unique format.
public class ScholarArchiveQueryTransformer extends AbstractQueryTransformer {

    // Returns the operator for logical "AND" used in the Scholar Archive query language.
    @Override
    protected String getLogicalAndOperator() {
        return " AND ";
    }

    // Returns the operator for logical "OR" used in the Scholar Archive query language.
    @Override
    protected String getLogicalOrOperator() {
        return " OR ";
    }

    // Returns the operator for logical "NOT" used in the Scholar Archive query language.
    @Override
    protected String getLogicalNotOperator() {
        return "NOT ";
    }

    // Transforms the author query segment into a 'contrib_names' key-value pair for the Scholar Archive query.
    // @param author - the author's name to be searched in the Scholar Archive.
    @Override
    protected String handleAuthor(String author) {
        return createKeyValuePair("contrib_names", author);
    }

    // Transforms the title query segment into a 'title' key-value pair for the Scholar Archive query.
    // @param title - the title of the work to be searched in the Scholar Archive.
    @Override
    protected String handleTitle(String title) {
        return createKeyValuePair("title", title);
    }

    // Transforms the journal title query segment into a 'container_name' key-value pair for the Scholar Archive query.
    // @param journalTitle - the name of the journal to be searched in the Scholar Archive.
    @Override
    protected String handleJournal(String journalTitle) {
        return createKeyValuePair("container_name", journalTitle);
    }

    // Handles the year query by formatting it specifically for a range search in the Scholar Archive.
    // This is for an exact year match.
    // @param year - the publication year to be searched in the Scholar Archive.
    @Override
    protected String handleYear(String year) {
        return "publication.startDate:[" + year + " TO " + year + "]";
    }

    // Handles a year range query, transforming it for the Scholar Archive's query format.
    // If no end year is set after parsing, the range string is returned unchanged.
    // @param yearRange - the range of years to be searched in the Scholar Archive, usually in the format "startYear-endYear".
    @Override
    protected String handleYearRange(String yearRange) {
        // Parse the range; the inherited startYear and endYear fields are read below.
        parseYearRange(yearRange);
        if (endYear == Integer.MAX_VALUE) {
            // No specific end year is set, so the range is passed through as-is.
            return yearRange;
        }
        // Formats the year range for inclusion in the Scholar Archive query.
        return "publication.startDate:[" + startYear + " TO " + endYear + "]";
    }
}
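
The range clause built by handleYear and handleYearRange can be illustrated with a small standalone sketch. The class YearRangeSketch and the method toRangeClause are hypothetical and do not reproduce parseYearRange from AbstractQueryTransformer. As an assumption of this sketch only, an open-ended range such as "2015-" is closed at a caller-supplied year; the PR's code returns the range string unchanged in that case.

```java
// Hypothetical illustration of the "[start TO end]" clause; not the PR's code.
final class YearRangeSketch {

    // Accepts "2019", "2018-2020", or "2015-" and returns a range clause
    // on the publication.startDate field used by the transformer.
    static String toRangeClause(String yearRange, int fallbackEndYear) {
        // A limit of 2 keeps a trailing empty part, so "2015-" yields ["2015", ""].
        String[] parts = yearRange.split("-", 2);
        int startYear = Integer.parseInt(parts[0].trim());

        int endYear;
        if (parts.length == 1) {
            // Single year: exact match, same shape as handleYear.
            endYear = startYear;
        } else if (parts[1].isBlank()) {
            // Open-ended range: assumption of this sketch, close it at the fallback year.
            endYear = fallbackEndYear;
        } else {
            endYear = Integer.parseInt(parts[1].trim());
        }
        return "publication.startDate:[" + startYear + " TO " + endYear + "]";
    }
}
```

Passing the fallback year as a parameter keeps the helper deterministic, which makes it easier to test than reading the current date inside the method.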


