Improve CFF import/export and craft a round-trip test #10995

Merged
merged 44 commits into from
Mar 21, 2024
f437d58
issue #10993 - feat: added ability to parse preferred-citation field …
jeanprbt Mar 8, 2024
b5298df
issue #10993 - feat: added all fields of JabRef/CITATION.cff to CffIm…
jeanprbt Mar 8, 2024
a6b62e1
issue #10993 - feat: rewrote CffExporter to parse Software, Dataset t…
jeanprbt Mar 11, 2024
cd94d33
issue #10993 - feat: added keywords and unknown fields support
jeanprbt Mar 11, 2024
ca0f887
issue #10993 - feat: added round-trip test
jeanprbt Mar 11, 2024
0b1b578
issue #10993 - doc: updated CHANGELOG.md
jeanprbt Mar 11, 2024
d32d26f
Merge branch 'JabRef:main' into issue/10993
jeanprbt Mar 15, 2024
56bf7e7
Convert RemoveBracesFormatterTest to @ParameterizedTest (#11033)
koppor Mar 15, 2024
c4b2328
Importing of BibDesk Groups and Linked Files (#10968)
Frequinzy Mar 17, 2024
57f8a63
Speed up failure reporting (#11030)
koppor Mar 17, 2024
7a4be6d
Fixes Zotero file handling for absolute paths (#11038)
Siedlerchr Mar 18, 2024
7abf13d
Change copy-paste function to handle string constants (follow up PR) …
Siedlerchr Mar 18, 2024
9587520
Bump gittools/actions from 0.13.4 to 1.1.1 (#11039)
dependabot[bot] Mar 18, 2024
1ec6a6e
Bump com.googlecode.plist:dd-plist from 1.23 to 1.28 (#11040)
dependabot[bot] Mar 18, 2024
930a9b4
Bump org.apache.pdfbox:xmpbox from 3.0.1 to 3.0.2 (#11041)
dependabot[bot] Mar 18, 2024
5858598
Bump com.dlsc.gemsfx:gemsfx from 2.2.0 to 2.4.0 (#11044)
dependabot[bot] Mar 18, 2024
7cb8885
Bump org.apache.pdfbox:fontbox from 3.0.1 to 3.0.2 (#11042)
dependabot[bot] Mar 18, 2024
342cb24
Keep enclosing braces of authors (#11034)
koppor Mar 18, 2024
5ab2a81
Improve citation relations (#11016)
ror3d Mar 18, 2024
7a269d4
issue #10993 - doc: updated CHANGELOG.md
jeanprbt Mar 11, 2024
8a8434a
Merge branch 'main' into issue/10993
jeanprbt Mar 18, 2024
008472b
fix: fixed unit tests not passing due to name changes in Author inter…
jeanprbt Mar 18, 2024
6f925ec
feat: changed CFFExporter to use YAML library snakeyaml instead (#10995)
jeanprbt Mar 18, 2024
5a60aff
feat: added support for references and ALL possible CFF fields in imp…
jeanprbt Mar 18, 2024
8fbdf26
Merge branch 'main' into issue/10993
jeanprbt Mar 18, 2024
5e697a2
fix: added requested changes (#10995)
jeanprbt Mar 19, 2024
88c42b8
fix: task rewriteDryRun fixed to pass by removing test in BibEntryTest
jeanprbt Mar 19, 2024
e1b1665
Merge branch 'main' into issue/10993
jeanprbt Mar 19, 2024
9271368
refactor: deleted useless methods in CffImporter (#10995)
jeanprbt Mar 19, 2024
69245be
doc: added decision MADR document for cff export (#10995)
jeanprbt Mar 19, 2024
ad2d600
feat: add a cites or related relationship between imported entries in…
jeanprbt Mar 20, 2024
6978078
Merge branch 'main' into issue/10993
jeanprbt Mar 20, 2024
ca9c0dc
doc: updated MADR decision document for cff export to pass markdownli…
jeanprbt Mar 20, 2024
359237d
fix: fixed round-trip test to use mock citatioKeyPatternPreferences c…
jeanprbt Mar 20, 2024
a8518b7
fix: fixed MADR document for CFF export decision to pass Jekyll CI ch…
jeanprbt Mar 20, 2024
0264c03
fix: fixed requested changes (#10995)
jeanprbt Mar 20, 2024
c4bc13c
feat: finished CFFExporter logic and crafted working round-trip test …
jeanprbt Mar 21, 2024
de27eef
Merge branch 'main' into issue/10993
jeanprbt Mar 21, 2024
2450c80
fix: fixed typos in MADR decision doc for CFF export and refactore Im…
jeanprbt Mar 21, 2024
8d72c5f
Some code beautification
koppor Mar 21, 2024
bf9ff8b
Use existing method getEntryLinkList
koppor Mar 21, 2024
447632b
Use getEntryLinkList
koppor Mar 21, 2024
c43d14a
Use JabRef's Date class for parsing
koppor Mar 21, 2024
60904da
Fix indentation in new line
calixtus Mar 21, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -52,6 +52,7 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv
 - We store the citation relations in an LRU cache to avoid bloating the memory and out-of-memory exceptions. [#10958](https://github.com/JabRef/jabref/issues/10958)
 - Keywords filed are now displayed as tags. [#10910](https://github.com/JabRef/jabref/pull/10910)
 - Citation relations now get more information, and have quick access to view the articles in a browser without adding them to the library [#10869](https://github.com/JabRef/jabref/issues/10869)
+- Importer/Exporter for CFF format now supports JabRef `cites` and `related` relationships, as well as all fields from the CFF specification. [#10993](https://github.com/JabRef/jabref/issues/10993)

 ### Fixed

3 changes: 3 additions & 0 deletions build.gradle
@@ -248,6 +248,9 @@ dependencies {
     // parse plist files
     implementation 'com.googlecode.plist:dd-plist:1.28'

+    // YAML formatting
+    implementation 'org.yaml:snakeyaml:2.2'
+
     testImplementation 'io.github.classgraph:classgraph:4.8.168'
     testImplementation 'org.junit.jupiter:junit-jupiter:5.10.2'
     testImplementation 'org.junit.platform:junit-platform-launcher:1.10.2'
55 changes: 55 additions & 0 deletions docs/decisions/0029-cff-export-multiple-entries.md
@@ -0,0 +1,55 @@
---
nav_order: 28
parent: Decision Records
---

<!-- we need to disable MD025, because we use the different heading "ADR Template" in the homepage (see above) than it is foreseen in the template -->
<!-- markdownlint-disable-next-line MD025 -->
# Exporting multiple entries to CFF

## Context and Problem Statement

The need for an [exporter](https://github.com/JabRef/jabref/issues/10661) to [CFF format](https://github.com/citation-file-format/citation-file-format/blob/main/schema-guide.md) raised the following issue: How to export multiple entries at once? Citation-File-Format is intended to make software and datasets citable. It should contain one "main" entry of type `software` or `dataset`, a possible preferred citation and/or several references of any type.

## Decision Drivers

* Make exported files compatible with official CFF tools
* Make exporting process logical for users

## Considered Options

* When exporting:
  * Export non-`software` entries with dummy topmost `software` and entries as `preferred-citation`
  * Export non-`software` entries with dummy topmost `software` and entries as `references`
  * Forbid exporting multiple entries at once
  * Forbid exporting more than one software entry at once
  * Export entries in several files (i.e. one / file)
  * Export several `software` entries with one of them topmost and all others as `references`
  * Export several `software` entries with a dummy topmost `software` element and all others as `references`
* When importing:
  * Only create one entry / file, even if there is a `preferred-citation` or `references`
  * Add a JabRef `cites` relation from `software` entry to its `preferred-citation`
  * Add a JabRef `cites` relation from `preferred-citation` entry to the main `software` entry
  * Separate `software` entries from their `preferred-citation` or `references`

## Decision Outcome

The decision outcome is the following.

* When exporting, JabRef behaves differently depending on the types of the selected entries.
  * If multiple non-`software` entries are selected, then the exporter uses the `references` field with a dummy topmost `software` element.
  * If several entries including exactly one `software` or `dataset` entry are selected, then the exporter uses this entry as the topmost element and the others as `references`. If the topmost entry has a `cites` field, the cited entry is exported as `preferred-citation`.
  * If several entries including several `software` ones are selected, then the exporter uses a dummy topmost element, and the selected entries are exported as `references`. The `cites` or `related` fields won't be exported in this case.
  * JabRef will not handle `cites` or `related` fields for non-`software` elements.
* When importing, JabRef will create several entries: one main entry for the `software` and other entries for the potential `preferred-citation` and `references` fields. JabRef will link the main entry to the preferred citation using a `cites` field on the main entry, and will link the main entry to the references using a `related` field on the main entry.
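
The export rules above can be sketched roughly as follows. This is a minimal illustration that does not depend on JabRef: `SketchEntry`, `ExportPlan` and `CffExportPlanner` are hypothetical names introduced here, entry types are plain strings, and the `preferred-citation` handling through `cites` is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a library entry: only a citation key and an entry type. */
record SketchEntry(String key, String type) {
    boolean isSoftwareOrDataset() {
        return type.equals("software") || type.equals("dataset");
    }
}

/** Outcome of the selection: the topmost entry, whether it is a dummy, and the references. */
record ExportPlan(SketchEntry main, boolean mainIsDummy, List<SketchEntry> references) {
}

class CffExportPlanner {
    static ExportPlan plan(List<SketchEntry> selected) {
        List<SketchEntry> candidates = selected.stream()
                .filter(SketchEntry::isSoftwareOrDataset)
                .toList();
        if (candidates.size() == 1) {
            // Exactly one software/dataset entry: it becomes the topmost element
            SketchEntry main = candidates.get(0);
            List<SketchEntry> references = new ArrayList<>(selected);
            references.remove(main);
            return new ExportPlan(main, false, references);
        }
        // Zero or several candidates: dummy topmost element, every selected entry is a reference
        return new ExportPlan(new SketchEntry("", "software"), true, List.copyOf(selected));
    }
}
```

With this sketch, selecting one `article` and one `software` entry yields the `software` entry as topmost element and the `article` as the only reference, while selecting two `software` entries yields a dummy topmost element with both entries as references.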

### Positive Consequences

* Exported results comply with CFF format
* The export process is logical: a user who exports multiple entries to CFF might find it clear that they are all marked as `references`
* Importing a CFF file and then exporting the "main" (software) created entry is consistent and will produce the same result
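
The round-trip consistency relies on the links that the import step sets on the main entry. A minimal sketch of that linking, under stated assumptions: `CffImportLinkSketch` and its `link` method are hypothetical, entries are reduced to plain field maps, and joining the reference keys with commas is an assumption of this sketch.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

class CffImportLinkSketch {
    /**
     * Returns a copy of the main entry's fields with a `cites` field pointing to the
     * preferred citation and a `related` field listing the keys of all references.
     */
    static Map<String, String> link(Map<String, String> mainFields,
                                    Optional<String> preferredCitationKey,
                                    List<String> referenceKeys) {
        Map<String, String> linked = new LinkedHashMap<>(mainFields);
        preferredCitationKey.ifPresent(key -> linked.put("cites", key));
        if (!referenceKeys.isEmpty()) {
            // Assumption: several keys are stored comma-separated in a single field
            linked.put("related", String.join(",", referenceKeys));
        }
        return linked;
    }
}
```

Because the main entry carries both links, exporting it again can look up the linked entries and rebuild `preferred-citation` and `references`.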

### Negative Consequences

* Importing a CFF file and then exporting one of the entries created from `preferred-citation` or `references` won't result in the same file (i.e., the exported file will contain a dummy topmost `software` element instead of the actual `software` entry that was imported)
* `cites` and `related` fields of non-`software` entries are not supported
1 change: 1 addition & 0 deletions src/main/java/module-info.java
@@ -145,4 +145,5 @@
     requires de.saxsys.mvvmfx.validation;
     requires com.jthemedetector;
     requires dd.plist;
+    requires org.yaml.snakeyaml;
 }
4 changes: 3 additions & 1 deletion src/main/java/org/jabref/cli/ArgumentProcessor.java
@@ -166,7 +166,9 @@ private Optional<ParserResult> importFile(Path file, String importFormat) {
         ImportFormatReader importFormatReader = new ImportFormatReader(
                 preferencesService.getImporterPreferences(),
                 preferencesService.getImportFormatPreferences(),
-                fileUpdateMonitor);
+                preferencesService.getCitationKeyPatternPreferences(),
+                fileUpdateMonitor
+        );

         if (!"*".equals(importFormat)) {
             System.out.println(Localization.lang("Importing %0", file));
4 changes: 3 additions & 1 deletion src/main/java/org/jabref/cli/JabRefCLI.java
@@ -317,7 +317,9 @@ public static void printUsage(PreferencesService preferencesService) {
         ImportFormatReader importFormatReader = new ImportFormatReader(
                 preferencesService.getImporterPreferences(),
                 preferencesService.getImportFormatPreferences(),
-                new DummyFileUpdateMonitor());
+                preferencesService.getCitationKeyPatternPreferences(),
+                new DummyFileUpdateMonitor()
+        );
         List<Pair<String, String>> importFormats = importFormatReader
                 .getImportFormats().stream()
                 .map(format -> new Pair<>(format.getName(), format.getId()))
@@ -372,7 +372,9 @@ private List<BibEntry> tryImportFormats(String data) {
             ImportFormatReader importFormatReader = new ImportFormatReader(
                     preferencesService.getImporterPreferences(),
                     preferencesService.getImportFormatPreferences(),
-                    fileUpdateMonitor);
+                    preferencesService.getCitationKeyPatternPreferences(),
+                    fileUpdateMonitor
+            );
             UnknownFormatImport unknownFormatImport = importFormatReader.importUnknownFormat(data);
             return unknownFormatImport.parserResult().getDatabase().getEntries();
         } catch (ImportException ex) { // ex is already localized
8 changes: 6 additions & 2 deletions src/main/java/org/jabref/gui/importer/ImportCommand.java
@@ -79,7 +79,9 @@ public void execute() {
         ImportFormatReader importFormatReader = new ImportFormatReader(
                 preferencesService.getImporterPreferences(),
                 preferencesService.getImportFormatPreferences(),
-                fileUpdateMonitor);
+                preferencesService.getCitationKeyPatternPreferences(),
+                fileUpdateMonitor
+        );
         SortedSet<Importer> importers = importFormatReader.getImportFormats();

         FileDialogConfiguration fileDialogConfiguration = new FileDialogConfiguration.Builder()
@@ -134,7 +136,9 @@ private ParserResult doImport(List<Path> files, Importer importFormat) throws IO
         ImportFormatReader importFormatReader = new ImportFormatReader(
                 preferencesService.getImporterPreferences(),
                 preferencesService.getImportFormatPreferences(),
-                fileUpdateMonitor);
+                preferencesService.getCitationKeyPatternPreferences(),
+                fileUpdateMonitor
+        );
         for (Path filename : files) {
             try {
                 if (importer.isEmpty()) {
263 changes: 263 additions & 0 deletions src/main/java/org/jabref/logic/exporter/CffExporter.java
@@ -0,0 +1,263 @@
package org.jabref.logic.exporter;

import java.io.FileWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

import org.jabref.logic.util.StandardFileType;
import org.jabref.model.database.BibDatabaseContext;
import org.jabref.model.entry.Author;
import org.jabref.model.entry.AuthorList;
import org.jabref.model.entry.BibEntry;
import org.jabref.model.entry.Date;
import org.jabref.model.entry.field.BiblatexSoftwareField;
import org.jabref.model.entry.field.Field;
import org.jabref.model.entry.field.StandardField;
import org.jabref.model.entry.field.UnknownField;
import org.jabref.model.entry.types.EntryType;
import org.jabref.model.entry.types.StandardEntryType;

import org.yaml.snakeyaml.DumperOptions;
import org.yaml.snakeyaml.Yaml;

public class CffExporter extends Exporter {
    // Fields that are taken 1:1 from BibTeX to CFF
    public static final List<String> UNMAPPED_FIELDS = List.of(
            "abbreviation", "collection-doi", "collection-title", "collection-type", "commit", "copyright",
            "data-type", "database", "date-accessed", "date-downloaded", "date-published", "department", "end",
            "entry", "filename", "format", "issue-date", "issue-title", "license-url", "loc-end", "loc-start",
            "medium", "nihmsid", "number-volumes", "patent-states", "pmcid", "repository-artifact", "repository-code",
            "scope", "section", "start", "term", "thesis-type", "volume-title", "year-original"
    );

    public static final Map<Field, String> FIELDS_MAP = Map.ofEntries(
            Map.entry(StandardField.ABSTRACT, "abstract"),
            Map.entry(StandardField.DATE, "date-released"),
            Map.entry(StandardField.DOI, "doi"),
            Map.entry(StandardField.KEYWORDS, "keywords"),
            Map.entry(BiblatexSoftwareField.LICENSE, "license"),
            Map.entry(StandardField.COMMENT, "message"),
            Map.entry(BiblatexSoftwareField.REPOSITORY, "repository"),
            Map.entry(StandardField.TITLE, "title"),
            Map.entry(StandardField.URL, "url"),
            Map.entry(StandardField.VERSION, "version"),
            Map.entry(StandardField.EDITION, "edition"),
            Map.entry(StandardField.ISBN, "isbn"),
            Map.entry(StandardField.ISSN, "issn"),
            Map.entry(StandardField.ISSUE, "issue"),
            Map.entry(StandardField.JOURNAL, "journal"),
            Map.entry(StandardField.MONTH, "month"),
            Map.entry(StandardField.NOTE, "notes"),
            Map.entry(StandardField.NUMBER, "number"),
            Map.entry(StandardField.PAGES, "pages"),
            Map.entry(StandardField.PUBSTATE, "status"),
            Map.entry(StandardField.VOLUME, "volume"),
            Map.entry(StandardField.YEAR, "year")
    );

    public static final Map<EntryType, String> TYPES_MAP = Map.ofEntries(
            Map.entry(StandardEntryType.Article, "article"),
            Map.entry(StandardEntryType.Book, "book"),
            Map.entry(StandardEntryType.Booklet, "pamphlet"),
            Map.entry(StandardEntryType.Proceedings, "conference"),
            Map.entry(StandardEntryType.InProceedings, "conference-paper"),
            Map.entry(StandardEntryType.Misc, "misc"),
            Map.entry(StandardEntryType.Manual, "manual"),
            Map.entry(StandardEntryType.Software, "software"),
            Map.entry(StandardEntryType.Dataset, "dataset"),
            Map.entry(StandardEntryType.Report, "report"),
            Map.entry(StandardEntryType.Unpublished, "unpublished")
    );

    public CffExporter() {
        super("cff", "CFF", StandardFileType.CFF);
    }

    @Override
    public void export(BibDatabaseContext databaseContext, Path file, List<BibEntry> entries) throws Exception {
        Objects.requireNonNull(databaseContext);
        Objects.requireNonNull(file);
        Objects.requireNonNull(entries);

        // Do not export if no entries to export -- avoids exports with only template text
        if (entries.isEmpty()) {
            return;
        }

        // Make a copy of the list to avoid modifying the original list
        final List<BibEntry> entriesToTransform = new ArrayList<>(entries);

        // Set up YAML options
        DumperOptions options = new DumperOptions();
        options.setWidth(Integer.MAX_VALUE);
        options.setDefaultFlowStyle(DumperOptions.FlowStyle.BLOCK);
        options.setPrettyFlow(true);
        options.setIndentWithIndicator(true);
        options.setIndicatorIndent(2);
        Yaml yaml = new Yaml(options);

        BibEntry main = null;
        boolean mainIsDummy = false;
        int countOfSoftwareAndDataSetEntries = 0;
        for (BibEntry entry : entriesToTransform) {
            if (entry.getType() == StandardEntryType.Software || entry.getType() == StandardEntryType.Dataset) {
                main = entry;
                countOfSoftwareAndDataSetEntries++;
            }
        }
        if (countOfSoftwareAndDataSetEntries == 1) {
            // If there is only one software or dataset entry, use it as the main entry
            entriesToTransform.remove(main);
        } else {
            // If there are no software or dataset entries, or more than one of them,
            // create a dummy main entry holding the given entries
            main = new BibEntry(StandardEntryType.Software);
            mainIsDummy = true;
        }

        // Transform main entry to CFF format
        Map<String, Object> cffData = transformEntry(main, true, mainIsDummy);

        // Preferred citation: only the first key of the cites field is used
        if (main.hasField(StandardField.CITES)) {
            String citeKey = main.getField(StandardField.CITES).orElse("").split(",")[0];
            List<BibEntry> citedEntries = databaseContext.getDatabase().getEntriesByCitationKey(citeKey);
            entriesToTransform.removeAll(citedEntries);
            if (!citedEntries.isEmpty()) {
                BibEntry citedEntry = citedEntries.getFirst();
                cffData.put("preferred-citation", transformEntry(citedEntry, false, false));
            }
        }

        // References
        List<Map<String, Object>> related = new ArrayList<>();
        if (main.hasField(StandardField.RELATED)) {
            main.getEntryLinkList(StandardField.RELATED, databaseContext.getDatabase())
                .stream()
                .map(link -> link.getLinkedEntry())
                .filter(Optional::isPresent)
                .map(Optional::get)
                .forEach(entry -> {
                    related.add(transformEntry(entry, false, false));
                    entriesToTransform.remove(entry);
                });
        }

        // Add remaining entries as references
        for (BibEntry entry : entriesToTransform) {
            related.add(transformEntry(entry, false, false));
        }
        if (!related.isEmpty()) {
            cffData.put("references", related);
        }

        try (FileWriter writer = new FileWriter(file.toFile(), StandardCharsets.UTF_8)) {
            yaml.dump(cffData, writer);
        } catch (IOException ex) {
            throw new SaveException(ex);
        }
    }

    private Map<String, Object> transformEntry(BibEntry entry, boolean main, boolean dummy) {
        Map<String, Object> cffData = new LinkedHashMap<>();
        Map<Field, String> fields = new HashMap<>(entry.getFieldMap());

        if (main) {
            // Mandatory CFF version field
            cffData.put("cff-version", "1.2.0");

            // Mandatory message field
            String message = fields.getOrDefault(StandardField.COMMENT,
                    "If you use this software, please cite it using the metadata from this file.");
            cffData.put("message", message);
            fields.remove(StandardField.COMMENT);
        }

        // Mandatory title field
        String title = fields.getOrDefault(StandardField.TITLE, "No title specified.");
        cffData.put("title", title);
        fields.remove(StandardField.TITLE);

        // Mandatory authors field
        List<Author> authors = AuthorList.parse(fields.getOrDefault(StandardField.AUTHOR, ""))
                                         .getAuthors();
        parseAuthors(cffData, authors);
        fields.remove(StandardField.AUTHOR);

        // Type
        if (!dummy) {
            cffData.put("type", TYPES_MAP.getOrDefault(entry.getType(), "misc"));
        }

        // Keywords: split the comma-separated field into a YAML sequence
        String keywords = fields.get(StandardField.KEYWORDS);
        if (keywords != null) {
            cffData.put("keywords", keywords.split(",\\s*"));
        }
        fields.remove(StandardField.KEYWORDS);

        // Date
        String date = fields.get(StandardField.DATE);
        if (date != null) {
            parseDate(cffData, date);
        }
        fields.remove(StandardField.DATE);

        // Remaining fields not handled above
        for (Field field : fields.keySet()) {
            if (FIELDS_MAP.containsKey(field)) {
                cffData.put(FIELDS_MAP.get(field), fields.get(field));
            } else if (field instanceof UnknownField) {
                // Check that field is accepted by CFF format specification
                if (UNMAPPED_FIELDS.contains(field.getName())) {
                    cffData.put(field.getName(), fields.get(field));
                }
            }
        }
        return cffData;
    }

    private void parseAuthors(Map<String, Object> data, List<Author> authors) {
        List<Map<String, String>> authorsList = new ArrayList<>();
        authors.forEach(author -> {
            Map<String, String> authorMap = new LinkedHashMap<>();
            if (author.getFamilyName().isPresent()) {
                authorMap.put("family-names", author.getFamilyName().get());
            }
            if (author.getGivenName().isPresent()) {
                authorMap.put("given-names", author.getGivenName().get());
            }
            if (author.getNamePrefix().isPresent()) {
                authorMap.put("name-particle", author.getNamePrefix().get());
            }
            if (author.getNameSuffix().isPresent()) {
                authorMap.put("name-suffix", author.getNameSuffix().get());
            }
            authorsList.add(authorMap);
        });
        // The authors field is mandatory in CFF: fall back to a placeholder author if none is given
        data.put("authors", authorsList.isEmpty() ? List.of(Map.of("name", "/")) : authorsList);
    }

    private void parseDate(Map<String, Object> data, String date) {
        Optional<Date> parsedDateOpt = Date.parse(date);
        if (parsedDateOpt.isEmpty()) {
            // Unparseable dates are kept verbatim
            data.put("issue-date", date);
            return;
        }
        Date parsedDate = parsedDateOpt.get();
        if (parsedDate.getYear().isPresent() && parsedDate.getMonth().isPresent() && parsedDate.getDay().isPresent()) {
            // Only a complete date can be exported as date-released
            data.put("date-released", parsedDate.getNormalized());
            return;
        }
        parsedDate.getMonth().ifPresent(month -> data.put("month", month.getNumber()));
        parsedDate.getYear().ifPresent(year -> data.put("year", year));
    }
}
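
The date mapping in `parseDate` can be illustrated outside of JabRef. The sketch below is an approximation under stated assumptions: it uses `java.time` instead of JabRef's `Date` class, so it only recognizes ISO-style inputs (`yyyy-MM-dd`, `yyyy-MM`, `yyyy`), and `CffDateSketch` and `toCffDate` are hypothetical names.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeParseException;
import java.util.LinkedHashMap;
import java.util.Map;

class CffDateSketch {
    /** Maps a date string to the CFF keys it would populate. */
    static Map<String, Object> toCffDate(String date) {
        Map<String, Object> data = new LinkedHashMap<>();
        try {
            // Complete date: exported as date-released
            data.put("date-released", LocalDate.parse(date).toString());
            return data;
        } catch (DateTimeParseException ignored) {
            // Not a complete date, try the partial forms below
        }
        try {
            // Year and month only: exported as separate month and year keys
            YearMonth yearMonth = YearMonth.parse(date);
            data.put("month", yearMonth.getMonthValue());
            data.put("year", yearMonth.getYear());
            return data;
        } catch (DateTimeParseException ignored) {
            // Not a year-month value either
        }
        if (date.matches("\\d{4}")) {
            data.put("year", Integer.parseInt(date));
            return data;
        }
        // Anything else is kept verbatim, mirroring the issue-date fallback
        data.put("issue-date", date);
        return data;
    }
}
```

For example, `toCffDate("2024-03")` fills `month` and `year` but not `date-released`, which reflects the exporter's choice to emit `date-released` only for complete dates.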
