Merge pull request #9175 from GlobalDataverseCommunityConsortium/DANS…
…-external_exporters DANS - Exporters in external jars
Showing 53 changed files with 1,810 additions and 866 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,94 @@ | ||
name: Dataverse SPI | ||
|
||
on: | ||
push: | ||
branches: | ||
- "develop" | ||
paths: | ||
- "modules/dataverse-spi/**" | ||
pull_request: | ||
branches: | ||
- "develop" | ||
paths: | ||
- "modules/dataverse-spi/**" | ||
|
||
jobs: | ||
# Note: Pushing packages to Maven Central requires access to secrets, which pull requests from remote forks | ||
# don't have. Skip in these cases. | ||
check-secrets: | ||
name: Check for Secrets Availability | ||
runs-on: ubuntu-latest | ||
outputs: | ||
available: ${{ steps.secret-check.outputs.available }} | ||
steps: | ||
- id: secret-check | ||
# perform secret check & put boolean result as an output | ||
shell: bash | ||
run: | | ||
if [ "${{ secrets.DATAVERSEBOT_SONATYPE_USERNAME }}" != '' ]; then | ||
echo "available=true" >> $GITHUB_OUTPUT; | ||
else | ||
echo "available=false" >> $GITHUB_OUTPUT; | ||
fi | ||
snapshot: | ||
name: Release Snapshot | ||
needs: check-secrets | ||
runs-on: ubuntu-latest | ||
if: github.event_name == 'pull_request' && needs.check-secrets.outputs.available == 'true' | ||
steps: | ||
- uses: actions/checkout@v3 | ||
- uses: actions/setup-java@v3 | ||
with: | ||
java-version: '11' | ||
distribution: 'adopt' | ||
server-id: ossrh | ||
server-username: MAVEN_USERNAME | ||
server-password: MAVEN_PASSWORD | ||
- uses: actions/cache@v2 | ||
with: | ||
path: ~/.m2 | ||
key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }} | ||
restore-keys: ${{ runner.os }}-m2 | ||
|
||
- name: Deploy Snapshot | ||
run: mvn -f modules/dataverse-spi -Dproject.version.suffix="-PR${{ github.event.number }}-SNAPSHOT" deploy | ||
env: | ||
MAVEN_USERNAME: ${{ secrets.DATAVERSEBOT_SONATYPE_USERNAME }} | ||
MAVEN_PASSWORD: ${{ secrets.DATAVERSEBOT_SONATYPE_TOKEN }} | ||
|
||
release: | ||
name: Release | ||
needs: check-secrets | ||
runs-on: ubuntu-latest | ||
if: github.event_name == 'push' && needs.check-secrets.outputs.available == 'true' | ||
steps: | ||
- uses: actions/checkout@v3 | ||
- uses: actions/setup-java@v3 | ||
with: | ||
java-version: '11' | ||
distribution: 'adopt' | ||
- uses: actions/cache@v2 | ||
with: | ||
path: ~/.m2 | ||
key: ${{ runner.os }}-m2-${{ hashFiles('**/pom.xml') }} | ||
restore-keys: ${{ runner.os }}-m2 | ||
|
||
# Running setup-java again overwrites the settings.xml - IT'S MANDATORY TO DO THIS SECOND SETUP!!! | ||
- name: Set up Maven Central Repository | ||
uses: actions/setup-java@v3 | ||
with: | ||
java-version: '11' | ||
distribution: 'adopt' | ||
server-id: ossrh | ||
server-username: MAVEN_USERNAME | ||
server-password: MAVEN_PASSWORD | ||
gpg-private-key: ${{ secrets.DATAVERSEBOT_GPG_KEY }} | ||
gpg-passphrase: MAVEN_GPG_PASSPHRASE | ||
|
||
- name: Sign + Publish Release | ||
run: mvn -f modules/dataverse-spi -P release deploy | ||
env: | ||
MAVEN_USERNAME: ${{ secrets.DATAVERSEBOT_SONATYPE_USERNAME }} | ||
MAVEN_PASSWORD: ${{ secrets.DATAVERSEBOT_SONATYPE_TOKEN }} | ||
MAVEN_GPG_PASSPHRASE: ${{ secrets.DATAVERSEBOT_GPG_PASSWORD }} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
## Ability to Create New Exporters | ||
|
||
It is now possible for third parties to develop and share code to provide new metadata export formats for Dataverse. Export formats can be made available via the Dataverse UI and API or configured for use in Harvesting. Dataverse now provides developers with a separate dataverse-spi JAR file that contains the Java interfaces and classes required to create a new metadata Exporter. Once a new Exporter has been created and packaged as a JAR file, administrators can use it by specifying a local directory for third-party Exporters, dropping the Exporter JAR there, and restarting Payara. This mechanism also allows new Exporters to replace any of Dataverse's existing metadata export formats. | ||
|
||
## Backward Incompatibilities | ||
|
||
Care should be taken when replacing Dataverse's internal metadata export formats, as third-party code, including other third-party Exporters, may depend on the contents of those export formats. When replacing an existing format, one must also remember to delete the cached metadata export files or run the reExport command so that the metadata exports of existing datasets are updated. | ||
|
||
## New JVM/MicroProfile Settings | ||
|
||
dataverse.spi.export.directory - specifies a directory, readable by the Dataverse server. Any Exporter JAR files placed in this directory will be read by Dataverse and used to add/replace the specified metadata format. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,88 @@ | ||
======================= | ||
Metadata Export Formats | ||
======================= | ||
|
||
.. contents:: |toctitle| | ||
:local: | ||
|
||
Introduction | ||
------------ | ||
|
||
Dataverse ships with a number of metadata export formats available for published datasets. A given metadata export | ||
format may be available for user download (via the UI and API) and/or be available for use in Harvesting between | ||
Dataverse instances. | ||
|
||
As of v5.14, Dataverse provides a mechanism for third-party developers to create new metadata Exporters that implement | ||
new metadata formats or that replace existing formats. All the necessary dependencies are packaged in an interface JAR file | ||
available from Maven Central. Developers can distribute their new Exporters as JAR files which can be dynamically loaded | ||
into Dataverse instances - see :ref:`external-exporters`. Developers are encouraged to make their Exporter code available | ||
via https://github.com/gdcc/dataverse-exporters (or minimally, to list their existence in the README there). | ||
|
||
Exporter Basics | ||
--------------- | ||
|
||
New Exporters must implement the ``io.gdcc.spi.export.Exporter`` interface. The interface includes a few methods for the Exporter | ||
to provide Dataverse with the format it produces, a display name, format mimetype, and whether the format is for download | ||
and/or harvesting use, etc. It also includes a main ``exportDataset(ExportDataProvider dataProvider, OutputStream outputStream)`` | ||
method through which the Exporter receives metadata about the given dataset (via the ``ExportDataProvider``, described further | ||
below) and writes its output (to the ``OutputStream``). | ||
|
||
Exporters that create an XML format must implement the ``io.gdcc.spi.export.XMLExporter`` interface (which extends the Exporter | ||
interface). XMLExporter adds a few methods through which the XMLExporter provides information to Dataverse about the XML | ||
namespace and version being used. | ||
|
||
Exporters also need to use the ``@AutoService(Exporter.class)`` annotation, which makes the class discoverable as an Exporter implementation. | ||
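As background on why the annotation matters: ``@AutoService`` generates a ``META-INF/services`` provider entry at compile time, which is the file format read by the JDK's ``java.util.ServiceLoader``. The sketch below shows, in general terms, how providers packaged in JAR files in a directory can be discovered with that JDK mechanism. It is an illustration only and is not taken from Dataverse's loading code; the class ``JarServiceDiscovery`` and the method ``loadFromDirectory`` are hypothetical names.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical helper, not part of Dataverse or dataverse-spi.
final class JarServiceDiscovery {

    private JarServiceDiscovery() {
    }

    // Returns the providers of `service` that ServiceLoader can see through a
    // class loader built from the *.jar files directly inside `dir`
    // (providers visible to the parent class loader are included as well).
    static <S> List<S> loadFromDirectory(Path dir, Class<S> service) {
        List<S> found = new ArrayList<>();
        if (!Files.isDirectory(dir)) {
            return found; // missing or non-directory path: nothing to load
        }
        List<URL> jarUrls = new ArrayList<>();
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                jarUrls.add(jar.toUri().toURL());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The parent is the loader that defines the service interface, so the
        // providers and the caller agree on the interface class. The loader is
        // left open on purpose because provider classes are loaded lazily.
        ClassLoader loader =
                new URLClassLoader(jarUrls.toArray(new URL[0]), service.getClassLoader());
        for (S provider : ServiceLoader.load(service, loader)) {
            found.add(provider);
        }
        return found;
    }
}
```

With the real SPI on the classpath, a call such as ``JarServiceDiscovery.loadFromDirectory(dir, Exporter.class)`` would be the analogous lookup, under the assumption that each JAR carries the generated provider entry.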
|
||
The ``ExportDataProvider`` interface provides several methods through which your Exporter can receive dataset and file metadata | ||
in various formats. Your exporter would parse the information in one or more of these inputs to retrieve the values needed to | ||
generate the Exporter's output format. | ||
|
||
The most important methods/input formats are: | ||
|
||
- ``getDatasetJson()`` - metadata in the internal Dataverse JSON format used in the native API and available via the built-in JSON metadata export. | ||
- ``getDatasetORE()`` - metadata in the OAI_ORE format available as a built-in metadata format and as used in Dataverse's BagIT-based Archiving capability. | ||
- ``getDatasetFileDetails()`` - detailed file-level metadata for ingested tabular files. | ||
|
||
The first two of these provide nearly complete metadata about the dataset along with the metadata common to all files. This includes all metadata | ||
entries from all metadata blocks, PIDs, tags, licenses and custom terms, etc. Almost all built-in exporters today use the JSON input. | ||
The newer OAI_ORE export, which is JSON-LD-based, provides a flatter structure and references metadata terms by their external vocabulary ids | ||
(e.g. http://purl.org/dc/terms/title), which may make it a preferable starting point in some cases. | ||
|
||
The last method above provides a new JSON-formatted serialization of the variable-level file metadata Dataverse generates during ingest of tabular files. | ||
This information has only been included in the built-in DDI export, as the content of a ``dataDscr`` element. (Hence inspecting the edu.harvard.iq.dataverse.export.DDIExporter and related classes would be a good way to explore how the JSON is structured.) | ||
|
||
The interface also provides: | ||
|
||
- ``getDatasetSchemaDotOrg()`` and | ||
- ``getDataCiteXml()``. | ||
|
||
These provide subsets of metadata in the indicated formats. They may be useful starting points if your exporter will, for example, only add one or two additional fields to the given format. | ||
|
||
If an Exporter cannot create a requested metadata format for some reason, it should throw an ``io.gdcc.spi.export.ExportException``. | ||
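The pieces described in this section can be sketched as follows. This is a minimal, self-contained illustration rather than code from Dataverse: the three types at the top are simplified local stand-ins for the real ``io.gdcc.spi.export`` types so that the example compiles without the dataverse-spi JAR. In particular, ``getDatasetJson()`` returning a ``String``, the ``getFormatName()`` accessor, and the names ``RawJsonExporter`` and ``ExporterSketchDemo`` are assumptions made for the example; consult the actual interface for the full set of methods and their signatures.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// --- Hypothetical stand-ins so the sketch compiles without dataverse-spi. ---
class ExportException extends Exception {
    ExportException(String message, Throwable cause) {
        super(message, cause);
    }
}

interface ExportDataProvider {
    String getDatasetJson(); // assumption: simplified to a String in this sketch
}

interface Exporter {
    String getFormatName(); // assumed accessor name for the format identifier

    void exportDataset(ExportDataProvider dataProvider, OutputStream outputStream)
            throws ExportException;
}

// A real implementation would also carry @AutoService(Exporter.class).
class RawJsonExporter implements Exporter {

    @Override
    public String getFormatName() {
        return "raw_json_sketch"; // made-up format name
    }

    @Override
    public void exportDataset(ExportDataProvider dataProvider, OutputStream outputStream)
            throws ExportException {
        try {
            // Read one of the provider's inputs and write the output format.
            byte[] bytes = dataProvider.getDatasetJson().getBytes(StandardCharsets.UTF_8);
            outputStream.write(bytes);
            outputStream.flush();
        } catch (IOException e) {
            // Signal that the requested format could not be produced.
            throw new ExportException("Could not write the export", e);
        }
    }
}

final class ExporterSketchDemo {

    // Runs an exporter against a provider and returns the output as text.
    static String exportToString(Exporter exporter, ExportDataProvider provider) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            exporter.exportDataset(provider, out);
        } catch (ExportException e) {
            throw new IllegalStateException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(
                exportToString(new RawJsonExporter(), () -> "{\"title\":\"Example\"}"));
    }
}
```

The exporter wraps the low-level ``IOException`` in an ``ExportException`` so that the caller sees a single, export-specific failure type.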
|
||
Building an Exporter | ||
-------------------- | ||
|
||
The example at https://github.com/gdcc/dataverse-exporters provides a Maven pom.xml file suitable for building an Exporter JAR file, and that repository provides additional development guidance. | ||
|
||
There are four dependencies needed to build an Exporter: | ||
|
||
- ``io.gdcc dataverse-spi``, the library containing the interfaces discussed above and the ExportException class | ||
- ``com.google.auto.service auto-service``, which provides the @AutoService annotation | ||
- ``jakarta.json jakarta.json-api`` for JSON classes | ||
- ``jakarta.ws.rs jakarta.ws.rs-api``, which provides a MediaType enumeration for specifying mime types. | ||
|
||
Specifying a Prerequisite Export | ||
-------------------------------- | ||
|
||
An advanced feature of the Exporter mechanism allows a new Exporter to specify that it requires, as input, | ||
the output of another Exporter. An example of this is the built-in HTMLExporter which requires the output | ||
of the DDI XML Exporter to produce an HTML document with the same DDI content. | ||
|
||
This is configured by providing the metadata format name via the ``Exporter.getPrerequisiteFormatName()`` method. | ||
When this method returns a non-empty format name, Dataverse will provide the requested format to the Exporter via | ||
the ``ExportDataProvider.getPrerequisiteInputStream()`` method. | ||
|
||
Developers and administrators deploying Exporters using this mechanism should be aware that, since metadata formats | ||
can be changed by other Exporters, the InputStream received may not hold the expected metadata. Developers should clearly | ||
document their compatibility with the built-in or third-party Exporters they support as prerequisites. | ||
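The prerequisite mechanism and the caution above can be sketched as follows. This is an illustration under stated assumptions, not Dataverse code: the types at the top are simplified local stand-ins for the ``io.gdcc.spi.export`` types, the ``Optional`` return types are assumed, ``"ddi"`` is an assumed format name, and the ``<codeBook`` check is just one hypothetical way to verify that the stream holds what the Exporter expects. ``DdiSummaryExporter`` and ``PrerequisiteSketchDemo`` are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// --- Hypothetical stand-ins so the sketch compiles without dataverse-spi. ---
class ExportException extends Exception {
    ExportException(String message) {
        super(message);
    }

    ExportException(String message, Throwable cause) {
        super(message, cause);
    }
}

interface ExportDataProvider {
    Optional<InputStream> getPrerequisiteInputStream(); // assumed return type
}

interface Exporter {
    Optional<String> getPrerequisiteFormatName(); // assumed return type

    void exportDataset(ExportDataProvider dataProvider, OutputStream outputStream)
            throws ExportException;
}

class DdiSummaryExporter implements Exporter {

    @Override
    public Optional<String> getPrerequisiteFormatName() {
        return Optional.of("ddi"); // assumed name of the prerequisite format
    }

    @Override
    public void exportDataset(ExportDataProvider dataProvider, OutputStream outputStream)
            throws ExportException {
        InputStream prerequisite = dataProvider.getPrerequisiteInputStream()
                .orElseThrow(() -> new ExportException("Prerequisite export not available"));
        try (InputStream in = prerequisite) {
            byte[] bytes = in.readAllBytes();
            String text = new String(bytes, StandardCharsets.UTF_8);
            // Another Exporter may have replaced the format, so check the
            // input before relying on it (hypothetical heuristic).
            if (!text.contains("<codeBook")) {
                throw new ExportException("Prerequisite does not look like DDI");
            }
            String summary = "DDI prerequisite received: " + bytes.length + " bytes";
            outputStream.write(summary.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new ExportException("Could not read or write the export", e);
        }
    }
}

final class PrerequisiteSketchDemo {

    // Feeds the given text to the exporter as its prerequisite export.
    static String run(String prerequisiteText) {
        InputStream in =
                new ByteArrayInputStream(prerequisiteText.getBytes(StandardCharsets.UTF_8));
        ExportDataProvider provider = () -> Optional.of(in);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            new DdiSummaryExporter().exportDataset(provider, out);
        } catch (ExportException e) {
            throw new IllegalStateException(e);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

Failing with an ``ExportException`` when the input is not recognized keeps an unexpected upstream format from silently producing a malformed export.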