8290 harvest client api #9174

Merged
merged 19 commits into from
Dec 1, 2022
141 changes: 141 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -3200,6 +3200,147 @@ The fully expanded example above (without the environment variables) looks like

Only users with superuser permissions may delete harvesting sets.

Managing Harvesting Clients
---------------------------

The following API can be used to create and manage "Harvesting Clients". A Harvesting Client is a configuration entry that allows your Dataverse installation to harvest and index metadata from a specific remote location, either regularly, on a configured schedule, or on a one-off basis. For more information, see the :doc:`/admin/harvestclients` section of the Admin Guide.

List All Configured Harvesting Clients
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Shows all the Harvesting Clients configured::

  GET http://$SERVER/api/harvest/clients/
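
As a rough sketch, the request could be sent with curl as follows. The wrapper function ``list_harvesting_clients`` and the ``CURL`` override are hypothetical conveniences, not part of Dataverse, and the sketch assumes that listing, like the single-client example in this section, needs no API token:

```shell
export SERVER_URL=http://localhost:8080

# Hypothetical wrapper (not part of Dataverse).
# Set CURL=echo to print the arguments instead of sending the request.
list_harvesting_clients() {
  ${CURL:-curl} "$SERVER_URL/api/harvest/clients/"
}
```

Calling ``list_harvesting_clients`` against a running installation sends the GET request shown above.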

Show a Specific Harvesting Client
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Shows a Harvesting Client with a defined nickname::

  GET http://$SERVER/api/harvest/clients/$nickname

.. code-block:: bash

  curl "http://localhost:8080/api/harvest/clients/myClient"

  {
    "status": "OK",
    "data": {
      "lastDatasetsFailed": "22",
      "lastDatasetsDeleted": "0",
      "metadataFormat": "oai_dc",
      "archiveDescription": "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data.",
      "archiveUrl": "https://dataverse.foo.edu",
      "harvestUrl": "https://dataverse.foo.edu/oai",
      "style": "dataverse",
      "type": "oai",
      "dataverseAlias": "fooData",
      "nickName": "myClient",
      "set": "fooSet",
      "schedule": "none",
      "status": "inActive",
      "lastHarvest": "Thu Oct 13 14:48:57 EDT 2022",
      "lastResult": "SUCCESS",
      "lastSuccessful": "Thu Oct 13 14:48:57 EDT 2022",
      "lastNonEmpty": "Thu Oct 13 14:48:57 EDT 2022",
      "lastDatasetsHarvested": "137"
    }
  }


Create a Harvesting Client
~~~~~~~~~~~~~~~~~~~~~~~~~~

To create a new harvesting client::

  POST http://$SERVER/api/harvest/clients/$nickname

``nickName`` is the name identifying the new client. It should be alphanumeric and may also contain ``-``, ``_``, or ``%``, but no spaces. It must also be unique within the installation.
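
The character rule above can be checked locally before sending the request. The following is a minimal sketch; ``is_valid_nickname`` is a hypothetical helper, not part of Dataverse, and it cannot check uniqueness, which only the installation itself can verify:

```shell
# Hypothetical helper (not part of Dataverse): succeeds when the nickname is
# non-empty and contains only letters, digits, "-", "_" or "%".
is_valid_nickname() {
  case "$1" in
    ''|*[!A-Za-z0-9_%-]*) return 1 ;;
    *) return 0 ;;
  esac
}

is_valid_nickname "zenodo" && echo "zenodo: ok"
is_valid_nickname "my client" || echo "my client: rejected"
```

The ``case`` pattern rejects the empty string and any string containing a character outside the allowed set, which also covers spaces.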

You must supply a JSON file that describes the configuration, in the same format as the output of the GET API above. The following fields are mandatory:

- dataverseAlias: The alias of an existing collection where harvested datasets will be deposited
- harvestUrl: The URL of the remote OAI archive
- archiveUrl: The URL of the remote archive that will be used in the redirect links pointing back to the archival locations of the harvested records. It may or may not be on the same server as the harvestUrl above. If this OAI archive is another Dataverse installation, it will be the same URL as harvestUrl minus the "/oai". For example: https://demo.dataverse.org/ vs. https://demo.dataverse.org/oai
- metadataFormat: A supported metadata format. As of this writing, the supported formats are "oai_dc", "oai_ddi" and "dataverse_json".
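
Before uploading, a configuration file can be given a quick pre-flight check for the four mandatory fields listed above. This is only a sketch: ``missing_mandatory_fields`` is a hypothetical helper, not part of Dataverse, and it uses a plain text search rather than a real JSON parser, so it will not catch malformed JSON or empty values:

```shell
# Hypothetical pre-flight check (not part of Dataverse): prints each mandatory
# field that does not appear as a key in the JSON file given as $1.
# This is a plain text search, not a JSON parser.
missing_mandatory_fields() {
  for field in dataverseAlias harvestUrl archiveUrl metadataFormat; do
    grep -q "\"$field\"[[:space:]]*:" "$1" || printf '%s\n' "$field"
  done
}
```

Running ``missing_mandatory_fields client.json`` prints nothing when all four keys are present, and one field name per line otherwise.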

The following optional fields are supported:

- archiveDescription: A description of the remote archive. If not supplied, will default to "This Dataset is harvested from our partners. Clicking the link will take you directly to the archival source of the data."
- set: The OAI set on the remote server. If not supplied, will default to none, i.e., "harvest everything".
- style: Defaults to "default" - a generic OAI archive. (Make sure to use "dataverse" when configuring harvesting from another Dataverse installation).

Generally, the API will accept the output of the GET version of the API for an existing client as valid input, but some fields will be ignored. For example, as of this writing, there is no way to configure a harvesting schedule via this API.

An example JSON file would look like this::

  {
    "nickName": "zenodo",
    "dataverseAlias": "zenodoHarvested",
    "harvestUrl": "https://zenodo.org/oai2d",
    "archiveUrl": "https://zenodo.org",
    "archiveDescription": "Moissonné depuis la collection LMOPS de l'entrepôt Zenodo. En cliquant sur ce jeu de données, vous serez redirigé vers Zenodo.",
    "metadataFormat": "oai_dc",
    "set": "user-lmops"
  }

.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of export below.

.. code-block:: bash

  export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
  export SERVER_URL=http://localhost:8080

  curl -H "X-Dataverse-key:$API_TOKEN" -X POST -H "Content-Type: application/json" "$SERVER_URL/api/harvest/clients/zenodo" --upload-file client.json

The fully expanded example above (without the environment variables) looks like this:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST -H "Content-Type: application/json" "http://localhost:8080/api/harvest/clients/zenodo" --upload-file "client.json"

  {
    "status": "OK",
    "data": {
      "metadataFormat": "oai_dc",
      "archiveDescription": "Moissonné depuis la collection LMOPS de l'entrepôt Zenodo. En cliquant sur ce jeu de données, vous serez redirigé vers Zenodo.",
      "archiveUrl": "https://zenodo.org",
      "harvestUrl": "https://zenodo.org/oai2d",
      "style": "default",
      "type": "oai",
      "dataverseAlias": "zenodoHarvested",
      "nickName": "zenodo",
      "set": "user-lmops",
      "schedule": "none",
      "status": "inActive",
      "lastHarvest": "N/A",
      "lastSuccessful": "N/A",
      "lastNonEmpty": "N/A",
      "lastDatasetsHarvested": "N/A",
      "lastDatasetsDeleted": "N/A"
    }
  }

Only users with superuser permissions may create or configure harvesting clients.

Modify a Harvesting Client
~~~~~~~~~~~~~~~~~~~~~~~~~~

This API works like the create API above and uses the same JSON format, but it is run on an existing client and uses the PUT method instead of POST.
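
A request to modify a client might look like the sketch below, which mirrors the create example with PUT in place of POST. The wrapper function ``update_harvesting_client`` and the ``CURL`` override are hypothetical conveniences, not part of Dataverse; the token is a placeholder:

```shell
export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
export SERVER_URL=http://localhost:8080

# Hypothetical wrapper (not part of Dataverse): updates the client whose
# nickname is $1 using the JSON configuration file $2.
# Set CURL=echo to print the arguments instead of sending the request.
update_harvesting_client() {
  ${CURL:-curl} -H "X-Dataverse-key:$API_TOKEN" -X PUT -H "Content-Type: application/json" "$SERVER_URL/api/harvest/clients/$1" --upload-file "$2"
}
```

For example, ``update_harvesting_client zenodo client.json`` sends the contents of ``client.json`` to the existing ``zenodo`` client.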

Delete a Harvesting Client
~~~~~~~~~~~~~~~~~~~~~~~~~~

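To delete a harvesting client, send a DELETE request for its nickname, replacing ``$nickName`` below with the nickname of the client:

.. code-block:: bash

  curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X DELETE "http://localhost:8080/api/harvest/clients/$nickName"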

Only users with superuser permissions may delete harvesting clients.


PIDs
----
