Merge pull request #7901 from scholarsportal/7900-api-toadd-multipleFiles

api /addFiles
kcondon authored Jul 14, 2021
2 parents 3c85c51 + ab82a97 commit 8c4d651
Showing 6 changed files with 373 additions and 28 deletions.
4 changes: 4 additions & 0 deletions doc/release-notes/7900-add-multipleFilesMetadata-dataset.md
@@ -0,0 +1,4 @@
### Direct Upload API Now Available for Adding Metadata for Multiple Files to a Dataset

Users can now add metadata for multiple files to a dataset once the files exist in the S3 bucket, using the direct upload API.
For more information, see the [Direct DataFile Upload/Replace API section](https://guides.dataverse.org/en/5.6/developers/s3-direct-upload-api.html) of the Dataverse Software Guides.
43 changes: 43 additions & 0 deletions doc/sphinx-guides/source/api/native-api.rst
@@ -2116,6 +2116,49 @@ The fully expanded example above (without environment variables) looks like this
Note: The ``id`` returned in the JSON response is the id of the file metadata version.
Adding File Metadata
~~~~~~~~~~~~~~~~~~~~
This API call requires a ``jsonString`` expressing the metadata of multiple files. It adds the file metadata to the database for files that have already been copied to storage.
The ``jsonData`` object includes values for:
* "description" - A description of the file
* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
* "storageIdentifier" - String
* "fileName" - String
* "mimeType" - String
* "fixity/checksum" either:
* "md5Hash" - String with MD5 hash value, or
* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings
.. note:: See :ref:`curl-examples-and-environment-variables` if you are unfamiliar with the use of ``export`` below.
A curl example using a ``PERSISTENT_ID``
* ``SERVER_URL`` - e.g. https://demo.dataverse.org
* ``API_TOKEN`` - API endpoints require an API token that can be passed as the X-Dataverse-key HTTP header. For more details, see the :doc:`auth` section.
* ``PERSISTENT_IDENTIFIER`` - Example: ``doi:10.5072/FK2/7U7YBV``

.. code-block:: bash

    export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    export SERVER_URL=https://demo.dataverse.org
    export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
    export JSON_DATA='[{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}},
    {"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]'

    curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

The fully expanded example above (without environment variables) looks like this:

.. code-block:: bash

    curl -H "X-Dataverse-key:xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx" -X POST "https://demo.dataverse.org/api/datasets/:persistentId/addFiles?persistentId=doi:10.5072/FK2/7U7YBV" -F jsonData='[{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}}, {"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]'
Updating File Metadata
~~~~~~~~~~~~~~~~~~~~~~
34 changes: 33 additions & 1 deletion doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -7,7 +7,7 @@ Direct upload involves a series of three activities, each involving interacting

* Requesting initiation of a transfer from the server
* Use of the pre-signed URL(s) returned in that call to perform an upload/multipart-upload of the file to S3
* A call to the server to register the file/files as part of the dataset/replace a file in the dataset or to cancel the transfer

This API is only enabled when a Dataset is configured with a data store supporting direct S3 upload.
Administrators should be aware that partial transfers, where a client starts uploading the file/parts of the file and does not contact the server to complete/cancel the transfer, will result in data stored in S3 that is not referenced in the Dataverse installation (i.e. it should be considered temporary and deleted).
@@ -116,6 +116,38 @@ The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.Data
Note that this API call can be used independently of the others, e.g. supporting use cases in which the file already exists in S3/has been uploaded via some out-of-band method.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.

Adding Multiple Uploaded Files to the Dataset
---------------------------------------------

Once the files exist in the S3 bucket, a final API call is needed to add all of them to the dataset. In this API call, additional metadata is added using the "jsonData" parameter.
jsonData normally includes information such as a file description, tags, provenance, whether the file is restricted, etc. For direct uploads, the jsonData object must also include values for:

* "description" - A description of the file
* "directoryLabel" - The "File Path" of the file, indicating which folder the file should be uploaded to within the dataset
* "storageIdentifier" - String
* "fileName" - String
* "mimeType" - String
* "fixity/checksum" either:

* "md5Hash" - String with MD5 hash value, or
* "checksum" - Json Object with "@type" field specifying the algorithm used and "@value" field with the value from that algorithm, both Strings

The allowed checksum algorithms are defined by the edu.harvard.iq.dataverse.DataFile.CheckSumType class and currently include MD5, SHA-1, SHA-256, and SHA-512.
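The hash values in the example below are placeholders; in practice the fixity values are computed from local copies of the files before making the call, e.g. with standard coreutils (a sketch):

.. code-block:: bash

    # compute fixity values locally; pick the algorithm your installation expects
    md5sum file1.txt file2.txt     # values for the "md5Hash" variant
    sha1sum file1.txt file2.txt    # values for a "checksum" object with "@type": "SHA-1"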

.. code-block:: bash

    export API_TOKEN=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
    export SERVER_URL=https://demo.dataverse.org
    export PERSISTENT_IDENTIFIER=doi:10.5072/FK2/7U7YBV
    export JSON_DATA='[{"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357c42", "fileName":"file1.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123456"}},
    {"description":"My description.","directoryLabel":"data/subdir1","categories":["Data"], "restrict":"false", "storageIdentifier":"s3://demo-dataverse-bucket:176e28068b0-1c3f80357d53", "fileName":"file2.txt", "mimeType":"text/plain", "checksum": {"@type": "SHA-1", "@value": "123789"}}]'

    curl -X POST -H "X-Dataverse-key: $API_TOKEN" "$SERVER_URL/api/datasets/:persistentId/addFiles?persistentId=$PERSISTENT_IDENTIFIER" -F "jsonData=$JSON_DATA"

Note that this API call can be used independently of the others, e.g. supporting use cases in which the files already exist in S3/have been uploaded via some out-of-band method.
With current S3 stores the object identifier must be in the correct bucket for the store, include the PID authority/identifier of the parent dataset, and be guaranteed unique, and the supplied storage identifier must be prefaced with the store identifier used in the Dataverse installation, as with the internally generated examples above.
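As a sketch of one out-of-band possibility (assuming the AWS CLI and that the store maps storage identifiers to object keys of the form ``<PID authority>/<PID identifier>/<storage id>``, which should be verified for your installation):

.. code-block:: bash

    # hypothetical out-of-band upload; the key layout here is an assumption
    aws s3 cp file1.txt "s3://demo-dataverse-bucket/10.5072/FK2/7U7YBV/176e28068b0-1c3f80357c42"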


Replacing an existing file in the Dataset
-----------------------------------------

69 changes: 69 additions & 0 deletions src/main/java/edu/harvard/iq/dataverse/api/Datasets.java
@@ -2532,4 +2532,73 @@ public Response getTimestamps(@PathParam("identifier") String id) {
return wr.getResponse();
}
}


/**
 * Add multiple Files to an existing Dataset
 *
 * @param idSupplied the database id or persistent identifier of the dataset
 * @param jsonData   JSON array with one object per file to add
 * @return a JSON response reporting the outcome of the add operation
 */
@POST
@Path("{id}/addFiles")
@Consumes(MediaType.MULTIPART_FORM_DATA)
public Response addFilesToDataset(@PathParam("id") String idSupplied,
@FormDataParam("jsonData") String jsonData) {

if (!systemConfig.isHTTPUpload()) {
return error(Response.Status.SERVICE_UNAVAILABLE, BundleUtil.getStringFromBundle("file.api.httpDisabled"));
}

// -------------------------------------
// (1) Get the user from the API key
// -------------------------------------
User authUser;
try {
authUser = findUserOrDie();
} catch (WrappedResponse ex) {
return error(Response.Status.FORBIDDEN, BundleUtil.getStringFromBundle("file.addreplace.error.auth"));
}

// -------------------------------------
// (2) Get the Dataset Id
// -------------------------------------
Dataset dataset;

try {
dataset = findDatasetOrDie(idSupplied);
} catch (WrappedResponse wr) {
return wr.getResponse();
}


// -------------------------------------
// (2a) Make sure dataset does not have package file
// -------------------------------------

for (DatasetVersion dv : dataset.getVersions()) {
if (dv.isHasPackageFile()) {
return error(Response.Status.FORBIDDEN,
BundleUtil.getStringFromBundle("file.api.alreadyHasPackageFile")
);
}
}

DataverseRequest dvRequest = createDataverseRequest(authUser);

AddReplaceFileHelper addFileHelper = new AddReplaceFileHelper(
dvRequest,
this.ingestService,
this.datasetService,
this.fileService,
this.permissionSvc,
this.commandEngine,
this.systemConfig
);
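
// -------------------------------------
// (3) Delegate to the helper: it parses jsonData and
// adds each described file to the dataset
// -------------------------------------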

return addFileHelper.addFiles(jsonData, dataset, authUser);

}
}