add S3 tests, LocalStack, MinIO #10044

Merged · 17 commits · Dec 7, 2023

Changes from all commits
3 changes: 3 additions & 0 deletions conf/localstack/buckets.sh
@@ -0,0 +1,3 @@
#!/usr/bin/env bash
# https://stackoverflow.com/questions/53619901/auto-create-s3-buckets-on-localstack
awslocal s3 mb s3://mybucket
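
Once LocalStack reports ready and this script has run, a quick sanity check from inside the container (``awslocal`` is the AWS CLI wrapper bundled with LocalStack):

    awslocal s3 ls   # should list mybucket
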
3 changes: 3 additions & 0 deletions doc/release-notes/6783-s3-tests.md
@@ -0,0 +1,3 @@
Developers can now test S3 locally by using the Dockerized development environment, which now includes both LocalStack and MinIO. API (end-to-end) tests are in S3AccessIT.

In addition, a new integration test class (not an API test; a Testcontainers-based test launched with `mvn verify`) has been added as S3AccessIOLocalstackIT. It uses Testcontainers to spin up LocalStack for S3 testing and does not require Dataverse to be running.
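
As a rough illustration of the Testcontainers pattern (a hedged sketch only: the class name, bucket, and assertions below are assumptions, not the actual contents of S3AccessIOLocalstackIT), such a test can look like this with JUnit 5 and AWS SDK v1:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.client.builder.AwsClientBuilder;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import org.junit.jupiter.api.Test;
import org.testcontainers.containers.localstack.LocalStackContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;
import org.testcontainers.utility.DockerImageName;

@Testcontainers
class LocalstackSketchIT {

    @Container
    static LocalStackContainer localstack =
            new LocalStackContainer(DockerImageName.parse("localstack/localstack:2.3.2"))
                    .withServices(LocalStackContainer.Service.S3);

    @Test
    void roundTripOneObject() {
        // Point an S3 client at the container; no running Dataverse needed.
        AmazonS3 s3 = AmazonS3ClientBuilder.standard()
                .withEndpointConfiguration(new AwsClientBuilder.EndpointConfiguration(
                        localstack.getEndpointOverride(LocalStackContainer.Service.S3).toString(),
                        localstack.getRegion()))
                .withCredentials(new AWSStaticCredentialsProvider(
                        new BasicAWSCredentials(localstack.getAccessKey(), localstack.getSecretKey())))
                .build();
        s3.createBucket("mybucket");
        s3.putObject("mybucket", "hello.txt", "hello");
        assertEquals("hello", s3.getObjectAsString("mybucket", "hello.txt"));
    }
}
```
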
4 changes: 4 additions & 0 deletions doc/sphinx-guides/source/admin/dataverses-datasets.rst
@@ -53,11 +53,15 @@ Configure a Dataverse Collection to Store All New Files in a Specific File Store
To direct new files (uploaded when datasets are created or edited) for all datasets in a given Dataverse collection to a specific store, the store can be specified via the API as shown below, or by editing the 'General Information' for a Dataverse collection on the Dataverse collection page. This is only accessible to superusers. ::

curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d $storageDriverLabel http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver

(Note that for ``dataverse.files.store1.label=MyLabel``, you should pass ``MyLabel``.)
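
For example, with a store defined as ``dataverse.files.minio1.label=MinIO`` (as in the Docker setup later in this PR), and an illustrative collection alias of ``root``::

  curl -H "X-Dataverse-key: $API_TOKEN" -X PUT -d MinIO http://localhost:8080/api/admin/dataverse/root/storageDriver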

The current driver can be seen using::

curl -H "X-Dataverse-key: $API_TOKEN" http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver

(Note that for ``dataverse.files.store1.label=MyLabel``, ``store1`` will be returned.)

and can be reset to the default store with::

curl -H "X-Dataverse-key: $API_TOKEN" -X DELETE http://$SERVER/api/admin/dataverse/$dataverse-alias/storageDriver
3 changes: 2 additions & 1 deletion doc/sphinx-guides/source/developers/s3-direct-upload-api.rst
@@ -69,8 +69,9 @@ In the single part case, only one call to the supplied URL is required:

.. code-block:: bash

curl -i -H 'x-amz-tagging:dv-state=temp' -X PUT -T <filename> "<supplied url>"

Note that without the ``-i`` flag, you should not expect any output from the command above. With the ``-i`` flag, you should expect to see a "200 OK" response.

In the multipart case, the client must send each part and collect the 'eTag' responses from the server. The calls for this are the same as the one for the single part case except that each call should send a <partSize> slice of the total file, with the last part containing the remaining bytes.
The responses from the S3 server for these calls will include the 'eTag' for the uploaded part.
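
A hedged sketch of the part-upload calls (the part URLs come from the ``urls`` map returned by the upload URL request; the file slice names here are illustrative):

.. code-block:: bash

  # Upload each slice; the ETag header in each response must be collected
  # for the completion call.
  curl -i -H 'x-amz-tagging:dv-state=temp' -X PUT -T file.part1 "<url for part 1>"
  curl -i -H 'x-amz-tagging:dv-state=temp' -X PUT -T file.part2 "<url for part 2>"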
41 changes: 21 additions & 20 deletions doc/sphinx-guides/source/developers/testing.rst
@@ -190,42 +190,38 @@ Finally, run the script:

$ ./ec2-create-instance.sh -g jenkins.yml -l log_dir

Running the Full API Test Suite Using Docker
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To run the full suite of integration tests on your laptop, we recommend running Dataverse and its dependencies in Docker, as explained in the :doc:`/container/dev-usage` section of the Container Guide. This environment provides additional services (such as S3) that are used in testing.

Running the APIs Without Docker (Classic Dev Env)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

While it is possible to run a good number of API tests without using Docker in our :doc:`classic-dev-env`, we are transitioning toward including additional services (such as S3) in our Dockerized development environment (:doc:`/container/dev-usage`), so you will probably find it more convenient to use it instead.

Unit tests are run automatically on every build, but dev environments and servers require special setup to run API (REST Assured) tests. In short, the Dataverse software needs to be placed into an insecure mode that allows arbitrary users and datasets to be created and destroyed (this is done automatically in the Dockerized environment, and by the steps described below for the classic environment). This differs greatly from the out-of-the-box behavior of the Dataverse software, which we strive to keep secure for sysadmins installing the software for their institutions in a production environment.

The Burrito Key
^^^^^^^^^^^^^^^

For reasons that have been lost to the mists of time, the Dataverse software really wants you to have a burrito. Specifically, if you're trying to run REST Assured tests and see the error "Dataverse config issue: No API key defined for built in user management", you must run the following curl command (or make an equivalent change to your database):

``curl -X PUT -d 'burrito' http://localhost:8080/api/admin/settings/BuiltinUsers.KEY``

Without this "burrito" key in place, REST Assured will not be able to create users. We create users to create objects we want to test, such as Dataverse collections, datasets, and files.
Without this "burrito" key in place, REST Assured will not be able to create users. We create users to create objects we want to test, such as collections, datasets, and files.

Root Collection Permissions
^^^^^^^^^^^^^^^^^^^^^^^^^^^

In your browser, log in as dataverseAdmin (password: admin) and click the "Edit" button for your root collection. Navigate to Permissions, then the Edit Access button. Under "Who can add to this collection?" choose "Anyone with a Dataverse installation account can add sub collections and datasets" if it isn't set to this already.

Alternatively, this same step can be done with this script: ``scripts/search/tests/grant-authusers-add-on-root``

Publish Root Collection
^^^^^^^^^^^^^^^^^^^^^^^

The root collection must be published for some of the REST Assured tests to run.

dataverse.siteUrl
^^^^^^^^^^^^^^^^^
@@ -274,15 +270,20 @@ Remember, it’s only a test (and it's not graded)! Some guidelines to bear in mind:
- Assert the conditions of success / return values for each operation
* A useful resource would be `HTTP status codes <https://www.restapitutorial.com/httpstatuscodes.html>`_
- Let the code do the labor; automate everything that happens when you run your test file.
- If you need to test an optional service (S3, etc.), add it to our docker compose file. See :doc:`/container/dev-usage`.
- Just as with any development, if you’re stuck: ask for help!

To execute existing integration tests on your local Dataverse installation, a helpful command line tool to use is `Maven <https://maven.apache.org/ref/3.1.0/maven-embedder/cli.html>`_. You should have Maven installed as per the `Development Environment <https://guides.dataverse.org/en/latest/developers/dev-environment.html>`_ guide, but if not it’s easily done via Homebrew: ``brew install maven``.

Once installed, you may run commands with ``mvn [options] [<goal(s)>] [<phase(s)>]``.

+ If you want to run just one particular API test class:

``mvn test -Dtest=UsersIT``

+ If you want to run just one particular API test method:

``mvn test -Dtest=UsersIT#testMergeAccounts``

+ To run more than one test at a time, separate by commas:
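
For example (an illustrative pair): ``mvn test -Dtest=UsersIT,DatasetsIT``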

79 changes: 69 additions & 10 deletions docker-compose-dev.yml
@@ -9,16 +9,40 @@ services:
    restart: on-failure
    user: payara
    environment:
      DATAVERSE_DB_HOST: postgres
      DATAVERSE_DB_PASSWORD: secret
      DATAVERSE_DB_USER: ${DATAVERSE_DB_USER}
      ENABLE_JDWP: "1"
      DATAVERSE_FEATURE_API_BEARER_AUTH: "1"
      DATAVERSE_AUTH_OIDC_ENABLED: "1"
      DATAVERSE_AUTH_OIDC_CLIENT_ID: test
      DATAVERSE_AUTH_OIDC_CLIENT_SECRET: 94XHrfNRwXsjqTqApRrwWmhDLDHpIYV8
      DATAVERSE_AUTH_OIDC_AUTH_SERVER_URL: http://keycloak.mydomain.com:8090/realms/test
      DATAVERSE_JSF_REFRESH_PERIOD: "1"
      JVM_ARGS: -Ddataverse.files.storage-driver-id=file1
        -Ddataverse.files.file1.type=file
        -Ddataverse.files.file1.label=Filesystem
        -Ddataverse.files.file1.directory=${STORAGE_DIR}/store
        -Ddataverse.files.localstack1.type=s3
        -Ddataverse.files.localstack1.label=LocalStack
        -Ddataverse.files.localstack1.custom-endpoint-url=http://localstack:4566
        -Ddataverse.files.localstack1.custom-endpoint-region=us-east-2
        -Ddataverse.files.localstack1.bucket-name=mybucket
        -Ddataverse.files.localstack1.path-style-access=true
        -Ddataverse.files.localstack1.upload-redirect=true
        -Ddataverse.files.localstack1.download-redirect=true
        -Ddataverse.files.localstack1.access-key=default
        -Ddataverse.files.localstack1.secret-key=default
        -Ddataverse.files.minio1.type=s3
        -Ddataverse.files.minio1.label=MinIO
        -Ddataverse.files.minio1.custom-endpoint-url=http://minio:9000
        -Ddataverse.files.minio1.custom-endpoint-region=us-east-1
        -Ddataverse.files.minio1.bucket-name=mybucket
        -Ddataverse.files.minio1.path-style-access=true
        -Ddataverse.files.minio1.upload-redirect=false
        -Ddataverse.files.minio1.download-redirect=false
        -Ddataverse.files.minio1.access-key=4cc355_k3y
        -Ddataverse.files.minio1.secret-key=s3cr3t_4cc355_k3y
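      # Note: JVM_ARGS is a single YAML plain scalar; the continuation lines
      # above fold into one space-separated list of -D system properties that
      # Payara passes to the JVM, defining the file1, localstack1, and minio1 stores.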
    ports:
      - "8080:8080" # HTTP (Dataverse Application)
      - "4848:4848" # HTTP (Payara Admin Console)
@@ -156,6 +180,41 @@
    networks:
      - dataverse

  dev_localstack:
    container_name: "dev_localstack"
    hostname: "localstack"
    image: localstack/localstack:2.3.2
    restart: on-failure
    ports:
      - "127.0.0.1:4566:4566"
    environment:
      - DEBUG=${DEBUG-}
      - DOCKER_HOST=unix:///var/run/docker.sock
      - HOSTNAME_EXTERNAL=localstack
    networks:
      - dataverse
    volumes:
      - ./conf/localstack:/etc/localstack/init/ready.d
    tmpfs:
      - /localstack:mode=770,size=128M,uid=1000,gid=1000
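    # The conf/localstack bind mount above places buckets.sh into LocalStack's
    # init/ready.d hook directory, so mybucket is created automatically once
    # the container reports ready.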

  dev_minio:
    container_name: "dev_minio"
    hostname: "minio"
    image: minio/minio
    restart: on-failure
    ports:
      - "9000:9000"
      - "9001:9001"
    networks:
      - dataverse
    volumes:
      - minio_storage:/data
    environment:
      MINIO_ROOT_USER: 4cc355_k3y
      MINIO_ROOT_PASSWORD: s3cr3t_4cc355_k3y
    command: server /data
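    # 9000 is the S3 API; 9001 is mapped for the MinIO web console (recent
    # MinIO releases may need --console-address ":9001" appended to the command
    # above to pin the console to that port). The root user/password double as
    # the access/secret keys in the minio1 store definition.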

networks:
  dataverse:
    driver: bridge
5 changes: 5 additions & 0 deletions pom.xml
@@ -612,6 +612,11 @@
<version>3.0.0</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.testcontainers</groupId>
<artifactId>localstack</artifactId>
<scope>test</scope>
</dependency>
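<!-- No version is pinned here; it is presumably managed by a Testcontainers
     BOM in dependencyManagement. -->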
<!--
Brute force solution until we are on Jakarta EE 10.
Otherwise, we get very cryptic errors about missing bundle files on test runs.
97 changes: 97 additions & 0 deletions src/test/java/edu/harvard/iq/dataverse/api/S3AccessDirectIT.java
@@ -0,0 +1,97 @@
package edu.harvard.iq.dataverse.api;

import io.restassured.RestAssured;
import static io.restassured.RestAssured.given;
import io.restassured.path.json.JsonPath;
import io.restassured.response.Response;
import io.restassured.specification.RequestSpecification;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import org.apache.commons.lang3.math.NumberUtils;
import org.junit.jupiter.api.Test;
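
// Exploratory test of S3 direct upload against a running installation. The
// hardcoded demo.dataverse.org constants below (which the TODO says should be
// removed) suggest this class is meant for manual runs rather than automated CI.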

public class S3AccessDirectIT {

@Test
public void testS3DirectUpload() {
// TODO: remove all these constants
RestAssured.baseURI = "https://demo.dataverse.org";
String apiToken = "";
String datasetPid = "doi:10.70122/FK2/UBWSJU";
String datasetId = "2106131";
long size = 1000000000L;

Response getUploadUrls = getUploadUrls(datasetPid, size, apiToken);
getUploadUrls.prettyPrint();
getUploadUrls.then().assertThat().statusCode(200);

String url = JsonPath.from(getUploadUrls.asString()).getString("data.url");
String partSize = JsonPath.from(getUploadUrls.asString()).getString("data.partSize");
String storageIdentifier = JsonPath.from(getUploadUrls.asString()).getString("data.storageIdentifier");
System.out.println("url: " + url);
System.out.println("partSize: " + partSize);
System.out.println("storageIdentifier: " + storageIdentifier);

System.out.println("uploading file via direct upload");
String decodedUrl = null;
try {
decodedUrl = URLDecoder.decode(url, StandardCharsets.UTF_8.name());
} catch (UnsupportedEncodingException ex) {
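// UTF-8 is guaranteed to be supported by the JVM, so this cannot happen.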
}

InputStream inputStream = new ByteArrayInputStream("bumble".getBytes(StandardCharsets.UTF_8));
Response uploadFileDirect = uploadFileDirect(decodedUrl, inputStream);
uploadFileDirect.prettyPrint();
uploadFileDirect.then().assertThat().statusCode(200);

// TODO: Use MD5 or whatever Dataverse is configured for and
// actually calculate it.
String jsonData = """
{
"description": "My description.",
"directoryLabel": "data/subdir1",
"categories": [
"Data"
],
"restrict": "false",
"storageIdentifier": "%s",
"fileName": "file1.txt",
"mimeType": "text/plain",
"checksum": {
"@type": "SHA-1",
"@value": "123456"
}
}
""".formatted(storageIdentifier);
Response addRemoteFile = UtilIT.addRemoteFile(datasetId, jsonData, apiToken);
addRemoteFile.prettyPrint();
addRemoteFile.then().assertThat()
.statusCode(200);
}

static Response getUploadUrls(String idOrPersistentIdOfDataset, long sizeInBytes, String apiToken) {
String idInPath = idOrPersistentIdOfDataset; // Assume it's a number.
String optionalQueryParam = ""; // If idOrPersistentId is a number we'll just put it in the path.
if (!NumberUtils.isCreatable(idOrPersistentIdOfDataset)) {
idInPath = ":persistentId";
optionalQueryParam = "&persistentId=" + idOrPersistentIdOfDataset;
}
RequestSpecification requestSpecification = given();
if (apiToken != null) {
requestSpecification = given()
.header(UtilIT.API_TOKEN_HTTP_HEADER, apiToken);
}
return requestSpecification.get("/api/datasets/" + idInPath + "/uploadurls?size=" + sizeInBytes + optionalQueryParam);
}

static Response uploadFileDirect(String url, InputStream inputStream) {
return given()
.header("x-amz-tagging", "dv-state=temp")
.body(inputStream)
.put(url);
}

}