Merge branch 'develop' into 7662-solrconfig
poikilotherm committed Feb 3, 2022
2 parents c385e09 + 40c6f30 commit 368358f
Showing 40 changed files with 596 additions and 321 deletions.
7 changes: 7 additions & 0 deletions doc/release-notes/5733-s3-creds-chain.md
@@ -0,0 +1,7 @@
# Providing S3 Storage Credentials via MicroProfile Config

With this release, you may use two new options to pass an access key identifier and a secret access key for S3-based
storage definitions without creating the files used by the AWS CLI tools (`~/.aws/config` & `~/.aws/credentials`).

This has been added to ease setups using containers (Docker, Podman, Kubernetes, OpenShift) and to simplify testing and development installations. See the added [documentation and a word of warning in the installation guide](https://guides.dataverse.org/en/latest/installation/config.html#s3-mpconfig).
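As a sketch of how this might look in a container or test environment (the storage id `s3` and the key values are placeholders; per the warning in the guide, plain environment variables are only appropriate for testing and development): MicroProfile Config maps a property such as `dataverse.files.s3.access-key` to the environment variable `DATAVERSE_FILES_S3_ACCESS_KEY` by replacing non-alphanumeric characters with underscores and upper-casing the result.

```shell
# Placeholder values only (these are the AWS documentation example credentials)
export DATAVERSE_FILES_S3_ACCESS_KEY="AKIAIOSFODNN7EXAMPLE"
export DATAVERSE_FILES_S3_SECRET_KEY="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
```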
36 changes: 36 additions & 0 deletions doc/release-notes/8096-8205-ingest-messaging.md
@@ -0,0 +1,36 @@
# Ingest and file upload messaging improvements

On the pages for adding files (on dataset create or edit):

- Explain that all types are supported, link to tabular ingest guides
- Show max files per upload #7227
- Show global tabular ingest limit
- Show per format tabular ingest limits
- Move messaging from blue info block to regular text.

Failed ingest messaging

- Calm blue message that files are available in original format only
- Click for a popup about ingest and specifics of why ingest failed

Email notification for ingest success

- Link to tabular ingest guides
- Don't show the files that were just ingested

Email notification for ingest failure

- Link to tabular ingest guides
- Say "incomplete" instead of "error"

Email notification for mix of ingest success and failure

- Looks just like the email for failure, no mention of files that had success

In-app notification for ingest success

- Reworded and links to guides

In-app notification for ingest failure

- Reworded and links to guides
37 changes: 37 additions & 0 deletions doc/release-notes/8309-postgres-required.md
@@ -0,0 +1,37 @@
## Notes for Dataverse Installation Administrators

### PostgreSQL Version 10+ Required

If you are still using PostgreSQL 9.x, now is the time to upgrade. PostgreSQL 9.x is now EOL (no longer supported as of January 2022), and the Flyway library in the next release of the Dataverse Software will no longer work with versions prior to 10.

The Dataverse Software has been tested with PostgreSQL versions up to 13. The current stable version 13.5 is recommended. If that's not an option for reasons specific to your installation (for example, if PostgreSQL 13.5 is not available for the OS distribution you are using), any 10+ version should work.
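To check which major version your installation is currently running before planning the upgrade, a command along these lines should work:

``sudo -u postgres psql -c "SHOW server_version;"``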

See the upgrade section for more information.


### PostgreSQL Upgrade (for the "Upgrade Instructions" section)

The tested and recommended way of upgrading an existing database is as follows:

- Export your current database with ``pg_dumpall``;
- Install the new version of PostgreSQL (make sure it is running on the same port, so that no changes are needed in the Payara configuration);
- Re-import the database with ``psql``, as the user ``postgres``.

It is strongly recommended to use ``pg_dumpall`` from the old version of PostgreSQL and ``psql`` from the new one. For example, the commands below were used to migrate a database running under PostgreSQL 9.6 to 13.5. Adjust the versions and the path names to match your environment.

Back up/export:

``/usr/pgsql-9.6/bin/pg_dumpall -U postgres > /tmp/backup.sql``

Restore/import:

``/usr/pgsql-13/bin/psql -U postgres -f /tmp/backup.sql``
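After the import completes, a quick sanity check along these lines (adjust the path to your new version) can confirm the server is up and answering:

``/usr/pgsql-13/bin/psql -U postgres -c "SELECT version();"``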

When upgrading the production database here at Harvard IQSS, we were able to go from version 9.6 all the way to 13.3 without any issues.

You may want to try these backup and restore steps on a test server, to get an accurate estimate of how much downtime to expect with the final production upgrade. That of course will depend on the size of your database.

Consult the PostgreSQL upgrade documentation for more information, for example <https://www.postgresql.org/docs/13/upgrading.html#UPGRADING-VIA-PGDUMPALL>.



14 changes: 7 additions & 7 deletions doc/sphinx-guides/source/developers/dev-environment.rst
@@ -94,23 +94,23 @@ To install Payara, run the following commands:
Install PostgreSQL
~~~~~~~~~~~~~~~~~~

For the past few release cycles much of the development has been done under PostgreSQL 9.6. While that version is known to be very stable, it is nearing its end-of-life (in Nov. 2021). The Dataverse Software has now been tested with versions up to 13 (13.2 is the latest released version as of writing this).
The Dataverse Software has been tested with PostgreSQL versions up to 13. PostgreSQL version 10+ is required.

On Mac, go to https://www.postgresql.org/download/macosx/ and choose "Interactive installer by EDB" option. Note that version 9.6 is used in the command line examples below, but the process will be identical for any version up to 13. When prompted to set a password for the "database superuser (postgres)" just enter "password".
On Mac, go to https://www.postgresql.org/download/macosx/ and choose the "Interactive installer by EDB" option. Note that version 13.5 is used in the command line examples below, but the process should be similar for other versions. When prompted to set a password for the "database superuser (postgres)", just enter "password".

After installation is complete, make a backup of the ``pg_hba.conf`` file like this:

``sudo cp /Library/PostgreSQL/9.6/data/pg_hba.conf /Library/PostgreSQL/9.6/data/pg_hba.conf.orig``
``sudo cp /Library/PostgreSQL/13/data/pg_hba.conf /Library/PostgreSQL/13/data/pg_hba.conf.orig``

Then edit ``pg_hba.conf`` with an editor such as vi:

``sudo vi /Library/PostgreSQL/9.6/data/pg_hba.conf``
``sudo vi /Library/PostgreSQL/13/data/pg_hba.conf``

In the "METHOD" column, change all instances of "md5" to "trust". This will make it so PostgreSQL doesn't require a password.
In the "METHOD" column, change all instances of "scram-sha-256" (or whatever is in that column) to "trust". This will make it so PostgreSQL doesn't require a password.

In the Finder, click "Applications" then "PostgreSQL 9.6" and launch the "Reload Configuration" app. Click "OK" after you see "server signaled".
In the Finder, click "Applications" then "PostgreSQL 13" and launch the "Reload Configuration" app. Click "OK" after you see "server signaled".

Next, to confirm the edit worked, launch the "pgAdmin" application from the same folder. Under "Browser", expand "Servers" and double click "PostgreSQL 9.6". When you are prompted for a password, leave it blank and click "OK". If you have successfully edited "pg_hba.conf", you can get in without a password.
Next, to confirm the edit worked, launch the "pgAdmin" application from the same folder. Under "Browser", expand "Servers" and double click "PostgreSQL 13". When you are prompted for a password, leave it blank and click "OK". If you have successfully edited "pg_hba.conf", you can get in without a password.

On Linux, you should just install PostgreSQL using your favorite package manager, such as ``yum``. (Consult the PostgreSQL section of :doc:`/installation/prerequisites` in the main Installation guide for more info and command line examples.) Find ``pg_hba.conf``, set the authentication method to "trust", and restart PostgreSQL.
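For example, on a RHEL/CentOS-style system the steps are roughly as follows (package and service names vary by distribution and PostgreSQL version):

``sudo yum install postgresql-server``

``sudo postgresql-setup initdb``

``sudo systemctl start postgresql``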

96 changes: 72 additions & 24 deletions doc/sphinx-guides/source/installation/config.rst
@@ -386,6 +386,9 @@ of two methods described below:
1. Manually through creation of the credentials and config files or
2. Automatically via the AWS console commands.

Some usage scenarios are easier without generating these files at all. You may instead provide :ref:`static credentials via
MicroProfile Config <s3-mpconfig>`; see below.

Preparation When Using Amazon's S3 Service
##########################################

@@ -526,28 +529,69 @@ been tested already and what other options have been set for a successful integration

Lastly, go ahead and restart your Payara server. With Dataverse deployed and the site online, you should be able to upload datasets and data files and see the corresponding files in your S3 bucket. Within a bucket, the folder structure emulates that found in local file storage.
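As a quick check that uploads are landing where expected (assuming the AWS CLI is installed and configured), listing the bucket contents should show the newly created keys; the bucket name below is a placeholder:

``aws s3 ls s3://<your-bucket-name> --recursive``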

S3 Storage Options
##################

=========================================== ================== ========================================================================== =============
JVM Option Value Description Default value
=========================================== ================== ========================================================================== =============
dataverse.files.storage-driver-id <id> Enable <id> as the default storage driver. ``file``
dataverse.files.<id>.bucket-name <?> The bucket name. See above. (none)
dataverse.files.<id>.download-redirect ``true``/``false`` Enable direct download or proxy through Dataverse. ``false``
dataverse.files.<id>.upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset to the S3 store. ``false``
dataverse.files.<id>.ingestsizelimit <size in bytes> Maximum size of directupload files that should be ingested (none)
dataverse.files.<id>.url-expiration-minutes <?> If direct uploads/downloads: time until links expire. Optional. 60
dataverse.files.<id>.min-part-size <?> Multipart direct uploads will occur for files larger than this. Optional. ``1024**3``
dataverse.files.<id>.custom-endpoint-url <?> Use custom S3 endpoint. Needs URL either with or without protocol. (none)
dataverse.files.<id>.custom-endpoint-region <?> Only used when using custom endpoint. Optional. ``dataverse``
dataverse.files.<id>.profile <?> Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
dataverse.files.<id>.proxy-url <?> URL of a proxy protecting the S3 store. Optional. (none)
dataverse.files.<id>.path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false``
dataverse.files.<id>.payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
dataverse.files.<id>.chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
dataverse.files.<id>.connection-pool-size <?> The maximum number of open connections to the S3 server ``256``
=========================================== ================== ========================================================================== =============
List of S3 Storage Options
##########################

.. table::
:align: left

=========================================== ================== ========================================================================== =============
JVM Option Value Description Default value
=========================================== ================== ========================================================================== =============
dataverse.files.storage-driver-id <id> Enable <id> as the default storage driver. ``file``
dataverse.files.<id>.type ``s3`` **Required** to mark this storage as S3 based. (none)
dataverse.files.<id>.label <?> **Required** label to be shown in the UI for this storage (none)
dataverse.files.<id>.bucket-name <?> The bucket name. See above. (none)
dataverse.files.<id>.download-redirect ``true``/``false`` Enable direct download or proxy through Dataverse. ``false``
dataverse.files.<id>.upload-redirect ``true``/``false`` Enable direct upload of files added to a dataset to the S3 store. ``false``
dataverse.files.<id>.ingestsizelimit <size in bytes> Maximum size of direct upload files that should be ingested (none)
dataverse.files.<id>.url-expiration-minutes <?> If direct uploads/downloads: time until links expire. Optional. 60
dataverse.files.<id>.min-part-size <?> Multipart direct uploads will occur for files larger than this. Optional. ``1024**3``
dataverse.files.<id>.custom-endpoint-url <?> Use custom S3 endpoint. Needs URL either with or without protocol. (none)
dataverse.files.<id>.custom-endpoint-region <?> Only used when using custom endpoint. Optional. ``dataverse``
dataverse.files.<id>.profile <?> Allows the use of AWS profiles for storage spanning multiple AWS accounts. (none)
dataverse.files.<id>.proxy-url <?> URL of a proxy protecting the S3 store. Optional. (none)
dataverse.files.<id>.path-style-access ``true``/``false`` Use path style buckets instead of subdomains. Optional. ``false``
dataverse.files.<id>.payload-signing ``true``/``false`` Enable payload signing. Optional ``false``
dataverse.files.<id>.chunked-encoding ``true``/``false`` Disable chunked encoding. Optional ``true``
dataverse.files.<id>.connection-pool-size <?> The maximum number of open connections to the S3 server ``256``
=========================================== ================== ========================================================================== =============
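For illustration only (not part of this change set), these options are typically set as JVM options via ``asadmin``; a minimal sketch for a store with id ``s3`` might look like the following (the bucket name is a placeholder):

``./asadmin create-jvm-options "-Ddataverse.files.s3.type=s3"``

``./asadmin create-jvm-options "-Ddataverse.files.s3.label=S3"``

``./asadmin create-jvm-options "-Ddataverse.files.s3.bucket-name=dataverse-files"``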

.. table::
:align: left

=========================================== ================== ========================================================================== =============
MicroProfile Config Option Value Description Default value
=========================================== ================== ========================================================================== =============
dataverse.files.<id>.access-key <?> :ref:`Provide static access key ID. Read before use! <s3-mpconfig>` ``""``
dataverse.files.<id>.secret-key <?> :ref:`Provide static secret access key. Read before use! <s3-mpconfig>` ``""``
=========================================== ================== ========================================================================== =============


.. _s3-mpconfig:

Credentials via MicroProfile Config
###################################

Optionally, you may provide static credentials for each S3 storage using MicroProfile Config options:

- ``dataverse.files.<id>.access-key`` for this storage's "access key ID"
- ``dataverse.files.<id>.secret-key`` for this storage's "secret access key"

You may provide the values for these via any of the
`supported config sources <https://docs.payara.fish/community/docs/documentation/microprofile/config/README.html>`_.

**WARNING:**

*For security, do not use the sources "environment variable" or "system property" (JVM option) in a production context!*
*Rely on a password alias, a secrets directory, or cloud-based sources instead!*

**NOTE:**

1. If you provide both AWS CLI profile files (as set up in the first step) and static keys, the credentials from ``~/.aws``
   will win over the configured keys when they are valid!
2. A non-empty ``dataverse.files.<id>.profile`` will be ignored when no credentials can be found for that profile name.
   The current codebase does not make use of "named profiles" (as the AWS CLI does) beyond credentials.
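As one possible production-friendly approach (a sketch, assuming Payara's password alias config source, where the alias name doubles as the property name), the keys could be supplied as password aliases instead of plain options; each command prompts for the value to store, and the storage id ``s3`` is a placeholder for your own ``<id>``:

``./asadmin create-password-alias dataverse.files.s3.access-key``

``./asadmin create-password-alias dataverse.files.s3.secret-key``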

Reported Working S3-Compatible Storage
######################################
@@ -1747,7 +1791,12 @@ Notes:

- For larger file upload sizes, you may need to configure your reverse proxy timeout. If using apache2 (httpd) with Shibboleth, add a timeout to the ProxyPass defined in etc/httpd/conf.d/ssl.conf (which is described in the :doc:`/installation/shibboleth` setup).

:MultipleUploadFilesLimit
+++++++++++++++++++++++++

This setting controls the number of files that can be uploaded through the UI at once. The default is 1000. It should be set to 1 or higher since 0 has no effect. To limit the number of files in a zip file, see ``:ZipUploadFilesLimit``.

``curl -X PUT -d 500 http://localhost:8080/api/admin/settings/:MultipleUploadFilesLimit``

:ZipDownloadLimit
+++++++++++++++++
@@ -2367,8 +2416,7 @@ Also refer to the "Datafile Integrity" API :ref:`datafile-integrity`
:SendNotificationOnDatasetCreation
++++++++++++++++++++++++++++++++++

A boolean setting that, if true will send an email and notification to users when a Dataset is created. Messages go to those, other than the dataset creator,
who have the ability/permission necessary to publish the dataset. The intent of this functionality is to simplify tracking activity and planning to follow-up contact.
A boolean setting that, if true, will send an email and notification to users when a Dataset is created. Messages go to those, other than the dataset creator, who have the ability/permission necessary to publish the dataset. The intent of this functionality is to simplify tracking activity and planning follow-up contact.

``curl -X PUT -d true http://localhost:8080/api/admin/settings/:SendNotificationOnDatasetCreation``

