From 417ae473ac4a897f0b68e8c351ed8da0eaa962e1 Mon Sep 17 00:00:00 2001
From: Oliver Bertuch
Date: Tue, 20 Sep 2022 18:22:01 +0200
Subject: [PATCH 1/4] feat(upload): make upload file storage path configurable
 #6656

As outlined in IQSS#6656, files will be stored in
`domaindir/generated/jsp/dataverse` during upload before being moved to
our temporary ingest file space at `$dataverse.files.directory/temp`.

With this commit, we enable to configure a different place for these kind
of generated temporary files by using MPCONFIG variable substitution
inside of glassfish-web.xml.

Also sorts the content of glassfish-web.xml into order as specified by the XSD.

Documentation of the setting is provided.
---
 doc/sphinx-guides/source/installation/config.rst | 11 +++++++++++
 src/main/webapp/WEB-INF/glassfish-web.xml        |  8 +++++++-
 2 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
index 17d88c8ea31..72edaa0b456 100644
--- a/doc/sphinx-guides/source/installation/config.rst
+++ b/doc/sphinx-guides/source/installation/config.rst
@@ -1406,6 +1406,17 @@ dataverse.files.directory
 
 This is how you configure the path Dataverse uses for temporary files. (File store specific dataverse.files.\.directory options set the permanent data storage locations.)
 
+dataverse.files.uploads
++++++++++++++++++++++++
+
+Configure a folder to store the incoming file stream during uploads (before transfering to `${dataverse.files.directory}/temp`).
+You can use an absolute path or a relative, which is relative to the application server domain directory.
+
+Defaults to ``./uploads``, which resolves to ``/usr/local/payara5/glassfish/domains/domain1/uploads`` in a default
+installation.
+
+Can also be set via *MicroProfile Config API* sources, e.g. the environment variable ``DATAVERSE_FILES_UPLOADS``.
+
 dataverse.auth.password-reset-timeout-in-minutes
 ++++++++++++++++++++++++++++++++++++++++++++++++
 
diff --git a/src/main/webapp/WEB-INF/glassfish-web.xml b/src/main/webapp/WEB-INF/glassfish-web.xml
index ecd3ba15c40..e56d7013abf 100644
--- a/src/main/webapp/WEB-INF/glassfish-web.xml
+++ b/src/main/webapp/WEB-INF/glassfish-web.xml
@@ -8,9 +8,15 @@ Keep a copy of the generated servlet class' java code. + + - + +

From 4a50fca92c95f1ad5f84f98e9a0cd5ed82a88c86 Mon Sep 17 00:00:00 2001
From: Oliver Bertuch
Date: Fri, 11 Nov 2022 19:25:55 +0100
Subject: [PATCH 2/4] docs(storage): add some detailed notes about temporary
 upload file storage #6656

---
 .../source/installation/config.rst | 25 +++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
index 813fa9b139b..b225594ec3b 100644
--- a/doc/sphinx-guides/source/installation/config.rst
+++ b/doc/sphinx-guides/source/installation/config.rst
@@ -700,6 +700,26 @@ Once you have configured a trusted remote store, you can point your users to the
 
 =========================================== ================== ========================================================================== ===================
 
+.. _temporary-file-storage:
+
+Temporary Upload File Storage
++++++++++++++++++++++++++++++
+
+When uploading files via the API or Web UI, you need to be aware that multiple steps are involved to enable
+features like ingest processing, transfer to a permanent storage, checking for duplicates, unzipping etc.
+
+All of these processes are triggered after finishing transfers over the wire and moving the data into a temporary
+(configurable) location on disk at :ref:`${dataverse.files.directory} `\ ``/temp``.
+
+Before being moved there,
+
+- JSF Web UI uploads are stored at :ref:`${dataverse.files.uploads} `, defaulting to
+  ``/usr/local/payara5/glassfish/domains/domain1/uploads`` folder in a standard installation. This place is
+  configurable and might be set to a separate disk volume, swiped regularly for leftovers.
+- API uploads are stored at the system's temporary files location indicated by the Java system property
+  ``java.io.tmpdir``, defaulting to ``/tmp`` on Linux. If this location is backed by a `tmpfs `_
+  on your machine, large file uploads via API will cause RAM and/or swap usage bursts. You might want to point this to
+  a different location, restrict maximum size of it and monitor for leftovers.
 
 .. _Branding Your Installation:
 
@@ -1412,15 +1432,20 @@ Note that it's also possible to use the ``dataverse.fqdn`` as a variable, if you
 
 We are absolutely aware that it's confusing to have both ``dataverse.fqdn`` and ``dataverse.siteUrl``. https://github.com/IQSS/dataverse/issues/6636 is about resolving this confusion.
 
+.. _dataverse.files.directory:
+
 dataverse.files.directory
 +++++++++++++++++++++++++
 
 This is how you configure the path Dataverse uses for temporary files. (File store specific dataverse.files.\.directory options set the permanent data storage locations.)
 
+.. _dataverse.files.uploads:
+
 dataverse.files.uploads
 +++++++++++++++++++++++
 
 Configure a folder to store the incoming file stream during uploads (before transfering to `${dataverse.files.directory}/temp`).
+Please also see :ref:`temporary-file-storage` for more details.
 You can use an absolute path or a relative, which is relative to the application server domain directory.
 
 Defaults to ``./uploads``, which resolves to ``/usr/local/payara5/glassfish/domains/domain1/uploads`` in a default

From c8864f38c221c7f6eec2540fde8092c02e3eec19 Mon Sep 17 00:00:00 2001
From: Philip Durbin
Date: Tue, 22 Nov 2022 14:47:25 -0500
Subject: [PATCH 3/4] replace "leftovers" with "stale uploads" #6656

---
 doc/sphinx-guides/source/installation/config.rst | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/sphinx-guides/source/installation/config.rst b/doc/sphinx-guides/source/installation/config.rst
index b225594ec3b..beb63f17bfd 100644
--- a/doc/sphinx-guides/source/installation/config.rst
+++ b/doc/sphinx-guides/source/installation/config.rst
@@ -715,11 +715,11 @@ Before being moved there,
 
 - JSF Web UI uploads are stored at :ref:`${dataverse.files.uploads} `, defaulting to
   ``/usr/local/payara5/glassfish/domains/domain1/uploads`` folder in a standard installation. This place is
-  configurable and might be set to a separate disk volume, swiped regularly for leftovers.
+  configurable and might be set to a separate disk volume where stale uploads are purged periodically.
 - API uploads are stored at the system's temporary files location indicated by the Java system property
   ``java.io.tmpdir``, defaulting to ``/tmp`` on Linux. If this location is backed by a `tmpfs `_
   on your machine, large file uploads via API will cause RAM and/or swap usage bursts. You might want to point this to
-  a different location, restrict maximum size of it and monitor for leftovers.
+  a different location, restrict maximum size of it, and monitor for stale uploads.
 
 .. _Branding Your Installation:
 
From afdae3e533300ad574be747c997ee0c25f1e3a3d Mon Sep 17 00:00:00 2001
From: Philip Durbin
Date: Tue, 22 Nov 2022 14:48:43 -0500
Subject: [PATCH 4/4] add release note #6656

---
 doc/release-notes/6656-file-uploads.md | 1 +
 1 file changed, 1 insertion(+)
 create mode 100644 doc/release-notes/6656-file-uploads.md

diff --git a/doc/release-notes/6656-file-uploads.md b/doc/release-notes/6656-file-uploads.md
new file mode 100644
index 00000000000..a2430a5d0a8
--- /dev/null
+++ b/doc/release-notes/6656-file-uploads.md
@@ -0,0 +1 @@
+new JVM option: dataverse.files.uploads
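
The documentation added in this series recommends purging stale uploads from the ``dataverse.files.uploads`` directory periodically, but the patches do not ship a tool for that. A minimal Python sketch of such a cleanup job follows. It assumes that a file's modification time is a reasonable proxy for staleness and that no upload stays in flight longer than the chosen threshold. The function name `purge_stale_uploads` and the threshold value are hypothetical and not part of Dataverse.

```python
import time
from pathlib import Path


def purge_stale_uploads(upload_dir, max_age_seconds, now=None):
    """Delete regular files under upload_dir older than max_age_seconds.

    Hypothetical helper, not part of Dataverse. Staleness is judged by
    modification time only (an assumption). Returns the sorted list of
    deleted paths as strings.
    """
    now = time.time() if now is None else now
    # Collect candidates first so files are not removed while walking the tree.
    candidates = [
        p for p in Path(upload_dir).rglob("*")
        if p.is_file() and not p.is_symlink()
    ]
    removed = []
    for path in candidates:
        try:
            if now - path.stat().st_mtime > max_age_seconds:
                path.unlink()
                removed.append(str(path))
        except FileNotFoundError:
            # The file vanished in the meantime, e.g. the upload completed.
            continue
    return sorted(removed)
```

It could be run from a scheduled job, for example `purge_stale_uploads("/usr/local/payara5/glassfish/domains/domain1/uploads", 3600)` for the default location named in the documentation above. The one-hour threshold is only an illustration; it should comfortably exceed the longest upload expected on the installation.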