WIP: 5345: Nonpersistent EJB timers #5371

Closed
wants to merge 10 commits
7 changes: 7 additions & 0 deletions doc/sphinx-guides/source/admin/harvestclients.rst
@@ -22,6 +22,13 @@ Clients are managed on the "Harvesting Clients" page accessible via the :doc:`da

The process of creating a new, or editing an existing client, is largely self-explanatory. It is split into logical steps, in a way that allows the user to go back and correct the entries made earlier. The process is interactive and guidance text is provided. For example, the user is required to enter the URL of the remote OAI server. When they click *Next*, the application will try to establish a connection to the server in order to verify that it is working, and to obtain the information about the sets of metadata records and the metadata formats it supports. The choices offered to the user on the next page will be based on this extra information. If the application fails to establish a connection to the remote archive at the address specified, or if an invalid response is received, the user is given an opportunity to check and correct the URL they entered.

Known issues
~~~~~~~~~~~~
When running harvest clients, you should verify from the logs that all of your harvesters complete their jobs.
Troublesome or incomplete harvests may occur when a harvest takes longer than one hour, or when multiple harvests
are scheduled to start within an hour or two of each other. If you run into this, please open an issue referencing
the :doc:`../developers/timers` part of the docs.
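
A quick, hypothetical way to scan for harvest activity is to grep the application server log. The log path below
assumes a default Glassfish domain location; adjust it for your installation.

.. code-block:: bash

    # Show recent harvest-related log lines; the domain log path is an assumption, adjust as needed.
    grep -i "harvest" /usr/local/glassfish4/glassfish/domains/domain1/logs/server.log | tail -n 50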

New in Dataverse 4, vs. DVN 3
-----------------------------

15 changes: 14 additions & 1 deletion doc/sphinx-guides/source/admin/harvestserver.rst
@@ -14,6 +14,14 @@ harvesting protocol. Note that the terms "Harvesting Server" and "OAI
Server" are being used interchangeably throughout this guide and in
the inline help text.

If you want to learn more about OAI-PMH, take a look at the
`DataCite OAI-PMH guide <https://support.datacite.org/docs/datacite-oai-pmh>`_
or the `OAI-PMH protocol definition <https://www.openarchives.org/OAI/openarchivesprotocol.html>`_.

You might consider adding your OAI-enabled production instance of Dataverse to
`this shared list <https://docs.google.com/spreadsheets/d/12cxymvXCqP_kCsLKXQD32go79HBWZ1vU_tdG4kvP5S8/>`_
of such instances.

How does it work?
-----------------

@@ -28,6 +36,10 @@ Harvesting server can be enabled or disabled on the "Harvesting
Server" page accessible via the :doc:`dashboard`. Harvesting server is by
default disabled on a brand new, "out of the box" Dataverse.

The OAI-PMH endpoint can be accessed at ``http(s)://<Your Dataverse FQDN>/oai``.
If you want other services to harvest your repository, point them to this URL.
*Example URL for the 'Identify' verb*: `Harvard Dataverse OAI <https://dataverse.harvard.edu/oai?verb=Identify>`_
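
As a quick sanity check, you can query the endpoint yourself, for example with ``curl`` (shown here against the
Harvard Dataverse URL above; substitute your own FQDN):

.. code-block:: bash

    # Ask the OAI-PMH endpoint to identify itself; replace the hostname with your own Dataverse FQDN.
    curl "https://dataverse.harvard.edu/oai?verb=Identify"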

OAI Sets
--------

@@ -124,7 +136,8 @@ runs every night (at 2AM, by default). This export timer is created
and activated automatically every time the application is deployed
or restarted. Once again, this is new in Dataverse 4, and unlike DVN
v3, where export jobs had to be scheduled and activated by the admin
user. See the "Export" section of the Admin guide, for more information on the automated metadata exports.
user. See the :doc:`/admin/metadataexport` section of the Admin guide,
for more information on the automated metadata exports.

It is still possible however to make changes like this be immediately
reflected in the OAI server, by going to the *Harvesting Server* page
26 changes: 24 additions & 2 deletions doc/sphinx-guides/source/admin/metadataexport.rst
@@ -7,14 +7,36 @@ Metadata Export
Automatic Exports
-----------------

Publishing a dataset automatically starts a metadata export job, that will run in the background, asynchronously. Once completed, it will make the dataset metadata exported and cached in all the supported formats:
Publishing a dataset automatically starts a metadata export job that runs in the background, asynchronously.
Once completed, the dataset metadata will have been exported and cached in all of the supported formats:

- Dublin Core
- Data Documentation Initiative (DDI)
- Schema.org JSON-LD
- native JSON (Dataverse-specific)

A scheduled timer job that runs nightly will attempt to export any published datasets that for whatever reason haven't been exported yet. This timer is activated automatically on the deployment, or restart, of the application. So, again, no need to start or configure it manually. (See the "Application Timers" section of this guide for more information)
Scheduled Timer Export
----------------------

A scheduled timer job that runs nightly will attempt to export, in all supported metadata formats, any published
datasets that for whatever reason haven't been exported yet, and will cache the results on the filesystem.

**Note** that normally an export happens automatically whenever a dataset is published. This scheduled job is there
to catch any datasets for which that export did not succeed, for one reason or another. Also, since this functionality
was added in version 4.5, if you are upgrading from an earlier version, none of your datasets will have been exported
yet; the first time this job runs, it will attempt to export them all.

This daily job will also update all the harvestable OAI sets configured on your server, adding new and/or newly
published datasets or marking deaccessioned datasets as "deleted" in the corresponding sets as needed.

This timer is activated automatically on the deployment, or restart, of the application, so, again, there is no need
to start or configure it manually. (See the :doc:`timers` section of this guide for more information about timer usage
in Dataverse.) There is no admin user-accessible configuration for this timer.

This job is automatically scheduled to run at 2AM local time every night.

Before Dataverse 4.10 it was possible (for an advanced and adventurous user) to change that time by directly editing
the EJB timer table in the application database. From 4.10 onward, timers are no longer persisted. If you have a
pressing need for a configurable time, please open an issue on GitHub describing your use case.

Batch exports through the API
-----------------------------
72 changes: 44 additions & 28 deletions doc/sphinx-guides/source/admin/timers.rst
@@ -3,50 +3,66 @@
Dataverse Application Timers
============================

Dataverse uses timers to automatically run scheduled Harvest and Metadata export jobs.
Dataverse uses timers to automatically run scheduled jobs for:

.. contents:: |toctitle|
:local:

Dedicated timer server in a Dataverse server cluster
----------------------------------------------------

When running a Dataverse cluster - i.e. multiple Dataverse application
servers talking to the same database - **only one** of them must act
as the *dedicated timer server*. This is to avoid starting conflicting
batch jobs on multiple nodes at the same time.
* Harvesting metadata

  * See :doc:`/admin/harvestserver` and :doc:`/admin/harvestclients`.
  * Created only when scheduling is enabled by an admin (via the "Manage Harvesting Clients" page) and canceled when it is disabled.

* :doc:`/admin/metadataexport`

  * Enabled by default, not configurable.

This does not affect a single-server installation. So you can safely skip this section unless you are running a multi-server cluster.
All timers are created on application startup, and their firing times are not configurable. Since Dataverse 4.10 they
are no longer persisted to a database, as they were deleted and re-created on every startup anyway.

The following JVM option instructs the application to act as the dedicated timer server:
.. contents:: |toctitle|
:local:

``-Ddataverse.timerServer=true``
Dataverse server clusters and EJB timers
----------------------------------------

**IMPORTANT:** Note that this option is automatically set by the Dataverse installer script. That means that when **configuring a multi-server cluster**, it will be the responsibility of the installer to remove the option from the :fixedwidthplain:`domain.xml` of every node except the one intended to be the timer server. We also recommend that the following entry in the :fixedwidthplain:`domain.xml`: ``<ejb-timer-service timer-datasource="jdbc/VDCNetDS">`` is changed back to ``<ejb-timer-service>`` on all the non-timer server nodes. Similarly, this option is automatically set by the installer script. Changing it back to the default setting on a server that doesn't need to run the timer will prevent a potential race condition, where multiple servers try to get a lock on the timer database.
In a multi-node cluster, all timers are created on a dedicated timer node (see below). This is not necessarily the
node where an admin configured the harvesting clients or the metadata export.

**Note** that for the timer to work, the version of the PostgreSQL JDBC driver your instance is using must match the version of your PostgreSQL database. See the 'Timer not working' section of the :doc:`/admin/troubleshooting` guide.
Dedicated timer server node
~~~~~~~~~~~~~~~~~~~~~~~~~~~

Harvesting Timers
-----------------
When running a "cluster" with multiple instances of Dataverse connected to the same database, **only one** of them must
act as the *dedicated timer server*. This is to avoid starting conflicting batch jobs on multiple nodes at the same time.
(Might get addressed for automation in a later Dataverse version using cluster support from the application server.)

These timers are created when scheduled harvesting is enabled by a local admin user (via the "Manage Harvesting Clients" page).
This does not affect a single-server installation. So you can safely skip this section unless you are running a multi-server cluster.

In a multi-node cluster, all these timers will be created on the dedicated timer node (and not necessarily on the node where the harvesting clients were created and/or saved).
The following system property instructs the application to act as the dedicated timer server:

A timer will be automatically removed when a harvesting client with an active schedule is deleted, or if the schedule is turned off for an existing client.
``dataverse.timerServer=true``

Metadata Export Timer
---------------------
**Note** that if you set this via a JVM option, use ``-Ddataverse.timerServer=true``. However, you should prefer the
``asadmin`` system-properties commands.

This timer is created automatically whenever the application is deployed or restarted. There is no admin user-accessible configuration for this timer.
**IMPORTANT:** This is automatically set by the Dataverse installer script on every node.

This timer runs a daily job that tries to export all the local, published datasets that haven't been exported yet, in all supported metadata formats, and cache the results on the filesystem. (Note that normally an export will happen automatically whenever a dataset is published. This scheduled job is there to catch any datasets for which that export did not succeed, for one reason or another). Also, since this functionality has been added in version 4.5: if you are upgrading from a previous version, none of your datasets are exported yet. So the first time this job runs, it will attempt to export them all.
That means that *when configuring a multi-server cluster*, it is the responsibility of the sysadmin to remove
the option from every node except the one intended to be the timer server. The easiest way to achieve this is by running
``asadmin delete-system-property "dataverse.timerServer"``.
(This option will not be set to ``true`` in future Docker images of Dataverse; it will need to be configured explicitly.)
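
For example, a minimal sketch of the relevant ``asadmin`` calls (assuming the standard ``create-system-properties``
subcommand is available on your application server):

.. code-block:: bash

    # On the one node that should act as the dedicated timer server:
    asadmin create-system-properties dataverse.timerServer=true

    # On every other node in the cluster:
    asadmin delete-system-property dataverse.timerServer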

This daily job will also update all the harvestable OAI sets configured on your server, adding new and/or newly published datasets or marking deaccessioned datasets as "deleted" in the corresponding sets as needed.
As persistent timers are no longer used from Dataverse 4.10 onward, it is up to you whether to follow the formerly
recommended steps below when upgrading. In new installations, this is not necessary.

This job is automatically scheduled to run at 2AM local time every night. If really necessary, it is possible (for an advanced user) to change that time by directly editing the EJB timer application table in the database.
We also recommend that the following entry in the :fixedwidthplain:`domain.xml`:
``<ejb-timer-service timer-datasource="jdbc/VDCNetDS">`` be changed back to ``<ejb-timer-service>``
A reviewer (Contributor) commented:
I remember experimenting with this myself (although it was a while ago). I thought leaving the timer-datasource attribute blank would result in the default data source (in practice that would be the local instance of Derby db) being used to persist the timers... I'm assuming I was wrong.
If leaving it blank results in not persisting the timers at all, let's stick with it going forward.
But should this PR then modify the installer script accordingly too?

The PR author (Contributor Author) replied:
Just to be sure we are talking about the same thing: this instruction has been included in the docs already. Once this PR is merged, IMHO the described steps are not necessary anymore.

A blank option does not mean "non-persistent" by default, a blank option means any new persistent timers introduced later on will be stored in the default datasource (local H2 or Hazelcast cache in Payara). (You need to explicitly declare timers as "non-persistent".)

Yes, the installer (and other places) should be changed, too. The asadmin commands from setup-glassfish.sh regarding JDBC timers can be pruned. Will put that in a separate commit to have more logical chunks.

on all the non-timer-server nodes. This entry, too, is set automatically by the installer script. Changing it back
to the default setting on a server that doesn't need to run the timer will prevent a potential race condition where
multiple servers try to get a lock on the timer database.

Known Issues
------------

We've received several reports of an intermittent issue where the application fails to deploy with the error message "EJB Timer Service is not available." Please see the :doc:`/admin/troubleshooting` section of this guide for a workaround.
Prior to Dataverse 4.10, we received several reports of an intermittent issue where the application failed to deploy
with the error message "EJB Timer Service is not available." Please see the :doc:`/admin/troubleshooting` section of
this guide for a workaround.

When running harvest clients, you should verify from the logs that all of your harvesters complete their jobs.
Troublesome or incomplete harvests may occur when a harvest takes longer than one hour, or when multiple harvests
are scheduled to start within an hour or two of each other. If you run into this, please open an issue referencing
the :doc:`../developers/timers` part of the docs.
4 changes: 4 additions & 0 deletions doc/sphinx-guides/source/developers/big-data-support.rst
@@ -396,3 +396,7 @@ Available variables are:
* ``minorVersion``
* ``majorVersion``
* ``releaseStatus``

----

Previous: :doc:`selinux` | Next: :doc:`timers`
1 change: 1 addition & 0 deletions doc/sphinx-guides/source/developers/index.rst
@@ -31,3 +31,4 @@ Developer Guide
geospatial
selinux
big-data-support
timers
24 changes: 24 additions & 0 deletions doc/sphinx-guides/source/developers/timers.rst
@@ -0,0 +1,24 @@
==========
EJB Timers
==========

As described in :doc:`../admin/timers`, Dataverse uses EJB timers for scheduled jobs. This section is about the
techniques used for scheduling.

* :doc:`../admin/metadataexport` is done via a ``@Schedule`` annotation on ``OAISetServiceBean.exportAllSets()`` and
  ``DatasetServiceBean.exportAll()``. Fixed to 2AM local time every day, non-persistent (see the sketch below).
* Harvesting is a bit more complicated. The timer is attached to ``HarvesterServiceBean.harvestEnabled()`` via an
  hourly, non-persistent ``@Schedule`` annotation. That method collects all enabled ``HarvestingClient`` entries and
  runs them if the time from the client config matches.
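
A minimal sketch of the pattern, modeled on the ``DatasetServiceBean.exportAll()`` code added in this PR; the bean
and helper names below are illustrative, not the actual Dataverse classes:

.. code-block:: java

    import java.util.logging.Logger;
    import javax.ejb.Lock;
    import javax.ejb.LockType;
    import javax.ejb.Schedule;
    import javax.ejb.Singleton;

    @Singleton
    public class NightlyExportTimer {

        private static final Logger logger = Logger.getLogger(NightlyExportTimer.class.getName());

        @Lock(LockType.READ)
        @Schedule(hour = "2", persistent = false) // fires at 2AM local time, never written to a timer table
        public void exportAll() {
            // Only the node configured as the dedicated timer server should do the actual work.
            if (isTimerServer()) {
                logger.info("Running the scheduled export job.");
                // ... call into the export service here ...
            }
        }

        private boolean isTimerServer() {
            // In Dataverse this check is backed by the dataverse.timerServer system property.
            return Boolean.parseBoolean(System.getProperty("dataverse.timerServer", "false"));
        }
    }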

**NOTE:** the harvesting timers might cause trouble when a harvest takes longer than one hour, or when multiple
harvests configured for the same starting hour stack up. There is a lock in place to prevent "bad things", but it
might result in a lost harvest. If this really causes trouble in the future, the code should be refactored to use a
proper task scheduler, the JBatch API, or asynchronous execution. A *TODO* message has been left in the code.

.. contents:: |toctitle|
:local:

----

Previous: :doc:`big-data-support`
5 changes: 2 additions & 3 deletions scripts/installer/glassfish-setup.sh
@@ -122,9 +122,8 @@ function final_setup(){
./asadmin $ASADMIN_OPTS create-jdbc-resource --connectionpoolid dvnDbPool jdbc/VDCNetDS

###
# Set up the data source for the timers

./asadmin $ASADMIN_OPTS set configs.config.server-config.ejb-container.ejb-timer-service.timer-datasource=jdbc/VDCNetDS
# Obsolete since merge of GH-5345, using only non-persistent timers from now on.
#./asadmin $ASADMIN_OPTS set configs.config.server-config.ejb-container.ejb-timer-service.timer-datasource=jdbc/VDCNetDS

./asadmin $ASADMIN_OPTS create-jvm-options "\-Djavax.xml.parsers.SAXParserFactory=com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl"

32 changes: 25 additions & 7 deletions src/main/java/edu/harvard/iq/dataverse/DatasetServiceBean.java
@@ -31,12 +31,7 @@
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.ejb.Asynchronous;
import javax.ejb.EJB;
import javax.ejb.EJBException;
import javax.ejb.Stateless;
import javax.ejb.TransactionAttribute;
import javax.ejb.TransactionAttributeType;
import javax.ejb.*;
import javax.inject.Named;
import javax.persistence.EntityManager;
import javax.persistence.NoResultException;
@@ -576,10 +571,33 @@ public void exportAllAsync() {
exportAllDatasets(false);
}

/**
* Scheduled method triggering the export of all local, published datasets,
* but only on the node that is configured as the dedicated timer server.
*
* TODO: this is not unit testable as long as the functions it depends on aren't.
*/
@Lock(LockType.READ)
@Schedule(hour = "2", persistent = false)
public void exportAll() {
exportAllDatasets(false);
if (systemConfig.isTimerServer()) {
logger.info("DatasetService: Running a scheduled export job.");
exportAllDatasets(false);
}
}

/**
* TODO: this code needs refactoring to be unit testable:
* 1) Move the Logger/FileHandler stuff to a factory in a Service
* (Export or Logging service) a) to make it mockable and
* b) to have common, reusable code.
* 2) Move this to OAIRecordServiceBean. The additional pieces for a
* complete OAI export are in OAISetServiceBean, so it makes more
* sense for this code to live there and call into this service.
* 3) Moving this to OAIRecordServiceBean makes findAllLocalDatasetIds(), etc
* mockable, so this class (DatasetServiceBean) does not need immediate action.
* @param forceReExport
*/
public void exportAllDatasets(boolean forceReExport) {
Integer countAll = 0;
Integer countSuccess = 0;
@@ -9,14 +9,12 @@
import edu.harvard.iq.dataverse.engine.command.DataverseRequest;
import edu.harvard.iq.dataverse.engine.command.exception.CommandException;
import edu.harvard.iq.dataverse.engine.command.impl.CreateHarvestingClientCommand;
import edu.harvard.iq.dataverse.engine.command.impl.DeleteHarvestingClientCommand;
import edu.harvard.iq.dataverse.engine.command.impl.UpdateHarvestingClientCommand;
import edu.harvard.iq.dataverse.harvest.client.HarvesterServiceBean;
import edu.harvard.iq.dataverse.harvest.client.HarvestingClient;
import edu.harvard.iq.dataverse.harvest.client.HarvestingClientServiceBean;
import edu.harvard.iq.dataverse.harvest.client.oai.OaiHandler;
import edu.harvard.iq.dataverse.search.IndexServiceBean;
import edu.harvard.iq.dataverse.timer.DataverseTimerServiceBean;
import edu.harvard.iq.dataverse.util.BundleUtil;
import edu.harvard.iq.dataverse.util.JsfHelper;
import static edu.harvard.iq.dataverse.util.JsfHelper.JH;
@@ -65,8 +63,6 @@ public class HarvestingClientsPage implements java.io.Serializable {
IndexServiceBean indexService;
@EJB
EjbDataverseEngine engineService;
@EJB
DataverseTimerServiceBean dataverseTimerService;
@Inject
DataverseRequestServiceBean dvRequestService;
@Inject
@@ -453,9 +449,6 @@ public void saveClient(ActionEvent ae) {

configuredHarvestingClients = harvestingClientService.getAllHarvestingClients();

if (!harvestingClient.isScheduled()) {
dataverseTimerService.removeHarvestTimer(harvestingClient);
}
JsfHelper.addSuccessMessage(BundleUtil.getStringFromBundle("harvest.update.success") + harvestingClient.getName());

} catch (CommandException ex) {