
7052 Fix infinite IN PROGRESS and DELETE IN PROGRESS labels #7053

Closed
wants to merge 2 commits into from

Conversation

@JingMa87 (Contributor) commented Jul 4, 2020

What this PR does / why we need it: Fixes infinite IN PROGRESS and DELETE IN PROGRESS labels

Which issue(s) this PR closes: 7052

Closes #7052

Special notes for your reviewer: People will be happy about this

Suggestions on how to test this: Run a harvesting client and delete another one at the same time, then restart Glassfish immediately when you see the IN PROGRESS and DELETE IN PROGRESS labels. After the restart you'll see the RUN FAILED and DELETE FAILED labels instead of the old ones, and you can edit the client again, so you can rerun or delete it as usual.

Does this PR introduce a user interface change? If mockups are available, please link/include them here: Yes

[screenshot attached]

Is there a release notes update needed for this change?: Yes, it's a noticeable change.

Additional documentation: N/A

@coveralls commented Jul 4, 2020

Coverage Status

Coverage decreased (-0.006%) to 19.647% when pulling e017f6c on JingMa87:7052-fix-hanging-harvesting-run into 6daf219 on IQSS:develop.

@pdurbin (Member) left a comment

I'm not in a great position to review this code (having not worked on harvesting) but from a quick look it seems ok. It would be nice for @landreev or another developer who has worked on harvesting to take a look.

@@ -5,16 +5,9 @@
*/
package edu.harvard.iq.dataverse.harvest.client;

import javax.persistence.*;
(Member)

I'm surprised it isn't already written at http://guides.dataverse.org/en/4.20/developers/coding-style.html but we tend not to use wildcard imports like this.

import javax.ejb.Singleton;
import javax.ejb.Startup;

@Singleton
(Member)

Maybe I'm just being paranoid but we've had trouble with Singleton beans recently. See IQSS/dataverse.harvard.edu#73

@JingMa87 (Contributor, Author) commented Jul 13, 2020

I read the issue you linked very closely since the last thing I want to do is introduce new errors. In the issue there's an authentication bean that's a singleton, and when the bean gets held up by a process, no other clients can use it to authenticate because there's only one instance. This is bad. However, in my use case the singleton is only used during the startup of Dataverse to reset any hanging harvesting clients. After that there's no use for it, so a singleton bean seems the right approach here. Moreover, Dataverse has many singleton beans with an additional @Startup annotation to initialize a certain process, and those don't seem to be causing issues either. So I'm fairly confident that a singleton won't cause issues here.
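For illustration, a minimal sketch of the kind of @Singleton @Startup bean described above; the class name, the injected HarvestingClientServiceBean field, and the resetHangingClientStatuses() helper are assumptions for this sketch, not necessarily what the PR branch contained:

package edu.harvard.iq.dataverse.harvest.client;

import javax.annotation.PostConstruct;
import javax.ejb.EJB;
import javax.ejb.Singleton;
import javax.ejb.Startup;

// Hypothetical sketch: the bean runs once while the application is starting
// and is not looked up by request-handling code afterwards, so the single
// instance is never contended the way the authentication bean in the linked
// issue was.
@Singleton
@Startup
public class HarvestingClientCleanupBean {

    @EJB
    HarvestingClientServiceBean harvestingClientService;

    @PostConstruct
    public void resetHangingHarvestJobs() {
        // Assumed helper: flips any leftover "in progress" / "delete in progress"
        // statuses from an interrupted run to their corresponding failed states.
        harvestingClientService.resetHangingClientStatuses();
    }
}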

@landreev (Contributor) commented Jul 21, 2020

I'm not worried about it being a singleton - seeing how it has nothing in it but the @PostConstruct code.
But there's another reason why this is not going to work, specifically for us here at Harvard. We have 2 production servers. So if one of the two is being restarted, it'll reset the flags even though the actions in question may still be in progress on the other node.

It's easy to address for the most part: in a multi-server situation only one node is designated to run harvesting and the other timer-run jobs. This "master node" is the one that has the JVM option -Ddataverse.timerServer=true, and you can use if (systemConfig.isTimerServer()) in your code to check whether this node is the timer node before proceeding to clear the flags.

HOWEVER, there is an exception to the rule - when a harvesting run is started not by the timer, but manually, through the harvesting client page and/or API, the job can end up running on either of the multiple nodes.

Admittedly, this will not affect many other dataverse installations. (I'm actually not aware of another production installation at this point that is using a multiple server setup). But this would affect us, so I am a bit hesitant about doing this.
And I'm not really positive as to what the best reliable solution is. That ClientHarvestRun object would need an extra entry identifying the node actually running the harvest, maybe?
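A minimal sketch of that guard, extending the hypothetical startup bean sketched earlier (the SystemConfig injection point and the helper method remain assumptions; isTimerServer() is the check named above):

    @EJB
    SystemConfig systemConfig;

    @PostConstruct
    public void resetHangingHarvestJobs() {
        // Only the node started with -Ddataverse.timerServer=true clears the
        // flags; other nodes in a multi-server installation skip the reset.
        if (!systemConfig.isTimerServer()) {
            return;
        }
        harvestingClientService.resetHangingClientStatuses(); // assumed helper, as above
    }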

@landreev (Contributor) commented Jul 21, 2020

Another solution would be to change how the manually-run harvests are handled. I.e., instead of actually running the harvest it could create a one-time timer entry for 5 sec. from now. So it will then be picked up by the timer service on the master node - and we are guaranteed to have all the harvesting runs staying on one server.
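Roughly, and only as an illustration under the assumption of a plain EJB TimerService (the actual Dataverse timer plumbing may differ, and the class and method names here are made up), that idea could look like:

import java.io.Serializable;
import javax.annotation.Resource;
import javax.ejb.Stateless;
import javax.ejb.Timeout;
import javax.ejb.Timer;
import javax.ejb.TimerConfig;
import javax.ejb.TimerService;

@Stateless
public class ManualHarvestScheduler { // hypothetical class name

    @Resource
    TimerService timerService;

    // Instead of running the harvest directly on whichever node served the
    // page/API request, queue a one-shot timer ~5 seconds out; per the
    // suggestion above, the timer service on the master node would then
    // pick up the run.
    public void scheduleHarvest(Serializable harvestingClientId) {
        timerService.createSingleActionTimer(5000, new TimerConfig(harvestingClientId, false));
    }

    @Timeout
    public void onTimeout(Timer timer) {
        Serializable clientId = timer.getInfo();
        // ... start the actual harvesting run for clientId here ...
    }
}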

(Contributor)

I'm not 100% sure that having these harvest-related flags cleared automatically is worth adding complicated code to address the multiple-server scenario...
So yet another, much simpler, solution would be to implement it as above, but with an option to disable this mechanism; on startup the service above would check the setting before proceeding: if (enableClearingHarvestJobsOnStartup) { ... }. It would be enabled by default, so it would work for most installations. But then the guide would say "make sure to disable this mechanism if you are running multiple servers!", somewhere in the same part of the guide where we mention the dataverse.timerServer setting.

@JingMa87 (Contributor, Author) commented Aug 4, 2020

@landreev I think it's best to take the simplest approach and only build in support for single-server scenarios. My plan is to add a sane default of "true" for a configurable DB key "ClearingHarvestJobsOnStartupEnabled" using the SettingsServiceBean. Do you have any objections?
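A minimal sketch of that plan, again inside the hypothetical startup bean sketched earlier, assuming the usual isTrueForKey(key, default) accessor on SettingsServiceBean and a new (hypothetical) Key enum entry for the proposed setting:

    @EJB
    SettingsServiceBean settingsService;

    @PostConstruct
    public void resetHangingHarvestJobs() {
        // Hypothetical DB setting, defaulting to true as proposed above;
        // multi-server installations would set it to false.
        if (!settingsService.isTrueForKey(
                SettingsServiceBean.Key.ClearingHarvestJobsOnStartupEnabled, true)) {
            return;
        }
        // ... proceed with clearing the hanging "in progress" statuses ...
    }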

@@ -40,12 +40,13 @@ public void setId(Long id) {
this.id = id;
}

-    public enum RunResultType { SUCCESS, FAILURE, INPROGRESS };
+    public enum RunResultType { SUCCESS, RUN_FAILED, RUN_IN_PROGRESS, DELETE_FAILED };
@landreev (Contributor) commented Jul 21, 2020

Not a big deal, but what is the value of adding this status condition for a failure to delete a client? It has some value for a dataverse admin, to know that the last attempt to harvest from a certain server resulted in a failure. Is it really useful to know that an attempt to delete a client failed? - should the startup check simply remove the "delete in progress" flag quietly instead?
After all, seeing how the client is still there makes it somewhat clear that the attempt to get rid of it didn't work out, so they should try again?

I may be missing some situation where it could actually be useful - so I'm open to hearing it.

@JingMa87 (Contributor, Author) commented Aug 4, 2020

@landreev I guess handling it quietly would be ok. So instead of DELETE FAILED or FAILED we just show a success label (those are the only options)?

@pdurbin (Member) commented Jan 3, 2023

I'm not sure of the status of this PR. @JingMa87 recently explained that he switched to another job: #7412 (comment)

@janvanmansum has been helping explain which issues and PRs are a priority for DANS.

For this one, @landreev recently added the ability to stop a harvest in progress...

... which seems related, at least. So I'm not sure if we need this PR or if it should be rewritten (the branch is gone from GitHub, by the way).

@mreekie removed the bk2211 label Jan 11, 2023
@pdurbin added the "Size: 3" label (a percentage of a sprint; 2.1 hours) Feb 28, 2024
@pdurbin (Member) commented Feb 28, 2024

@JingMa87 I'm closing this PR but if you're still interested in it, please reply! Thanks!

@pdurbin closed this Feb 28, 2024
Labels
Size: 3 (a percentage of a sprint; 2.1 hours)
Development

Successfully merging this pull request may close these issues.

Harvesting run infinitely stays at IN PROGRESS and DELETE IN PROGRESS
5 participants