Paging "All Completed Jobs" in UI Fails #107
Comments
Thank you for the bug reports. I think there are two separate issues going on there. The first one sounds like filesystem corruption, which can happen if Cronicle crashes, is uncleanly shut down or the server is rebooted while data was being written to disk. Unfortunately the only real fix here is to delete your historical job data. If you don't want to delete your individual job logs and metadata, you can just delete and recreate the main completed list itself:
The second issue you reported is quite concerning (the ticker stopping), as I have just now run into this same thing on 4 different servers, all CentOS and all Node.js v10. I am actively investigating it, but it is very difficult to track down, because at least in my case it takes 20+ days of continuous running to reproduce it. I'll let you know if/when I resolve this, and I am going to add a large, loud disclaimer to the top of the main repository README, alerting people to this one. This is really bad. Here is more information about the bug:
Investigation ongoing, but I don't have much else to go on at the moment. Thank you for your report, however, because that allows me to rule out AWS / S3 as a culprit. |
I'm trying to delete the logs and I can't get past this error:
|
Ah crap, well that confirms my suspicion that the data got corrupted. The list header doesn't have the correct number of pages. This is awful, and I am so sorry this happened. For now, just to get you back up and running, you can simply delete the list header (i.e. leave all the pages) and I think it'll be smart enough to start over and ignore the mess left behind:
So instead of
I am working on a new "transaction" system that should provide atomic list writes and automatic rollback on crash, so as to prevent things like this in the future. It is currently slated for v2.0, but in light of this I think I should bump up its priority. |
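For readers unfamiliar with the idea of an atomic write, here is a minimal sketch of the common write-to-a-temp-file-then-rename technique in Node.js. It is not Cronicle's planned transaction system and not its storage API: the `writeJsonAtomic` helper, the file names, and the header fields are all hypothetical, and it only covers a single file, not a multi-file list update with rollback.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper (not part of Cronicle): write JSON so that readers
// see either the old file or the complete new file, never a partial one.
function writeJsonAtomic(file, data) {
  const tmp = `${file}.tmp.${process.pid}`;
  const fd = fs.openSync(tmp, 'w');
  try {
    fs.writeSync(fd, JSON.stringify(data));
    fs.fsyncSync(fd); // flush the temp file to disk before it replaces the original
  } finally {
    fs.closeSync(fd);
  }
  // rename() replaces the target in one step on POSIX filesystems,
  // provided both paths are on the same filesystem.
  fs.renameSync(tmp, file);
}

// Demo in a scratch directory; the header fields are made up for illustration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'atomic-demo-'));
const headerFile = path.join(dir, 'header.json');
writeJsonAtomic(headerFile, { length: 3, pages: 1 });
const roundTrip = JSON.parse(fs.readFileSync(headerFile, 'utf8'));
console.log(roundTrip);
```

A crash before the rename leaves only a stray `.tmp` file behind and the original file untouched, which is the property a list header would need to avoid the mismatch described above.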
Doing the "delete" worked perfectly thanks! Regarding the ticker stopping, in my system the problem occurs every 30 days. For now, I have a cron task to recycle the cronicle service. Looking forward to v2.0. |
Thank you, @dropthemic, that timeline actually really helps me. So far I've seen this on 4 of my servers, but all of them had a very high uptime, like 27 days or something. Do you recall if you ever saw this on Node versions prior to 10? I am trying to nail down exactly when this started, and I just recently upgraded to 10, so that is currently a suspect. Thanks again for your time. |
Looking at my yum history, I started with v0.10.48. Sorry never tried prior to version 10. |
Got it, thank you! |
Hey @dropthemic, just FYI this looks like an actual bug in Node.js core! See #108 for details. It's now fixed and landed, but won't be released until Node v10.10 (presumably). PR for fix: nodejs/node#22214 |
Wow this is good news! Thanks for all your help and openness on changes. |
Hey @dropthemic, just letting you know, it looks like the Node.js core timer bug is actually fixed in Node v10.9.0. I verified this myself using libfaketime and simulating 25 days of runtime in a few minutes. I was able to reproduce the bug in 10.5, but not on 10.9. However, you reported Node version 10.9 in this issue report, so I cannot explain how that is possible. But, there are two separate issues reported here, the completed events paging issue (data corruption), and the Node timer bug. Is it possible that you actually haven't experienced the timer bug (Cronicle suddenly stops running jobs) on Node v10.9? Or do I have this all wrong? |
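As a side note on the "25 days" figure: it is close to the point where a millisecond counter exceeds a signed 32-bit integer. That this is the relevant threshold for the Node.js timer bug is an assumption here, since this thread does not state the root cause, but the arithmetic itself can be checked with a short sketch:

```javascript
// How many days of uptime fit into a signed 32-bit millisecond counter?
const MS_PER_DAY = 24 * 60 * 60 * 1000; // 86,400,000 ms
const INT32_LIMIT_MS = 2 ** 31;         // first value that no longer fits in int32

const daysUntilOverflow = INT32_LIMIT_MS / MS_PER_DAY;
console.log(daysUntilOverflow.toFixed(2)); // 24.86

// Coercing that value to int32 wraps it to a negative number.
const wrapped = INT32_LIMIT_MS | 0;
console.log(wrapped); // -2147483648
```

This is consistent with the 20+ to 30 day timelines reported above, and with a simulated 25 days being enough to trigger the bug on 10.5.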
Summary
In the "All Completed Jobs" section of the UI, the jobs listed are stale, and paging through the records stops after the third "next" page.
However, if I go to an individual event and view its job history, I can see the history. It could be an issue with my filesystem storage, but I'm not sure how to resolve it.
I also have an issue where Cronicle stops running tasks with no seg fault or any errors. Looking over my log files, the scheduler tick never advances:
Another example of scheduler not advancing:
Steps to reproduce the problem
N/A
Your Setup
Vanilla Install
Operating system and version?
CentOS 6.10
Node.js version?
v10.9.0
Cronicle software version?
Version 0.8.24
Are you using a multi-server setup, or just a single server?
Single
Are you using the filesystem as back-end storage, or S3/Couchbase?
FS
Can you reproduce the crash consistently?
Yes
Log Excerpts