This repository was archived by the owner on Mar 31, 2022. It is now read-only.
When running the embedded JobHistoryServer, it never seems to refresh the jobs from GCS. Any jobs that are logged to GCS after it has started never appear until it is stopped and started again.
hey @liammac
I checked the MR JHS code. By default it only scans for files in the intermediate done directory, moves those files to the done directory, and then loads the jobs into the JHS cache. This breaks with short-lived clusters, because the move happens outside the JHS that you spin up on demand with spydra. The relevant classes are private to the JHS, so I created a fork with some changes that reinitialize the history periodically. I don't really like this approach, though: it ties the spydra project to the Hadoop code, and maintaining your own fork is unmanageable in the long run. Maybe you can find some inspiration, and a better solution, by looking at it.
That said, it does keep the JHS refreshing the logs, which is what you wanted: https://github.com/krisss85/spydra/tree/fix-jhs
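The fork linked above reinitializes the history periodically. As a rough illustration of just the scheduling part of that idea, here is a minimal sketch using only standard `java.util.concurrent`. It assumes the actual reload logic (re-scanning the GCS done directory and repopulating the JHS cache) can be wrapped in a plain `Runnable`; the class name `PeriodicHistoryRefresher` is hypothetical and is not part of Hadoop, spydra, or the fork.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper: runs a caller-supplied reload action on a fixed delay.
 * The reload action is an assumption standing in for whatever re-reads the
 * job history from GCS; this class does not touch any Hadoop API.
 */
final class PeriodicHistoryRefresher implements AutoCloseable {
    // Single daemon thread, so a forgotten refresher never blocks JVM exit.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "jhs-history-refresher");
            t.setDaemon(true);
            return t;
        });
    private final Runnable reloadAction;

    PeriodicHistoryRefresher(Runnable reloadAction) {
        this.reloadAction = reloadAction;
    }

    /** Starts reloading every intervalMillis, after an initial delay of the same length. */
    void start(long intervalMillis) {
        // Fixed delay (not fixed rate): the next run starts only after the
        // previous one finishes, so a slow bucket scan never piles up runs.
        scheduler.scheduleWithFixedDelay(
            this::safeReload, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
    }

    private void safeReload() {
        try {
            reloadAction.run();
        } catch (RuntimeException e) {
            // If a scheduled task throws, the executor suppresses all of its
            // future runs, so catch, report, and keep the schedule alive.
            System.err.println("History reload failed: " + e);
        }
    }

    @Override
    public void close() {
        scheduler.shutdownNow();
    }
}
```

Usage would be something like `new PeriodicHistoryRefresher(reloadFn).start(60_000)` next to the embedded JHS, closing it when the server stops. The exception handling matters here: without it, a single transient GCS error would silently stop all further refreshes.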