New Logs don't appear in Historyserver #64

liammac · 2018-01-16T17:09:43Z

When running the Embedded JobHistoryServer it doesn't seem to ever refresh the jobs from GCS. Any jobs that are logged to GCS after it's started never appear until it's stopped and started again.

krisss85 · 2018-01-22T09:52:27Z

Thanks for reporting this. I'm looking into it. For now it behaves as you described: you need to restart the JHS to reload the logs from the bucket.

krisss85 · 2018-02-28T11:46:08Z

Hey @liammac,
I checked the MR JHS code. By default it only scans for intermediate done files, moves them to the done directory, and loads those jobs into the JHS cache. This breaks with short-lived clusters, because the move happens outside the JHS that you spin up on demand with spydra. The relevant classes are private to the JHS, so I created a fork with some changes that reinitialize the history periodically. I don't really like this approach, though: it ties the spydra project to the Hadoop code, and maintaining your own fork is unmanageable in the long run. Maybe you can find some inspiration for a better solution by looking at it.
It does keep the JHS refreshing the logs, which is what you wanted:
https://github.com/krisss85/spydra/tree/fix-jhs
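
To illustrate the idea of periodically rescanning the done directory, here is a minimal sketch in Python. It is not the Hadoop or spydra code and does not reflect what the fork actually does: `JobHistoryCache`, `rescan`, `start`, and `stop` are hypothetical names, and a local directory stands in for the GCS bucket. It only shows the general pattern of re-walking the done directory on a timer and loading files that have not been seen before.

```python
import os
import tempfile
import threading


class JobHistoryCache:
    """Toy stand-in for a history server's job cache (hypothetical, not Hadoop code)."""

    def __init__(self, done_dir):
        self.done_dir = done_dir          # assumption: a local path instead of a bucket
        self.jobs = set()                 # paths of history files already loaded
        self._lock = threading.Lock()
        self._stop = threading.Event()

    def rescan(self):
        """Walk done_dir, add any .jhist files not seen before, and return the new ones."""
        found = set()
        for root, _dirs, files in os.walk(self.done_dir):
            for name in files:
                if name.endswith(".jhist"):
                    found.add(os.path.join(root, name))
        with self._lock:
            new = found - self.jobs
            self.jobs |= new
        return sorted(new)

    def start(self, interval_seconds):
        """Run rescan() in a background thread every interval_seconds until stop()."""
        def loop():
            # Event.wait returns False on timeout and True once stop() has been called.
            while not self._stop.wait(interval_seconds):
                self.rescan()

        thread = threading.Thread(target=loop, daemon=True)
        thread.start()
        return thread

    def stop(self):
        self._stop.set()


if __name__ == "__main__":
    done_dir = tempfile.mkdtemp()
    cache = JobHistoryCache(done_dir)
    cache.rescan()                                        # nothing there yet
    open(os.path.join(done_dir, "job_1.jhist"), "w").close()
    print(cache.rescan())                                 # the file written after startup
```

The point of the sketch is that files written by another process after startup are picked up on the next rescan, without restarting the server.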
