This repository was archived by the owner on Mar 31, 2022. It is now read-only.

New Logs don't appear in Historyserver #64

Open
liammac opened this issue Jan 16, 2018 · 2 comments

Comments

@liammac

liammac commented Jan 16, 2018

When running the embedded JobHistoryServer, it never seems to refresh the jobs from GCS. Any jobs that are logged to GCS after it has started do not appear until it is stopped and started again.

@krisss85
Contributor

Thanks for reporting this. I am looking into it. For now it behaves as you described: you need to restart the JHS to reload the logs from the bucket.

@krisss85
Contributor

Hey @liammac,
I checked the MR JHS code. By default it only scans for intermediate done files, moves them to the done directory, and loads the jobs into the JHS cache. This breaks with short-lived clusters, because the move happens outside the JHS that you spin up on demand with spydra. The relevant classes are private to the JHS, so I created a fork with some changes that reinitialize the history periodically. I don't really like this approach, though: it ties the spydra project to the Hadoop code, and maintaining your own fork is unmanageable in the long run. Maybe you can find some inspiration, and a better solution, by looking into it.
It does, however, keep the JHS refreshing the logs, which is what you wanted.
https://github.com/krisss85/spydra/tree/fix-jhs
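
For readers who want the general idea without reading the fork: the periodic re-initialization described above amounts to rescanning the done directory on a timer and adding any jobs the cache has not seen yet. The sketch below is a toy Python illustration of that pattern only. It is not the code in the fork and not Hadoop's API; `HistoryCache`, `rescan`, `start_periodic_rescan`, and the flat `*.jhist` file layout are all assumptions made for the example.

```python
import threading
from pathlib import Path


class HistoryCache:
    """Toy stand-in for a history server's job cache (hypothetical, not Hadoop code)."""

    def __init__(self, done_dir):
        self.done_dir = Path(done_dir)
        self.jobs = {}  # job id -> path of its history file
        self._lock = threading.Lock()

    def rescan(self):
        """Scan the done directory and return the ids of newly discovered jobs."""
        # Assumption: one "<job id>.jhist" file per finished job.
        found = {p.stem: p for p in self.done_dir.rglob("*.jhist")}
        with self._lock:
            new_ids = sorted(set(found) - set(self.jobs))
            for job_id in new_ids:
                self.jobs[job_id] = found[job_id]
        return new_ids


def start_periodic_rescan(cache, interval_seconds, stop_event):
    """Call cache.rescan() every interval_seconds until stop_event is set."""

    def loop():
        # Event.wait returns False on timeout and True once the event is set.
        while not stop_event.wait(interval_seconds):
            cache.rescan()

    thread = threading.Thread(target=loop, daemon=True)
    thread.start()
    return thread
```

A caller would create one `HistoryCache`, run `rescan()` once at startup, and then hand it to `start_periodic_rescan` with a `threading.Event` that is set on shutdown. Rescanning the whole directory each time is simple but gets slower as the bucket grows, so a real fix would probably need to limit the scan to recent date partitions.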
