This repository has been archived by the owner on Feb 3, 2021. It is now read-only.

Feature: Add --apps option to get job command to get all job info in one go #404

Open
wants to merge 2 commits into
base: master


@emlyn
Contributor

emlyn commented Feb 15, 2018

After submitting a job, I often find myself juggling several commands to get all the info about it (get, list-apps, get-app, get-app-logs), so I put together one command that gets all this info in one go.

Is this something you think makes sense?

Displaying logs is optional, controlled by a --logs flag, as they can get quite big.

I reworked utils.py a bit so that application-summary/print-applications always take a dict of job name to Job/None, as I think that makes the code more consistent and easier to follow.
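As a rough illustration of the "dict of job name to Job/None" convention described above, here is a minimal sketch. The `Job` type, its fields, and the output wording are hypothetical stand-ins, not the actual aztk models or the code in this PR.

```python
from typing import Dict, NamedTuple, Optional


class Job(NamedTuple):
    # Hypothetical stand-in for a job object; not the aztk model.
    state: str
    app_count: int


def application_summary(jobs: Dict[str, Optional[Job]]) -> str:
    """Render one line per job name; None marks a job with no info yet."""
    lines = []
    for name, job in jobs.items():
        if job is None:
            # Assumption: None means nothing has been scheduled for this name.
            lines.append(f"{name}: not scheduled")
        else:
            lines.append(f"{name}: {job.state} ({job.app_count} apps)")
    return "\n".join(lines)


def print_applications(jobs: Dict[str, Optional[Job]]) -> None:
    """Print the same summary, so both helpers share one input shape."""
    print(application_summary(jobs))
```

Because both helpers accept the same mapping, a caller can build the dict once and pass it to either one without special-casing missing entries.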

I'm not sure about the naming, feel free to suggest better names.

Currently the "No Spark applications will be scheduled until the master is selected." warning is displayed twice in the output. Should probably suppress one of those.

@paselem
Contributor

paselem commented Feb 16, 2018

Hi @emlyn - overall this change makes a lot of sense to me. The only things that I'm not 100% sold on yet are how you're printing off the logs and the name of the command. If I have, say, 10 apps then the logs for them will be pretty unreadable if they are all just dumped to the screen. I wonder if there may be another way to approach this like write files to disk or something. I suppose the user could always pipe the output too, but searching within it is probably going to be a pain.

Secondly, regarding the naming of the cli command - 'get-all' may be a bit confusing since it could be interpreted many ways - get all what? Jobs? Applications? All information for everything? (and so on...).

If we can address the two items above I think we should consider including this in one of our next releases.

Thanks!

@jafreck
Member

jafreck commented Feb 16, 2018

I think a good possibility would be to move this functionality to a --verbose or --all-logs flag for aztk spark job get instead of having it under its own CLI endpoint.

There's an open issue about allowing users to choose an output location for log files with a flag, #322. If there are multiple log files as here, the output would probably be a directory with a log file for each application.
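The directory-per-job idea could look roughly like the sketch below. It assumes the logs have already been fetched into a dict of application name to log text; `write_app_logs` and the `.log` naming are hypothetical and are not part of aztk or of #322.

```python
from pathlib import Path
from typing import Dict


def write_app_logs(logs: Dict[str, str], output_dir: str) -> Dict[str, Path]:
    """Write one <app name>.log file per application into output_dir."""
    out = Path(output_dir)
    # Create the directory (and any parents) if it does not exist yet.
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for app_name, text in logs.items():
        # Assumption: application names are safe to use as file names.
        path = out / f"{app_name}.log"
        path.write_text(text, encoding="utf-8")
        written[app_name] = path
    return written
```

With this shape the user only names the output directory, and each application's log can then be searched separately.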

@emlyn
Contributor Author

emlyn commented Feb 17, 2018

I agree about the name; I knocked it together quickly and couldn't think of anything better at the time. Maybe a --verbose flag on aztk spark job get would make more sense.

For me, displaying the logs was useful while debugging, as I only had one app per job, but I agree that with more apps it could easily get confusing.

A way to write the log files to disk would probably make more sense, as even with a single app my log file often reaches several MB in size. And I like the idea of writing everything into a directory so that you don't have to specify the app name(s) on the command line.

I think I'll rework this into a --verbose option on aztk spark job get, and leave out the log file printing for now.
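For reference, an opt-in boolean flag of this kind is usually a one-liner with argparse. The sketch below is a standalone parser, not the aztk one; the `--id` option, the `job_id` destination, and the help strings are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical standalone parser; aztk wires its subcommands differently.
    parser = argparse.ArgumentParser(prog="job-get")
    parser.add_argument("--id", dest="job_id", required=True,
                        help="id of the job to show (assumed option name)")
    # store_true makes the flag default to False and flip to True when given.
    parser.add_argument("--verbose", action="store_true",
                        help="also show the job's applications")
    return parser


def sections_to_show(args: argparse.Namespace) -> list:
    """Decide which output sections to print for the parsed arguments."""
    sections = ["job"]
    if args.verbose:
        sections.append("applications")
    return sections
```

Keeping the extra output behind a flag leaves the default output of the existing command unchanged.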

@paselem
Contributor

paselem commented Feb 19, 2018

Thanks @emlyn! Let us know once you've got the next PR ready to view.

@jafreck do we have an issue to track viewing/downloading logs to a directory?

@jafreck changed the title from "Add get-all command to get all job info in one go" to "Feature: Add get-all command to get all job info in one go" on Feb 21, 2018
@emlyn force-pushed the get-job-verbose branch 2 times, most recently from 46fc952 to c91806e, on February 27, 2018 11:45
@emlyn
Contributor Author

emlyn commented Feb 27, 2018

I've rebased this on the latest master, so I think it should be ready to merge if you're happy with it.

I haven't had much time to look into downloading logs yet, but there are a couple of things I'm not too sure about:

  • Should it go under aztk spark cluster or aztk spark job (or both)?
  • Where are the various different logs stored? Should it ssh into the cluster and get the log files from there, or get them from blob storage?

@emlyn changed the title from "Feature: Add get-all command to get all job info in one go" to "Feature: Add --apps option to get job command to get all job info in one go" on Feb 27, 2018
@emlyn force-pushed the get-job-verbose branch 2 times, most recently from 54e915d to 9091d31, on March 28, 2018 09:28
3 participants