NEW: Adding first cut of output handling support. #12
This adds an `Output` and a `CollectOutputs` class that help gather the outputs generated by a Launcher.
I have finally got around to having a look! I've run the new example and my initial impressions are favorable: having things like the info, the log and the tids available as attributes is certainly very nice. Some comments:
This is a great starting point and I can see the utility of such a system. Here, I'll show how I imagine improving it:
>>> output = lancet.Launcher(example_name, integers, factor_cmd,
...                          output_directory='output')()
>>> output = lancet.Output('output') # Equivalent to the return value above
>>> output.update() # Capture the current state of the filesystem
>>> len(output) # Length two because previous run exists.
2
>>> output.paths
['2015-06-22_1356-prime_quintuplet', '2015-06-22_1538-prime_quintuplet']
>>> '2015-06-22_1356-prime_quintuplet' in output
True
# Accessing tid field of named tuple for this run...
>>> output['2015-06-22_1356-prime_quintuplet'].tids
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15]
# The output directory of the most recent run (call it last_run maybe?)
>>> output.last.output_dir
'output/2015-06-22_1356-prime_quintuplet'
# No need to show much else as contents is mutable (filesystem) state
>>> print repr(output)
Output(output_directory='output')
I think this design would cover everything you've suggested except
>>> output = Output('output', expansions={'full_filename':ShellCommand.LongFilename})
# Additional field added to the named tuple using the expansion.
>>> output.last.long_filename
'/home/user/.../output/2015-06-22_1356-prime_quintuplet'
What do you think? I am happy to immediately implement my suggestion if you like this updated proposal. I could commit such a class directly to master but then your contribution won't be properly recorded. Alternatively, maybe I could commit to your branch to update this PR - or submit a PR on your PR?! Of course, you could have a go implementing this yourself, but if you are happy with my suggestions above, I think I could get it implemented and merged with master very quickly.
Thanks for the quick review. I am happy you like the general idea. Here are my comments:
>>> output.paths
['2015-06-22_1356-prime_quintuplet', '2015-06-22_1538-prime_quintuplet']
>>> '2015-06-22_1356-prime_quintuplet' in output
True
# Accessing tid field of named tuple for this run...
>>> output['2015-06-22_1356-prime_quintuplet'].tids
I don't like this at all. I'd rather have the
>>> output = lancet.Output('output') # Equivalent to the return value above
>>> output.update() # Capture the current state of the filesystem
>>> len(output)
2
>>> output[0]
OutputData(tids=[...], specs=[...], ...)
>>> for tid, spec in zip(output[0].tids, output[0].specs):
... print tid, spec
>>> output.paths
[...]
>>> output.last # same as output[-1] and can simply be a property.
Long discussions like this on GitHub are painful as I cannot quote your points. Can we do this by email? Or should I just reply to the GitHub email I receive?
You can just reply to the e-mail if that is more convenient for you. I'll keep the conversation on GitHub for now and simply make use of the markdown quote syntax above.
Yes, adding param is no problem at all.
I think we would need some very clear ideas as to how we want to extend these ideas in future before the introduction of two new classes is really justified...
I think I would consider
I mostly agree and that is why I proposed the
Again I agree with you here ('runs' may also be a better name). The integer indexing approach is definitely a better idea than the strings and would work nicely in conjunction with
The only thing to be careful about is the ordering of the runs: in the simplest case this could be based on a simple alphanumerical sort over the run names (this would also be a chronological sort using the default timestamp format). However, I think there is a better way: the launch timestamp is explicitly recorded in the info file, so it would probably make a lot of sense to use that. This way, you would be certain that
Lastly, I will note that the integer scheme is not exclusive with indexing via explicit path name -
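The two ordering options discussed above can be compared with a small sketch. It assumes the run names begin with a `YYYY-MM-DD_HHMM` timestamp, which is inferred only from the directory names shown in this thread; `launch_time` is a hypothetical helper, and the sketch parses the directory name rather than reading the info file.

```python
from datetime import datetime

# Run directory names copied from the discussion above, deliberately unsorted.
run_names = ['2015-06-22_1538-prime_quintuplet',
             '2015-06-22_1356-prime_quintuplet']


def launch_time(run_name, fmt='%Y-%m-%d_%H%M'):
    """Parse the timestamp prefix of a run name (the format is an assumption)."""
    # The prefix 'YYYY-MM-DD_HHMM' is 15 characters long.
    return datetime.strptime(run_name[:15], fmt)


# Option 1: plain alphanumerical sort over the run names.
by_name = sorted(run_names)

# Option 2: sort on the parsed launch time.
by_time = sorted(run_names, key=launch_time)
```

With a zero-padded, most-significant-first timestamp like this one, both options give the same order; they would only diverge if a custom timestamp format were used.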
Alright, so technically there are two objects,
Right, so I will try the names and see how this works.
Well but the timestamps of the directory are already sorted according to the timestamp and I think it is safe to assume that the output of one launch corresponds to one problem (or is that a dangerous assumption?) I certainly think it is reasonable and will use that for the time being.
Yes, we should get this as it currently stands and can fix bugs if we see them.
Indeed, and we could do this a bit later and just go with integers for now. As regards the expansions, I think there may be a misunderstanding, in my use case
In summary I'll make the following changes when I am able:
Have I missed anything? |
Your summary sounds good and I look forward to seeing the updated code! A couple of quick comments:
Other than these minor points, I think this will be a useful class to introduce without requiring much more effort to implement. Great! |
- One `Output` class which contains `LaunchInfo` namedtuples for each particular "launch".
- `OutputInfo` has additional attributes for any expansions by default, `timestamp, path, tids, specs, stdout, stderr, log, info` and then any extra expansion keys.
- Access to each launch info via integers via `__getitem__`.
- Ability to iterate over the output object.
- Parametrize the class.
- Remove `CollectOutputs` class.
I've implemented everything above but want to write some tests. I'll do that tomorrow, feel free to take a look. |
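For readers following the thread, the design summarized in the list above might look roughly like the sketch below. It is not the code from this PR: `OutputSketch` is a hypothetical name, the `LaunchInfo` record is reduced to two fields, and the timestamp is assumed to be the first 15 characters of the directory name.

```python
import os
import tempfile
from collections import namedtuple

# Hypothetical, reduced record; the summary above lists more fields.
LaunchInfo = namedtuple('LaunchInfo', ['timestamp', 'path'])


class OutputSketch(object):
    """Illustrative container, not Lancet's actual Output class."""

    def __init__(self, output_directory):
        self.output_directory = output_directory
        self.launches = []

    def update(self):
        """Rescan the output directory for launch subdirectories."""
        launches = []
        for name in os.listdir(self.output_directory):
            full_path = os.path.join(self.output_directory, name)
            if os.path.isdir(full_path):
                # Assumes names start with a 'YYYY-MM-DD_HHMM' timestamp.
                launches.append(LaunchInfo(name[:15], full_path))
        # Namedtuples compare field by field, so this orders by timestamp.
        self.launches = sorted(launches)

    def __len__(self):
        return len(self.launches)

    def __getitem__(self, index):
        return self.launches[index]

    def __iter__(self):
        return iter(self.launches)

    @property
    def last(self):
        """The most recent launch, same as self[-1]."""
        return self[-1]

    def __repr__(self):
        return 'OutputSketch(output_directory=%r)' % self.output_directory


# Usage with a throwaway directory holding two fake launch directories.
root = tempfile.mkdtemp()
for name in ['2015-06-22_1538-prime_quintuplet',
             '2015-06-22_1356-prime_quintuplet']:
    os.mkdir(os.path.join(root, name))

output = OutputSketch(root)
output.update()
timestamps = [launch.timestamp for launch in output]
```

Keeping the contents out of `__repr__` follows the earlier remark that they are mutable filesystem state.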
At a glance it looks great! Two immediate comments:
##### object Protocol ####################################################
I'll have a closer look shortly, but I think it is pretty much what I hoped/expected. |
Travis is failing due to some whitespace issue in the doctest (I think). |
from glob import glob
import json
import os
from os.path import isdir, join, splitext
I think I would generally prefer just to `import os` and fully qualify as appropriate...
Umm, then we should technically `import os.path`; many, many years ago this used to be a problem, but maybe I am just getting old. This SO article is not conclusive:
http://stackoverflow.com/questions/2724348/should-i-use-import-os-path-or-import-os
I think it comes down to personal taste - personally I've gotten so used to typing `os.path` that it has become automatic. My recommendation here is not because it is necessarily the best approach overall but because this is how it was done everywhere else in Lancet. Changing these imports throughout is something I would consider for a separate PR...
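As a small illustration of the fully qualified style suggested above: in current Python versions, `import os` alone also makes `os.path` available, because the `os` module imports it itself. The file name `tasks.json` below is a made-up example, not a file Lancet is known to write.

```python
import os

# Fully qualified calls; no separate 'import os.path' is needed.
joined = os.path.join('output', '2015-06-22_1356-prime_quintuplet')

# 'tasks.json' is a hypothetical file name used only for illustration.
stem, ext = os.path.splitext('tasks.json')

# The current directory always exists, so this is a safe isdir check.
current_is_dir = os.path.isdir(os.curdir)
```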
I'll look at this tomorrow. I like the protocol lines as they make code a lot easier to follow for me (and I am used to them), but they aren't consistent with your code so I can remove them. I'm happy to merge it into
    if isdir(full_path):
        launches.append(self._get_launch_info(full_path))
launches.sort()
self.launches = launches
I would probably do this instead:
self.launches = sorted(launches)
Makes very little difference but it does save one line...
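A quick sketch of why the two spellings are interchangeable here. The `LaunchInfo` record below is a reduced, hypothetical stand-in for the namedtuple used in the PR; the only assumption that matters is that the timestamp is the first field, so tuple comparison orders launches chronologically.

```python
from collections import namedtuple

# Reduced, hypothetical stand-in for the PR's namedtuple.
LaunchInfo = namedtuple('LaunchInfo', ['timestamp', 'path'])

launches = [
    LaunchInfo('2015-06-22_1538', 'output/2015-06-22_1538-prime_quintuplet'),
    LaunchInfo('2015-06-22_1356', 'output/2015-06-22_1356-prime_quintuplet'),
]

# Two-step version: sort a list in place, then assign it.
in_place = list(launches)
in_place.sort()

# One-step version: sorted() returns a new sorted list.
copied = sorted(launches)
```

The only behavioural difference is that `sorted` leaves the original list untouched, which does not matter when the list is a local temporary.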
Ok, I've only made two comments in the code and otherwise I think it is ready to merge (into launch.py) as soon as Travis is happy! I think I would put this just before the Launchers:
#===============#
# Output Helper #
#===============#
# <Your code here>
#===========#
# Launchers #
#===========#
Thanks for doing this! I'll return a suitable
OK, I have made the requested changes. I haven't added any tests and am not sure I will have time to add them soon but am happy to do that if needed.
I've also pushed a couple of simple tests now, if Travis is happy this is now RTM.
Looks good to me! Marco says he might have a few comments that he will add here later...
OK, can't those be addressed in a separate PR?
Ok, I've quickly gone through it with Marco and he is also satisfied. I'll go ahead and merge it now. Thanks again for your contribution!
Introduced Output class to conveniently collect information across launches
This is mainly for review and is a first implementation. In particular, I am not too convinced about the names of the classes.
The PR adds an `Output` and a `CollectOutputs` class that help gather outputs generated by a Launcher. With this, the current example in the documentation becomes:
`CollectOutputs` is very simple. The `Output` class has a `do_expansion` method that is convenient when using `LongFilenames`. For example, if one were dumping output files using it, one could simply do:
Then the list of output files (in the same sequence as the tasks and specs) can be processed with any Python code.