-
-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add MLflow set tags function #139
Conversation
Codecov Report
@@ Coverage Diff @@
## master #139 +/- ##
==========================================
- Coverage 67.50% 67.46% -0.05%
==========================================
Files 93 93
Lines 5813 5818 +5
Branches 743 745 +2
==========================================
+ Hits 3924 3925 +1
- Misses 1677 1682 +5
+ Partials 212 211 -1
Continue to review full report at Codecov.
|
src/pykeen/pipeline.py
Outdated
@@ -834,9 +834,11 @@ def pipeline( # noqa: C901 | |||
if not metadata: | |||
metadata = {} | |||
title = metadata.get('title') | |||
tags = metadata.get('run_tags', None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would make more sense that this result tracker-specific information is passed to the pipeline via the result_tracker_kwargs
instead of the metadata
.
Line 832 in 2c9101c
result_tracker = result_tracker_cls(**(result_tracker_kwargs or {})) |
Tags are only relevant for MLflow, so it wouldn't necessitate adding new functionality to the base class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would suggest instead modifying the __init__()
function for the MLFlowResultTracker
to accept a tags
keyword argument and saving it as an instance variable.
Lines 48 to 74 in 2c9101c
def __init__( | |
self, | |
tracking_uri: Optional[str] = None, | |
experiment_id: Optional[int] = None, | |
experiment_name: Optional[str] = None, | |
): | |
""" | |
Initialize result tracking via MLFlow. | |
:param tracking_uri: | |
The tracking uri. | |
:param experiment_id: | |
The experiment ID. If given, this has to be the ID of an existing experiment in MFLow. Has priority over | |
experiment_name. | |
:param experiment_name: | |
The experiment name. If this experiment name exists, add the current run to this experiment. Otherwise | |
create an experiment of the given name. | |
""" | |
import mlflow as _mlflow | |
self.mlflow = _mlflow | |
self.mlflow.set_tracking_uri(tracking_uri) | |
if experiment_id is not None: | |
experiment = self.mlflow.get_experiment(experiment_id=experiment_id) | |
experiment_name = experiment.name | |
if experiment_name is not None: | |
self.mlflow.set_experiment(experiment_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay yeah - makes sense.. and then in the start_run - after starting the run - update tags as well? I only basically added the base class method, because you can't add tags until there is an active run.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe there's just a bit of confusion since the title
comes from the metadata - this is a pretty general piece of information (not specific to result tracker). Maybe it would make more sense if there's a priority to get the title from the result_tracker_kwargs
first then the metadata
second (I could take care of this in another PR - it's not relevant to yours)
@sunny1401 thanks for the PR! I've added a few comments on how we might simplify this. I hope you get the chance to learn a bit about the design choices we made :) |
@sunny1401 also you don't have to worry about the darglint failure. It's waiting there for a time when I'm verrry verrry bored and want to re-write all of the docstrings :) For now you can rely on the CI build on Travis. For the docs, you'll need dot installed since I've configured auto-generation of class hierarchies (see this example), which requires dot. If you're on mac, this can be installed with homebrew. If you're on windows... then good luck |
@sunny1401 you'll see that I made a few updates:
As a side note, in general, dictionaries can also be checked for truthyness or falsiness, so you don't have to check the length being not equal to or equal to zero. You will see that: def check_truthy(d):
if d:
print('truthy dict')
else:
print('falsy dict')
check_truthy({})
'falsy dict'
check_truthy({'hello': 5})
'truthy dict' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sunny1401 one more thing - all new code deserves a tutorial. Would you please update the example in https://github.com/pykeen/pykeen/blob/master/docs/source/tutorial/trackers/using_mlflow.rst to explain why you would want to put tags and give a tags={...}
as an example?
I saw the changes.. Makes total sense. I am sorry I got confused about this. |
I can do that.. I will add that. |
@cthoyt - I actually have a question - in the tutorial - i added in the pipeline docs - how to pass tags.. Do you think I should run the tutorial and add pictures for the tag values too? |
@sunny1401 nope, that's okay. A good tutorial introduces things one at a time, so your addition can come all the way at the bottom and doesn't necessarily need an accompanying photo. |
@sunny1401 I've edited the tutorial to make it a bit more concise. Thanks for you contribution, will merge when travis passes! If you'd like your full name listed in the contributors, please let me know how it's written. Otherwise, I will just leave it as your GitHub handle |
[skip ci]
I am so sorry you had to make so many corrections. Thanks for helping me so much. |
@sunny1401 no problem at all! I hope you learned a bit and we really appreciate that you were able to make a contribution. Thank you again! |
Oh my name is Sankranti Joshi.. I din read the complete thing |
[skip ci]
This PR adds set_tags function to MLFlowResultTracker in tracker.py file
Link to the relevant feature request - #135
Description of the Change - Currently the code doesn't provide any provision for adding tags related to a run. For example, adding details regarding the s3 version of a particular file for one run of an experiment cannot be done using the existing code.
Alternatives - Use the original API.
Possible Drawbacks - We could probably need mlflow API still for logging the model directly and versioning, but apart from that - I can't think of any possible drawbacks.
Verification Process - Ran tox. Got the following-
manifest: commands succeeded
flake8: commands succeeded
ERROR: darglint: commands failed (fails in file I have not touched)
pyroma: commands succeeded
readme: commands succeeded
doc8: commands succeeded
ERROR: docs: commands failed (Warning, treated as error:
dot command 'dot' cannot be run (needed for graphviz output), check the graphviz_dot setting
)
py: commands succeeded
integration: commands succeeded