Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MLflow set tags function #139

Merged
merged 10 commits into from
Nov 9, 2020
Merged

Add MLflow set tags function #139

merged 10 commits into from
Nov 9, 2020

Conversation

sunny1401
Copy link
Contributor

This PR adds set_tags function to MLFlowResultTracker in tracker.py file

Link to the relevant feature request - #135

Description of the Change - Currently the code doesn't provide any provision for adding tags related to a run. For example, adding details regarding the s3 version of a particular file for one run of an experiment cannot be done using the existing code.

Alternatives - Use the original API.

Possible Drawbacks - We could probably need mlflow API still for logging the model directly and versioning, but apart from that - I can't think of any possible drawbacks.

Verification Process - Ran tox. Got the following-
manifest: commands succeeded
flake8: commands succeeded
ERROR: darglint: commands failed (fails in file I have not touched)
pyroma: commands succeeded
readme: commands succeeded
doc8: commands succeeded
ERROR: docs: commands failed (Warning, treated as error:
dot command 'dot' cannot be run (needed for graphviz output), check the graphviz_dot setting
)
py: commands succeeded
integration: commands succeeded

@codecov
Copy link

codecov bot commented Nov 8, 2020

Codecov Report

Merging #139 (378ecbf) into master (2c9101c) will decrease coverage by 0.04%.
The diff coverage is 33.33%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #139      +/-   ##
==========================================
- Coverage   67.50%   67.46%   -0.05%     
==========================================
  Files          93       93              
  Lines        5813     5818       +5     
  Branches      743      745       +2     
==========================================
+ Hits         3924     3925       +1     
- Misses       1677     1682       +5     
+ Partials      212      211       -1     
Impacted Files Coverage Δ
src/pykeen/trackers.py 54.54% <33.33%> (-1.23%) ⬇️
src/pykeen/models/cli/builders.py 24.35% <0.00%> (-0.98%) ⬇️
src/pykeen/models/unimodal/rgcn.py 80.36% <0.00%> (-0.09%) ⬇️
src/pykeen/models/base.py 85.50% <0.00%> (+0.29%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c9101c...64875eb. Read the comment docs.

@@ -834,9 +834,11 @@ def pipeline( # noqa: C901
if not metadata:
metadata = {}
title = metadata.get('title')
tags = metadata.get('run_tags', None)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would make more sense that this result tracker-specific information is passed to the pipeline via the result_tracker_kwargs instead of the metadata.

result_tracker = result_tracker_cls(**(result_tracker_kwargs or {}))

Tags are only relevant for MLflow, so it wouldn't necessitate adding new functionality to the base class

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest instead modifying the __init__() function for the MLFlowResultTracker to accept a tags keyword argument and saving it as an instance variable.

def __init__(
self,
tracking_uri: Optional[str] = None,
experiment_id: Optional[int] = None,
experiment_name: Optional[str] = None,
):
"""
Initialize result tracking via MLFlow.
:param tracking_uri:
The tracking uri.
:param experiment_id:
The experiment ID. If given, this has to be the ID of an existing experiment in MFLow. Has priority over
experiment_name.
:param experiment_name:
The experiment name. If this experiment name exists, add the current run to this experiment. Otherwise
create an experiment of the given name.
"""
import mlflow as _mlflow
self.mlflow = _mlflow
self.mlflow.set_tracking_uri(tracking_uri)
if experiment_id is not None:
experiment = self.mlflow.get_experiment(experiment_id=experiment_id)
experiment_name = experiment.name
if experiment_name is not None:
self.mlflow.set_experiment(experiment_name)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay yeah - makes sense.. and then in the start_run - after starting the run - update tags as well? I only basically added the base class method, because you can't add tags until there is an active run.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe there's just a bit of confusion since the title comes from the metadata - this is a pretty general piece of information (not specific to result tracker). Maybe it would make more sense if there's a priority to get the title from the result_tracker_kwargs first then the metadata second (I could take care of this in another PR - it's not relevant to yours)

src/pykeen/pipeline.py Outdated Show resolved Hide resolved
@cthoyt
Copy link
Member

cthoyt commented Nov 8, 2020

@sunny1401 thanks for the PR! I've added a few comments on how we might simplify this. I hope you get the chance to learn a bit about the design choices we made :)

@cthoyt cthoyt changed the title Add mlflow set tags function Add MLflow set tags function Nov 8, 2020
@cthoyt
Copy link
Member

cthoyt commented Nov 8, 2020

@sunny1401 also you don't have to worry about the darglint failure. It's waiting there for a time when I'm verrry verrry bored and want to re-write all of the docstrings :) For now you can rely on the CI build on Travis. For the docs, you'll need dot installed since I've configured auto-generation of class hierarchies (see this example), which requires dot. If you're on mac, this can be installed with homebrew. If you're on windows... then good luck

@cthoyt
Copy link
Member

cthoyt commented Nov 9, 2020

@sunny1401 you'll see that I made a few updates:

  1. The double underscore in the variable name self.__tags isn't necessary. Usually this only makes code harder to read, so I've changed it to just tags
  2. Rather than using two different lines to initialize self.tags that includes a conditional, now it is just set to whatever is given in the call to __init__.
  3. A none check is now used instead of a length check.

As a side note, in general, dictionaries can also be checked for truthyness or falsiness, so you don't have to check the length being not equal to or equal to zero. You will see that:

def check_truthy(d):
    if d:
       print('truthy dict')
    else:
        print('falsy dict')

check_truthy({})
'falsy dict'

check_truthy({'hello': 5})
'truthy dict'

Copy link
Member

@cthoyt cthoyt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sunny1401 one more thing - all new code deserves a tutorial. Would you please update the example in https://github.com/pykeen/pykeen/blob/master/docs/source/tutorial/trackers/using_mlflow.rst to explain why you would want to put tags and give a tags={...} as an example?

@cthoyt cthoyt added the enhancement New feature or request label Nov 9, 2020
@sunny1401
Copy link
Contributor Author

@sunny1401 you'll see that I made a few updates:

  1. The double underscore in the variable name self.__tags isn't necessary. Usually this only makes code harder to read, so I've changed it to just tags
  2. Rather than using two different lines to initialize self.tags that includes a conditional, now it is just set to whatever is given in the call to __init__.
  3. A none check is now used instead of a length check.

As a side note, in general, dictionaries can also be checked for truthyness or falsiness, so you don't have to check the length being not equal to or equal to zero. You will see that:

def check_truthy(d):
    if d:
       print('truthy dict')
    else:
        print('falsy dict')

check_truthy({})
'falsy dict'

check_truthy({'hello': 5})
'truthy dict'

I saw the changes.. Makes total sense. I am sorry I got confused about this.

@sunny1401
Copy link
Contributor Author

@sunny1401 one more thing - all new code deserves a tutorial. Would you please update the example in https://github.com/pykeen/pykeen/blob/master/docs/source/tutorial/trackers/using_mlflow.rst to explain why you would want to put tags and give a tags={...} as an example?

I can do that.. I will add that.

@sunny1401
Copy link
Contributor Author

@cthoyt - I actually have a question - in the tutorial - i added in the pipeline docs - how to pass tags.. Do you think I should run the tutorial and add pictures for the tag values too?

@cthoyt
Copy link
Member

cthoyt commented Nov 9, 2020

@sunny1401 nope, that's okay. A good tutorial introduces things one at a time, so your addition can come all the way at the bottom and doesn't necessarily need an accompanying photo.

@cthoyt
Copy link
Member

cthoyt commented Nov 9, 2020

@sunny1401 I've edited the tutorial to make it a bit more concise. Thanks for you contribution, will merge when travis passes! If you'd like your full name listed in the contributors, please let me know how it's written. Otherwise, I will just leave it as your GitHub handle

@sunny1401
Copy link
Contributor Author

I am so sorry you had to make so many corrections. Thanks for helping me so much.

@cthoyt
Copy link
Member

cthoyt commented Nov 9, 2020

@sunny1401 no problem at all! I hope you learned a bit and we really appreciate that you were able to make a contribution. Thank you again!

@sunny1401
Copy link
Contributor Author

Oh my name is Sankranti Joshi.. I din read the complete thing

@cthoyt cthoyt merged commit c1a56f5 into pykeen:master Nov 9, 2020
@sunny1401 sunny1401 deleted the add-mlflow-set-tags-function branch November 9, 2020 16:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants