Add MLflow set tags function #139

sunny1401 · 2020-11-08T21:44:26Z

This PR adds set_tags function to MLFlowResultTracker in tracker.py file

Link to the relevant feature request - #135

Description of the Change - Currently the code doesn't provide any provision for adding tags related to a run. For example, adding details regarding the s3 version of a particular file for one run of an experiment cannot be done using the existing code.

Alternatives - Use the original API.

Possible Drawbacks - We could probably need mlflow API still for logging the model directly and versioning, but apart from that - I can't think of any possible drawbacks.

Verification Process - Ran tox. Got the following-
manifest: commands succeeded
flake8: commands succeeded
ERROR: darglint: commands failed (fails in file I have not touched)
pyroma: commands succeeded
readme: commands succeeded
doc8: commands succeeded
ERROR: docs: commands failed (Warning, treated as error:
dot command 'dot' cannot be run (needed for graphviz output), check the graphviz_dot setting
)
py: commands succeeded
integration: commands succeeded

codecov · 2020-11-08T21:58:29Z

Codecov Report

Merging #139 (378ecbf) into master (2c9101c) will decrease coverage by 0.04%.
The diff coverage is 33.33%.

@@            Coverage Diff             @@
##           master     #139      +/-   ##
==========================================
- Coverage   67.50%   67.46%   -0.05%     
==========================================
  Files          93       93              
  Lines        5813     5818       +5     
  Branches      743      745       +2     
==========================================
+ Hits         3924     3925       +1     
- Misses       1677     1682       +5     
+ Partials      212      211       -1

Impacted Files	Coverage Δ
src/pykeen/trackers.py	`54.54% <33.33%> (-1.23%)`	⬇️
src/pykeen/models/cli/builders.py	`24.35% <0.00%> (-0.98%)`	⬇️
src/pykeen/models/unimodal/rgcn.py	`80.36% <0.00%> (-0.09%)`	⬇️
src/pykeen/models/base.py	`85.50% <0.00%> (+0.29%)`	⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 2c9101c...64875eb. Read the comment docs.

cthoyt · 2020-11-08T22:25:58Z

src/pykeen/pipeline.py

@@ -834,9 +834,11 @@ def pipeline(  # noqa: C901
    if not metadata:
        metadata = {}
    title = metadata.get('title')
+    tags = metadata.get('run_tags', None)


It would make more sense that this result tracker-specific information is passed to the pipeline via the result_tracker_kwargs instead of the metadata.

pykeen/src/pykeen/pipeline.py

Line 832 in 2c9101c

result_tracker = result_tracker_cls(**(result_tracker_kwargs or {}))

Tags are only relevant for MLflow, so it wouldn't necessitate adding new functionality to the base class

I would suggest instead modifying the __init__() function for the MLFlowResultTracker to accept a tags keyword argument and saving it as an instance variable.

pykeen/src/pykeen/trackers.py

Lines 48 to 74 in 2c9101c

def __init__(

self,

tracking_uri: Optional[str] = None,

experiment_id: Optional[int] = None,

experiment_name: Optional[str] = None,

):

"""

Initialize result tracking via MLFlow.

:param tracking_uri:

The tracking uri.

:param experiment_id:

The experiment ID. If given, this has to be the ID of an existing experiment in MFLow. Has priority over

experiment_name.

:param experiment_name:

The experiment name. If this experiment name exists, add the current run to this experiment. Otherwise

create an experiment of the given name.

"""

import mlflow as _mlflow

self.mlflow = _mlflow

self.mlflow.set_tracking_uri(tracking_uri)

if experiment_id is not None:

experiment = self.mlflow.get_experiment(experiment_id=experiment_id)

experiment_name = experiment.name

if experiment_name is not None:

self.mlflow.set_experiment(experiment_name)

okay yeah - makes sense.. and then in the start_run - after starting the run - update tags as well? I only basically added the base class method, because you can't add tags until there is an active run.

maybe there's just a bit of confusion since the title comes from the metadata - this is a pretty general piece of information (not specific to result tracker). Maybe it would make more sense if there's a priority to get the title from the result_tracker_kwargs first then the metadata second (I could take care of this in another PR - it's not relevant to yours)

src/pykeen/pipeline.py

cthoyt · 2020-11-08T22:32:30Z

@sunny1401 thanks for the PR! I've added a few comments on how we might simplify this. I hope you get the chance to learn a bit about the design choices we made :)

cthoyt · 2020-11-08T22:36:52Z

@sunny1401 also you don't have to worry about the darglint failure. It's waiting there for a time when I'm verrry verrry bored and want to re-write all of the docstrings :) For now you can rely on the CI build on Travis. For the docs, you'll need dot installed since I've configured auto-generation of class hierarchies (see this example), which requires dot. If you're on mac, this can be installed with homebrew. If you're on windows... then good luck

…ce method

cthoyt · 2020-11-09T11:53:35Z

@sunny1401 you'll see that I made a few updates:

The double underscore in the variable name self.__tags isn't necessary. Usually this only makes code harder to read, so I've changed it to just tags
Rather than using two different lines to initialize self.tags that includes a conditional, now it is just set to whatever is given in the call to __init__.
A none check is now used instead of a length check.

As a side note, in general, dictionaries can also be checked for truthyness or falsiness, so you don't have to check the length being not equal to or equal to zero. You will see that:

def check_truthy(d):
    if d:
       print('truthy dict')
    else:
        print('falsy dict')

check_truthy({})
'falsy dict'

check_truthy({'hello': 5})
'truthy dict'

cthoyt

@sunny1401 one more thing - all new code deserves a tutorial. Would you please update the example in https://github.com/pykeen/pykeen/blob/master/docs/source/tutorial/trackers/using_mlflow.rst to explain why you would want to put tags and give a tags={...} as an example?

sunny1401 · 2020-11-09T12:36:02Z

@sunny1401 you'll see that I made a few updates:

The double underscore in the variable name self.__tags isn't necessary. Usually this only makes code harder to read, so I've changed it to just tags

Rather than using two different lines to initialize self.tags that includes a conditional, now it is just set to whatever is given in the call to __init__.

A none check is now used instead of a length check.

As a side note, in general, dictionaries can also be checked for truthyness or falsiness, so you don't have to check the length being not equal to or equal to zero. You will see that:
def check_truthy(d):
    if d:
       print('truthy dict')
    else:
        print('falsy dict')

check_truthy({})
'falsy dict'

check_truthy({'hello': 5})
'truthy dict'

I saw the changes.. Makes total sense. I am sorry I got confused about this.

sunny1401 · 2020-11-09T12:36:32Z

@sunny1401 one more thing - all new code deserves a tutorial. Would you please update the example in https://github.com/pykeen/pykeen/blob/master/docs/source/tutorial/trackers/using_mlflow.rst to explain why you would want to put tags and give a tags={...} as an example?

I can do that.. I will add that.

sunny1401 · 2020-11-09T14:01:54Z

@cthoyt - I actually have a question - in the tutorial - i added in the pipeline docs - how to pass tags.. Do you think I should run the tutorial and add pictures for the tag values too?

cthoyt · 2020-11-09T14:03:30Z

@sunny1401 nope, that's okay. A good tutorial introduces things one at a time, so your addition can come all the way at the bottom and doesn't necessarily need an accompanying photo.

docs/source/tutorial/trackers/using_mlflow.rst

cthoyt · 2020-11-09T15:52:06Z

@sunny1401 I've edited the tutorial to make it a bit more concise. Thanks for you contribution, will merge when travis passes! If you'd like your full name listed in the contributors, please let me know how it's written. Otherwise, I will just leave it as your GitHub handle

[skip ci]

sunny1401 · 2020-11-09T15:53:44Z

I am so sorry you had to make so many corrections. Thanks for helping me so much.

cthoyt · 2020-11-09T15:55:19Z

@sunny1401 no problem at all! I hope you learned a bit and we really appreciate that you were able to make a contribution. Thank you again!

sunny1401 · 2020-11-09T15:59:41Z

Oh my name is Sankranti Joshi.. I din read the complete thing

[skip ci]

sunny1401 added 2 commits November 8, 2020 20:39

adding set_tags to mlflow tracker

49d4d42

update code in pipeline.py

72cc2e8

cthoyt reviewed Nov 8, 2020

View reviewed changes

src/pykeen/pipeline.py Outdated Show resolved Hide resolved

cthoyt changed the title ~~Add mlflow set tags function~~ Add MLflow set tags function Nov 8, 2020

sunny1401 and others added 2 commits November 8, 2020 22:39

pr comments - removing base class method and adding tags as an instan…

e0ebbcd

…ce method

Update trackers.py

378ecbf

cthoyt requested changes Nov 9, 2020

View reviewed changes

cthoyt added the enhancement New feature or request label Nov 9, 2020

sunny1401 and others added 2 commits November 9, 2020 14:05

adding tags in tutorial

08a1871

Add outline of tutorial

64875eb

cthoyt reviewed Nov 9, 2020

View reviewed changes

docs/source/tutorial/trackers/using_mlflow.rst Show resolved Hide resolved

sunny1401 and others added 2 commits November 9, 2020 15:33

Update tutorial based on pr comments

a748ae8

Editing and revising of tutorial

045d8e6

Update AUTHORS.md

1c38ca8

[skip ci]

Update AUTHORS.md

90dbd7e

[skip ci]

cthoyt merged commit c1a56f5 into pykeen:master Nov 9, 2020

sunny1401 deleted the add-mlflow-set-tags-function branch November 9, 2020 16:18

mberr mentioned this pull request Nov 10, 2020

ResultTracker class has no option for logging tags related to the project #135

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add MLflow set tags function #139

Add MLflow set tags function #139

sunny1401 commented Nov 8, 2020

codecov bot commented Nov 8, 2020 •

edited

Loading

cthoyt Nov 8, 2020

cthoyt Nov 8, 2020

sunny1401 Nov 8, 2020

cthoyt Nov 8, 2020

cthoyt commented Nov 8, 2020

cthoyt commented Nov 8, 2020

cthoyt commented Nov 9, 2020

cthoyt left a comment

sunny1401 commented Nov 9, 2020

sunny1401 commented Nov 9, 2020

sunny1401 commented Nov 9, 2020

cthoyt commented Nov 9, 2020

cthoyt commented Nov 9, 2020 •

edited

Loading

sunny1401 commented Nov 9, 2020

cthoyt commented Nov 9, 2020

sunny1401 commented Nov 9, 2020

	def __init__(
	self,
	tracking_uri: Optional[str] = None,
	experiment_id: Optional[int] = None,
	experiment_name: Optional[str] = None,
	):
	"""
	Initialize result tracking via MLFlow.

	:param tracking_uri:
	The tracking uri.
	:param experiment_id:
	The experiment ID. If given, this has to be the ID of an existing experiment in MFLow. Has priority over
	experiment_name.
	:param experiment_name:
	The experiment name. If this experiment name exists, add the current run to this experiment. Otherwise
	create an experiment of the given name.
	"""
	import mlflow as _mlflow
	self.mlflow = _mlflow

	self.mlflow.set_tracking_uri(tracking_uri)
	if experiment_id is not None:
	experiment = self.mlflow.get_experiment(experiment_id=experiment_id)
	experiment_name = experiment.name
	if experiment_name is not None:
	self.mlflow.set_experiment(experiment_name)

Add MLflow set tags function #139

Add MLflow set tags function #139

Conversation

sunny1401 commented Nov 8, 2020

codecov bot commented Nov 8, 2020 • edited Loading

Codecov Report

cthoyt Nov 8, 2020

Choose a reason for hiding this comment

cthoyt Nov 8, 2020

Choose a reason for hiding this comment

sunny1401 Nov 8, 2020

Choose a reason for hiding this comment

cthoyt Nov 8, 2020

Choose a reason for hiding this comment

cthoyt commented Nov 8, 2020

cthoyt commented Nov 8, 2020

cthoyt commented Nov 9, 2020

cthoyt left a comment

Choose a reason for hiding this comment

sunny1401 commented Nov 9, 2020

sunny1401 commented Nov 9, 2020

sunny1401 commented Nov 9, 2020

cthoyt commented Nov 9, 2020

cthoyt commented Nov 9, 2020 • edited Loading

sunny1401 commented Nov 9, 2020

cthoyt commented Nov 9, 2020

sunny1401 commented Nov 9, 2020

codecov bot commented Nov 8, 2020 •

edited

Loading

cthoyt commented Nov 9, 2020 •

edited

Loading