Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use .comet.config file for CometLogger #1913

Merged
merged 9 commits into from
Aug 7, 2020
Merged

Use .comet.config file for CometLogger #1913

merged 9 commits into from
Aug 7, 2020

Conversation

neighthan
Copy link
Contributor

@neighthan neighthan commented May 21, 2020

Before submitting

  • Was this discussed/approved via a Github issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure to update the docs?
  • Did you write any new necessary tests?
    • Not possible unless somebody has a Comet account specifically for testing (with the API key in the CI environment).
  • If you made a notable change (that affects users), did you update the CHANGELOG?
    • I didn't think this was particularly notable; Comet users will expect this already, and it's backwards-compatible, so it won't bite anybody.

What does this PR do?

Fixes #1810.

@pep8speaks
Copy link

pep8speaks commented May 21, 2020

Hello @neighthan! Thanks for updating this PR.

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2020-08-07 07:46:22 UTC

@mergify mergify bot requested a review from a team May 21, 2020 02:03
@mergify
Copy link
Contributor

mergify bot commented May 25, 2020

This pull request is now in conflict... :(

@codecov
Copy link

codecov bot commented May 25, 2020

Codecov Report

Merging #1913 into master will increase coverage by 1%.
The diff coverage is 100%.

@@           Coverage Diff           @@
##           master   #1913    +/-   ##
=======================================
+ Coverage      89%     90%    +1%     
=======================================
  Files          79      79            
  Lines        7302    7197   -105     
=======================================
- Hits         6514    6501    -13     
+ Misses        788     696    -92     

@Borda Borda added the feature Is an improvement or enhancement label May 25, 2020
@Borda
Copy link
Member

Borda commented May 29, 2020

it seems that it is not back-compatible with older Comet version...
mind update min requirements or make this parameter as optional for the new version?

@Borda Borda added the waiting on author Waiting on user action, correction, or update label May 29, 2020
@edenlightning
Copy link
Contributor

@neighthan ping :)

@neighthan
Copy link
Contributor Author

Sorry for the delay; I just updated the comet_ml version in environment.yml and requirements/extra.txt. As Comet itself prompts the user to update it every time it's used if you're not on the current version, I figured requiring people to have an up-to-date Comet installation isn't a big issue. Hopefully the tests pass now.

raise MisconfigurationException("CometLogger requires either api_key or save_dir during initialization.")
# for backwards-compatibility, we have to check api_key then save_dir and
# only then use ~/.comet.config or the environment variable
api_key = get_api_key(None, get_config())
Copy link
Contributor

@rohitgr7 rohitgr7 Jun 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be checked above before if api_key is not None: to avoid duplicate code below.

if api_key is None:
    api_key = get_api_key(None, get_config())

Will it break some backward-compatibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to preserve the previous behavior of

  1. If you pass in an API key, it is used (regardless of save_dir)
  2. If no API key is given but save_dir is, an offline experiment is used

Step 3. before would have been to error; now step 3 is to use the API key from the environment / the config file if given and only error if not. If we load the API key in at the start, then behavior 2. can't be done anymore (i.e. if I have a ~/.comet_config but still want the experiment to be saved locally, I could pass in save_dir and that would be used instead). I like keeping explicit local saving as an option, so if we're going to change the behavior, I'd suggest

  1. if save_dir is given, do an offline experiment (regardless of API key status; breaks backwards-compatibility)
  2. if not, use the given API key if there is one or load it from the env variable / file if not
  3. if none of these work, throw the error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! the suggested workflow looks like a better way to make this work, although the current one is also right but not optimal(just repeated code).

Copy link
Contributor Author

@neighthan neighthan Jun 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do you think about throwing an error if save_dir and api_key are both given? This also breaks backwards-compatibility but not silently. By disallowing this, we remove potential confusion about which argument has precedence if both are given. And then we can proceed with the second option above.

Copy link
Contributor

@rohitgr7 rohitgr7 Jun 27, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was thinking the same. For now, api_key is given priority if both are given. As a user, I don't think I will bother to give a save_dir even if I want to run it in online mode. I will give save_dir only when I want to run it in offline mode even if I have a key already present in my system or I will give an api_key. So I think the suggested flow looks better to me atleast but you should talk to one of the core-contributor first before changing it further.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

save_dir actually has another effect as well - setting it will change where the model weights are saved. I just ran into a case where I was doing offline experiments and using save_dir and then wanted to switch to online ones, but I think it's more inconvenient and error-prone to have to change the save directory somewhere else when I already have save_dir set up for that. Thus I think it actually could make sense to specify save_dir even if you want online-mode. What about adding a another argument to specify whether the experiment should be online?

  • If save_dir is given and no API key can be found, we can infer it should be offline.
  • If save_dir is not given, then it's online.
  • If desired, explicitly giving an API key could lead to online mode.
  • If an API key can be found (e.g. in ~/.comet.config) but save_dir is also given, then we use the new argument to determine whether to be online or not. This lets users use save_dir to control where checkpoints are stored even when doing online experiments and lets users do offline experiments despite having a ~/.comet.config.

We'd also want to change it so that save_dir is always saved as an instance variable (instead of only if no API key is given).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, if save_dir is None then in case of online more where are models saved?? I think if someone wants to explicitly specify the model weights directory, one should use ModelCheckpoint rather than specifying the model directory in logger instance.

@Borda Borda removed the waiting on author Waiting on user action, correction, or update label Jun 27, 2020
@Borda Borda added this to the 0.8.x milestone Jul 1, 2020
@neighthan
Copy link
Contributor Author

Status summary: this should all work as-is and be backwards-compatible right now (the config file is only used in cases where an error would have been thrown before). The only remaining thing that I see right now is to decide whether we want to change how save_dir / api_key are handled. Here are the options I think are most reasonable:

Alternative 0 (current behavior) - api_key has priority

  1. api_key given -> online (regardless of save_dir)
  2. api_key not given and save_dir given -> offline
  3. neither given and config file found -> online (added by this PR)

Alternative 1 - save_dir has priority

  1. save_dir given -> offline (regardless of api_key; breaks backwards-compatibility)
  2. save_dir not given and api_key given or config file found -> online (api_key is used before the config file still)

Alternative 2 - save_dir used for controlling checkpoint directory

  1. save_dir given and api_key not given / no config file -> offline
  2. save_dir not given and api_key given or config file found -> online
  3. save_dir given and api_key given or config file found -> use new online argument to determine mode

I prefer alternative 2 because I like using save_dir on the logger to control where checkpoints are stored (I want my checkpoints and my Comet offline files in the same place). As Rohit noted, there are alternative ways to specify where checkpoints go, but since save_dir on the logger is one supported way, I think it's nice to keep that option whether you want online or offline experiments. The downside of this is the need to add another argument to __init__ and that it breaks backwards-compatibility. Alternative 0 is best if that's the main concern. Let me know your thoughts and we can merge this with the current behavior or I can alter things if desired.

@Borda
Copy link
Member

Borda commented Jul 2, 2020

I would also go with Alternative 2 🐰 @PyTorchLightning/core-contributors

@justusschock
Copy link
Member

I'd also prefer Option 2

@neighthan
Copy link
Contributor Author

Okay, I just pushed an implementation of Alternative 2. I set offline: bool = True so that if an API key is found and save_dir is given, the experiment will be in online mode by default. If anybody objects, I'm also fine either making this offline by default (though I think if you have an API key, you'd usually want experiments online) or setting offline: Optional[bool] = None and then checking, only when api_key and save_dir are given, that offline is not None (otherwise raising a ValueError). Let me know any thoughts / anything else to change.

I'm not sure why that TPU test is failing, but I couldn't see how it would be related to this PR.

@mergify mergify bot requested a review from a team August 6, 2020 12:18
@Borda
Copy link
Member

Borda commented Aug 6, 2020

@neighthan I have rebased on master, let's finish it...
cc: @justusschock

Copy link
Contributor

@rohitgr7 rohitgr7 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

@mergify mergify bot requested a review from a team August 6, 2020 17:53
@Borda Borda modified the milestones: 0.8.x, 0.9.0 Aug 6, 2020
Copy link
Member

@SkafteNicki SkafteNicki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice addition

@Borda Borda added the ready PRs ready to be merged label Aug 7, 2020
@Borda Borda merged commit 234e2b5 into Lightning-AI:master Aug 7, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Is an improvement or enhancement ready PRs ready to be merged
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Use .comet.config file for CometLogger
7 participants