Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

suppress alias warnings with verbosity<0 (fixes #4518) #5253

Merged
merged 18 commits into from
Oct 11, 2022

Conversation

jmoralez
Copy link
Collaborator

Sets the logging level before resolving the parameter aliases so that the warnings related to them can be suppressed using the verbosity parameter.

Fixes #4518.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this! Like I mentioned in #4518 (comment), I totally support this change. But I'd like to hear @StrikerRUS's perspective on this one as well before this is merged.

Until then, please see a few small suggestions below.

tests/python_package_test/test_engine.py Show resolved Hide resolved
src/io/config.cpp Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
@jmoralez jmoralez changed the title suppress alias warnings with verbosity=-1 suppress alias warnings with verbosity<0 Jun 1, 2022
@jmoralez jmoralez changed the title suppress alias warnings with verbosity<0 suppress alias warnings with verbosity<0 (fixes #4518) Jun 1, 2022
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me, thanks! Since Config is so central to the codebase, I'd like one other maintainer to review and approve before we merge this.

src/io/config.cpp Show resolved Hide resolved
Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this PR!
I'm sorry, config initialization routine is still not clear enough for me. However, I left some comments that may help to spot uncovered by this PR cases.

Comment on lines 47 to 48
GetInt(params, "verbose", &verbosity);
GetInt(params, "verbosity", &verbosity);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If params have both verbose and verbosity keys I think we should still emit the following warning for this case

Log::Warning("%s is set with %s=%s, %s=%s will be ignored. Current value: %s=%s",
alias->second.c_str(), alias_set->second.c_str(), params->at(alias_set->second).c_str(),
pair.first.c_str(), pair.second.c_str(), alias->second.c_str(), params->at(alias_set->second).c_str());

to inform user that verbosity has higher precedence over verbose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is done in the KeyAliasTransform function after setting the verbosity and de-duplicating.

src/io/config.cpp Outdated Show resolved Hide resolved
src/io/config.cpp Show resolved Hide resolved
@@ -42,6 +42,19 @@ std::unordered_map<std::string, std::string> Config::Str2Map(const char* paramet
for (auto arg : args) {
KV2Map(&params, Common::Trim(arg).c_str());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the following warnings in KV2Map() function are still cannot be suppressed by passing verbosity param because // define verbosity and set logging level happens later in the code

} else {
Log::Warning("%s is set=%s, %s=%s will be ignored. Current value: %s=%s",
key.c_str(), value_search->second.c_str(), key.c_str(), value.c_str(),
key.c_str(), value_search->second.c_str());
}
}
} else {
Log::Warning("Unknown parameter %s", kv);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These warnings come from duplicated parameters, not aliases. I can try to address them as well in this PR if its in the scope.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's related.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second thought on this, maybe we should leave the duplicates warning? I think it helps the user more than being annoying, since you can remove the warnings by removing the duplicate argument or you can confirm that the argument passed through the CLI is overriding the one on the config file.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with you. I believe each warning is helpful, but unfortunately LightGBM users want to completely silence their program (according to the issues they create in the repo).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed by having an unordered_map of vectors where we store all parameters for as many times as they're defined, set the verbosity taking the first values of verbose and verbosity and then de-duplicating with the KeepFirstValues function, where warning level logs are printed but after the verbosity has been set, so these can be disabled as well.

tests/python_package_test/test_engine.py Outdated Show resolved Hide resolved
tests/python_package_test/test_engine.py Show resolved Hide resolved
@jmoralez
Copy link
Collaborator Author

jmoralez commented Jun 25, 2022

I've changed the approach, now:

  • Every parameter is saved to an unordered_multimap which can have several values for the same key, this is so we can see if there are any duplicate parameters after parsing them.
  • Once we have all the parameters in the multimap we check for verbose and verbosity and set the logging level.
  • We then move the multimap to a regular map and warn about duplicate arguments. This step takes a random one if two or more are provided (we can discuss this).
  • Once the parameters are in a map we check for the aliases as before.

This of course uses more memory now but I couldn't think of a better way to handle the possible duplicate parameters. I'm open to adapt this to a better idea.

@jmoralez
Copy link
Collaborator Author

jmoralez commented Jun 25, 2022

There are some issues right now with this because it prints the warnings several times and there seems to be a weird interaction with the dataset both in the python and R-packages where if you don't re-construct the dataset, verbosity -1 doesn't remove the alias warnings. I have to look into that.

The CLI works fine though 😄

@jmoralez
Copy link
Collaborator Author

Seems like the duplicate warnings about aliases is on master as well. I'll open an issue.

Copy link
Collaborator

@guolinke guolinke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@StrikerRUS
Copy link
Collaborator

Could you please resolve merge conflicts?

Copy link
Collaborator

@StrikerRUS StrikerRUS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot for this PR!
I don't have any other additional comments except a one below about the function naming.

But I think I'm not qualified to formally approve these mostly cpp changes.
✔️

src/io/config.cpp Outdated Show resolved Hide resolved
src/io/config.cpp Outdated Show resolved Hide resolved
Copy link
Collaborator

@guolinke guolinke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, except a minor naming

@StrikerRUS
Copy link
Collaborator

Can we merge this PR?

@jameslamb
Copy link
Collaborator

Seems like the duplicate warnings about aliases is on master as well. I'll open an issue.

did this happen? I couldn't find an issue like that from search and don't see one linked here.

Can we merge this PR?

Based on what I see in the diff for unit tests, I think this can be merged. It does seem like the warnings described in #3641 (comment) (which led to #4518) can now be suppressed with verbosity.

But I'll defer to @jmoralez . To be honest, there have been significant changes since I first approved and it's difficult for me to understand the current state of this PR a bit.

For example, the conversation in #5253 (comment) is unresolved.

@jmoralez
Copy link
Collaborator Author

jmoralez commented Sep 3, 2022

did this happen?

Yes, I opened #5332.

The duplicate parameter warnings were also addressed in this PR. The KeepFirstValues function keeps the first value defined for each parameter and warns about the rest (if there are any). This allows to override parameters defined in a configuration file through arguments in the CLI, or if defined twice in the parameter list in the R-package, the first one takes precedence and a warning is printed for the second one.

My opinion is that this is ready to be merged, but of course I'm biased haha.

@jameslamb jameslamb self-requested a review September 5, 2022 04:20
Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I opened #5332.

ah ok, thanks for pointing that out! I just got unlucky with the search terms I chose 😅

If that issue had mentioned this PR with #5253 or if a comment had been posted on this PR saying "created #5332", it would have been easier for me to find.

The duplicate parameter warnings were also addressed in this PR

Sorry, I'm still confused...do you mean specifically that as of this PR being merged, we could also close #5332?

If so, please:

@jmoralez
Copy link
Collaborator Author

jmoralez commented Sep 5, 2022

do you mean specifically that as of this PR being merged, we could also close #5332?

No, I meant that this was originally only going to remove warnings for parameter aliases, i.e. you set bagging_fraction and subsample, but in #5253 (comment) it was suggested that this also addressed warnings for duplicate parameters, i.e. you set bagging_fraction=0.7 and bagging_fraction=0.8, so that conversation was resolved because it was also addressed.

The duplicate prints (#5332) is not solved yet.

Copy link
Collaborator

@jameslamb jameslamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this and sorry for taking so long to come back and re-review! I understand now what his PR does and doesn't fix.

Let's merge this...it's only changing internal details, so easy to revise any of these design decisions in the future if we decide a different pattern is preferable for changing other logging-related stuff.

Awesome contribution, @jmoralez ! People have been asking for this for a long time 😬

@jameslamb jameslamb merged commit 4642712 into microsoft:master Oct 11, 2022
@jmoralez jmoralez deleted the alias-warnings branch October 11, 2022 04:17
@github-actions
Copy link

This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Aug 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to suppress parameter alias related warnings
4 participants