gensim models show_topic/print_topic parameter num_words changed to topn to match other topic models #1200
@tmylk To standardize the API and bring the topic models in line with LdaModel, the following parameters need to be used, as per my understanding:
According to the above, there are still more inconsistencies in other topic models. Please confirm, and I'll make changes accordingly.
I believe we should support the old param too, perhaps with some deprecation warning. Once we remove the existing params, we'll have to up the major version (gensim 2.0), because we switched to semantic versioning. Without a clearly defined "public API" (and the Python philosophy doesn't care much for that), we'll probably be bumping the major version a lot.
@piskvorky OK. With reference to the current API, I'll add support for the consistent param, with a deprecation warning for the old one, without removing it. For example, in hdp.show_topic, the current API suggests:
show_topic(self, topic_id, num_words=20, log=False, formatted=False)
which would become:
show_topic(self, topic_id, num_words=20, topn=20, log=False, formatted=False)
    # deprecation warning
    if topn == 20 and num_words != 20:  # old param num_words is used
        logger.warning("num_words is deprecated in the updated version. Please use topn.")
I will update this PR for all models accordingly as soon as possible.
What is the reason for closing this PR? You can just keep working in this branch.
@tmylk Unrelated checks fail.
Travis tests re-ran after smart_open update.
show_topic parameter num_words changed to topn, in order to make it consistent with LdaModel; both the old and new param are supported, with a deprecation warning.
ldamallet show_topic param fixed: ldamallet now supports both num_words and topn parameters for show_topic, with a deprecation warning for num_words.
hdpmodel show_topic supports old and new param: show_topic in hdpmodel now supports both num_words and topn parameters to make it consistent across all models, with a deprecation warning for num_words (checks should pass this time).
dtmmodel topn/num_words with deprecation warning: inconsistency between API and code removed for topn/num_words by adding support for both params with a proper deprecation warning. dtmmodel is now compatible with both topn/num_words parameters for show_topic and others, with proper deprecation warnings.
hdpmodel, dtmmodel, ldamallet num_words changed to topn with deprecation warning: to make the code consistent with the API, parameter num_words changed to topn (for the print_topic/show_topic methods), with a deprecation warning for num_words.
@tmylk Squashed all commits into one. Note: with reference to the API, the following parameters have been standardized across models:
As suggested in the comment above, the old parameter is still supported for now, and a deprecation warning has been added for num_words to keep the API relevant.
Please change the logic of the warnings.
gensim/models/hdpmodel.py
Outdated
@@ -445,11 +444,17 @@ def show_topic(self, topic_id, num_words=20, log=False, formatted=False):
    `False` as lists of (weight, word) pairs.

    """
    if topn is None:  # deprecated num_words is used
        logger.warn("The parameter num_words for show_topic() method would be deprecated in the updated version.\
        Please use topn instead. Ignore if you didn't use parameter num_words or topn for show_topic() ")
What is the purpose of adding "ignore if"? Would it be better to make topn=20 by default and num_words=None? Show a warning if num_words is not None, and add a comment to make it an Exception in the next release. The same applies everywhere.
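A minimal sketch of the pattern suggested in this review comment, assuming a standalone function rather than the actual gensim method: the new parameter keeps the real default, the deprecated one defaults to None, and the warning fires only when the caller actually passes the old name. The function body and the returned value are hypothetical and exist only to make the behavior visible.

```python
import logging

logger = logging.getLogger(__name__)


def show_topic(topic_id, topn=20, num_words=None):
    """Hypothetical stand-in for a model's show_topic(); returns the word count used."""
    if num_words is not None:  # deprecated num_words is used
        # TODO: turn this into an exception in a later release.
        logger.warning("The parameter num_words is deprecated. Please use topn instead.")
        topn = num_words
    return topn


default_count = show_topic(0)               # no warning, default topn
old_style_count = show_topic(0, num_words=5)  # warning, old name still honored
new_style_count = show_topic(0, topn=7)       # no warning
```

Because None is a sentinel, a caller who explicitly passes num_words=20 is still detected, which a comparison against the default value cannot do.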
gensim/models/hdpmodel.py
Outdated
return self.show_topic(topic_id, num_words, formatted=True)

def show_topic(self, topic_id, num_words, log=False, formatted=False):
def print_topic(self, topic_id,topn= None, num_words=20):
Why add a default value here?
Ping @prakhar2b
Yes, on this now. Thanks.
@tmylk Updated the PR. Thanks for the review comments.
gensim/models/wrappers/dtmmodel.py
Outdated
"""
Return `num_words` most probable words for the given `topicid`, as a list of
`(word_probability, word)` 2-tuples.

"""
if num_words is not None:  # deprecated num_words is used
    logger.warn("The parameter num_words for show_topic() method would be deprecated in the updated version.\
This would include the whitespace after \ in the message. It's better to split multi-line strings using "abc" "dce" (two strings next to each other, on different lines).
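To illustrate the point about the backslash, here is a small sketch comparing the two ways of splitting a long message. The variable names and message text are illustrative only and are not taken from the PR.

```python
# Backslash continuation inside a string literal: the indentation of the
# following source line becomes part of the message.
with_backslash = "num_words is deprecated.\
    Please use topn instead."

# Adjacent string literals are joined at compile time, so the source
# indentation never enters the message.
with_adjacent = (
    "num_words is deprecated. "
    "Please use topn instead."
)

print(repr(with_backslash))
print(repr(with_adjacent))
```

The first string contains a run of spaces between the two sentences; the second contains exactly the single space written inside the quotes.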
gensim/models/wrappers/ldamallet.py
Outdated
def show_topic(self, topicid, num_words=10):
def show_topic(self, topicid, topn=10, num_words=None):
    if num_words is not None:  # deprecated num_words is used
        logger.warn("The parameter num_words for show_topic() method would be deprecated in the updated version.\
Ditto.
cc @piskvorky Updated the PR according to review.
Note: this is backwards compatible.
return self.show_topic(topic_id, num_words, formatted=True)
def print_topic(self, topic_id, topn= None, num_words=None):
    if num_words is not None:  # deprecated num_words is used
        logger.warning("The parameter num_words for print_topic() would be deprecated in the updated version.")
Should be warnings.warn, not a logging message (will spam logs).
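A hedged sketch of what this suggestion could look like, using the standard library's warnings module with a single concatenated message. The function is a hypothetical stand-in, not the gensim implementation, and the message wording is an assumption.

```python
import warnings


def print_topic(topic_id, topn=None, num_words=None):
    """Hypothetical stand-in for print_topic(); returns the word count used."""
    if num_words is not None:  # deprecated num_words is used
        warnings.warn(
            "The parameter num_words for print_topic() is deprecated. "
            "Please use topn instead.",
            DeprecationWarning,
            stacklevel=2,  # attribute the warning to the caller's line
        )
        topn = num_words
    return topn


# Record warnings so the behavior can be inspected directly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = print_topic(0, num_words=5)
```

Unlike a logger call, warnings go through the warning filters, so users can silence them, escalate them to errors, or have repeats from the same location suppressed, depending on the active filter configuration.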
def print_topic(self, topic_id, topn= None, num_words=None):
    if num_words is not None:  # deprecated num_words is used
        logger.warning("The parameter num_words for print_topic() would be deprecated in the updated version.")
        logger.warning("Please use topn instead.")
No need for two messages, one warning is enough (concatenate the messages).
def show_topic(self, topic_id, num_words, log=False, formatted=False):
def show_topic(self, topic_id, topn=20, log=False, formatted=False, num_words= None,):
    if num_words is not None:  # deprecated num_words is used
        logger.warning("The parameter num_words for show_topic() would be deprecated in the updated version.")
Ditto.
"""Return the given topic, formatted as a string."""
return ' + '.join(['%.3f*%s' % v for v in self.show_topic(topicid, time, num_words)])
if num_words is not None:  # deprecated num_words is used
    logger.warning("The parameter num_words for print_topic(() would be deprecated in the updated version.")
Ditto. Also, too many opening brackets: (().
show_topic parameter num_words changed to topn in order to make it consistent with LdaModel. Fix #1198