Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gensim models show_topic/print_topic parameter num_words changed to topn to match other topic models #1200

Merged
merged 36 commits into from
May 23, 2017

Conversation

prakhar2b
Copy link
Contributor

@prakhar2b prakhar2b commented Mar 9, 2017

show_topic parameter num_words changed to topn in order to make it consistent with LdaModel. Fix #1198

@prakhar2b
Copy link
Contributor Author

@tmylk To standerize the api and bring consistency to the topic models with respect to LdaModel, following parameters need to be used, as per my understanding-

  • show_topics / print_topics - num_words
  • show_topic / print_topic - topn

According to above, there are still more inconsistencies in other topic models hdpmodel, dtmmodel, ldavowpalwabbit.

Please confirm it and I'll make changes accordingly.

@piskvorky
Copy link
Owner

piskvorky commented Mar 10, 2017

I believe we should support the old param too, perhaps with some deprecation warning.

Once we remove the existing params, we'll have to up the major version (gensim 2.0), because we switched to semantic versioning.

Without a clearly defined "public API" (and the Python philosophy doesn't care much for that), we'll probably be bumping the major version a lot.

@prakhar2b
Copy link
Contributor Author

prakhar2b commented Mar 10, 2017

@piskvorky OK. With reference to the current API, I'll add support to the consistent param with a deprecation warning for the old one without removing it.

For example, in hdp.show_topic, the current API suggests-

 show_topic(self,topic_id, num_words=20, log=False, formatted=False)
  • After change, it will be something like this
show_topic(self,topic_id, num_words=20, topn=20, log=False, formatted=False)
#deprecation warning
if topn is 20 and num_words is not 20 :       #old param num_words is used
    logger.warning("num_words is deprecated in the updated version. Please use topn.")

I will update this PR for all models accordingly as soon as possible.

@prakhar2b prakhar2b closed this Mar 13, 2017
@tmylk
Copy link
Contributor

tmylk commented Mar 13, 2017

What is the reason for closing this PR? you can just keep working in this branch

@prakhar2b prakhar2b reopened this Mar 16, 2017
@prakhar2b prakhar2b changed the title LdaMallet show_topic parameter num_words changed to topn to match other topic models LdaMallet show_topic/print_topic parameter num_words changed to topn to match other topic models Mar 16, 2017
@prakhar2b prakhar2b changed the title LdaMallet show_topic/print_topic parameter num_words changed to topn to match other topic models gensim models show_topic/print_topic parameter num_words changed to topn to match other topic models Mar 16, 2017
@prakhar2b
Copy link
Contributor Author

@tmylk unrelated checks fail .

@tmylk
Copy link
Contributor

tmylk commented Mar 17, 2017

Travis tests re-ran after smart_open update

show_topic parameter num_words changed to topn in order to make it consistent with LdaModel

show_topic parameter num_words changed to topn

both old and new param with deprecation warning

ldamallet now supports both num_words and topn parameters for show_topic with deprecation warning for the num_words.

hdpmodel show_topic supports old and new param

show_topic in hdpmodel now supports both num_words and topn parameters to make it consistent across all models, with deprecation warning for num_words

dtmmodel topn/num_words with deprecation warning

Inconsistency between api and code removed for topn/num_words by adding support for both params with proper deprecation warning

hdpmodel show_topic supports old and new param

show_topic in hdpmodel now supports both num_words and topn parameters to make it consistent across all models, with deprecation warning for num_words - checks should pass this time

hdpmodel show_topic supports old and new para

dtmmodel topn/num_words with deprecation warning

ldamallet show_topic param fixed

ldamallet now supports both num_words and topn parameters for show_topic with deprecation warning for the num_words.

dtmmodel topn/num_words with deprecation warning

dtmmodel is now compatible with both topn/num_words parameters for show_topic and others with proper deprecation warnings.

hdpmodel num_words changed to topn with deprecation warning

To make the code consistent with the api-  parameters num_words changed to topn (for print_topic/show_topic method), with deprecation warning for num_words

hdpmodel num_words changed to topn with deprecation warning

To make the code consistent with the api-  parameters num_words changed to topn (for print_topic/show_topic method), with deprecation warning for num_words

hdpmodel num_words changed to topn with deprecation warning

To make the code consistent with the api-  parameters num_words changed to topn (for print_topic/show_topic method), with deprecation warning for num_words

dtmmodel num_words changed to topn with deprecation warning

To make the code consistent with the api-  parameters num_words changed to topn (for print_topic/show_topic method), with deprecation warning for num_words

ldamallet num_words changed to topn with deprecation warning

To make the code consistent with the api-  parameters num_words changed to topn (for print_topic/show_topic method), with deprecation warning for num_words

hdpmodel num_words changed to topn with deprecation warning

To make the code consistent with the api-  parameters num_words changed to topn (for print_topic/show_topic method), with deprecation warning for num_words

ldamallet num_words changed to topn with deprecation warning

To make the code consistent with the api-  parameters num_words changed to topn (for print_topic/show_topic method), with deprecation warning for num_words
@prakhar2b
Copy link
Contributor Author

prakhar2b commented Mar 19, 2017

@tmylk Squashed all commits into one.

Note : With reference to the API, following parameters have been standerized across models-

  • show_topics/ print_topics - num_words
  • show_topic/ print_topic - topn

As suggested in the above comment, old comment is still supported as of now, and proper deprecation warning has been added for num_words appropriately to keep the API relevant.

Copy link
Contributor

@tmylk tmylk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please change the logic of the warnings.

@@ -445,11 +444,17 @@ def show_topic(self, topic_id, num_words=20, log=False, formatted=False):
`False` as lists of (weight, word) pairs.

"""
if topn is None: #deprecated num_words is used
logger.warn("The parameter num_words for show_topic() method would be deprecated in the updated version.\
Please use topn instead. Ignore if you didn't use parameter num_words or topn for show_topic() ")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the purpose of adding "ignore if"? Would it be better to make topn=20 by default and num_words=None. Show warning if num_words is not None and add a comment to make it an Exception in the next release. Same applies everywhere.

return self.show_topic(topic_id, num_words, formatted=True)

def show_topic(self, topic_id, num_words, log=False, formatted=False):
def print_topic(self, topic_id,topn= None, num_words=20):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why add a default value here?

@tmylk
Copy link
Contributor

tmylk commented May 2, 2017

Ping @prakhar2b

@prakhar2b
Copy link
Contributor Author

yes, on this now. Thanks

@prakhar2b
Copy link
Contributor Author

@tmylk updated the PR. Thanks for the review comments.

"""
Return `num_words` most probable words for the given `topicid`, as a list of
`(word_probability, word)` 2-tuples.

"""
if num_words is not None: # deprecated num_words is used
logger.warn("The parameter num_words for show_topic() method would be deprecated in the updated version.\
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would include the whitespace after \ in the mesage.

It's better to split multi-line strings using "abc" "dce" (two strings next to each other, on different lines).

def show_topic(self, topicid, num_words=10):
def show_topic(self, topicid, topn=10, num_words=None):
if num_words is not None: # deprecated num_words is used
logger.warn("The parameter num_words for show_topic() method would be deprecated in the updated version.\
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dtto.

@prakhar2b
Copy link
Contributor Author

cc @piskvorky updated the PR according to review

@tmylk tmylk added breaks backward-compatibility Change breaks backward compatibility and removed breaks backward-compatibility Change breaks backward compatibility labels May 23, 2017
@tmylk
Copy link
Contributor

tmylk commented May 23, 2017

Note: this is backwards compatible.

@tmylk tmylk merged commit 834e130 into piskvorky:develop May 23, 2017
return self.show_topic(topic_id, num_words, formatted=True)
def print_topic(self, topic_id, topn= None, num_words=None):
if num_words is not None: # deprecated num_words is used
logger.warning("The parameter num_words for print_topic() would be deprecated in the updated version.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be warnings.warn, not a logging message (will spam logs).

def print_topic(self, topic_id, topn= None, num_words=None):
if num_words is not None: # deprecated num_words is used
logger.warning("The parameter num_words for print_topic() would be deprecated in the updated version.")
logger.warning("Please use topn instead.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for two messages, one warning is enough (concatenate the messages).

def show_topic(self, topic_id, num_words, log=False, formatted=False):
def show_topic(self, topic_id, topn=20, log=False, formatted=False, num_words= None,):
if num_words is not None: # deprecated num_words is used
logger.warning("The parameter num_words for show_topic() would be deprecated in the updated version.")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtto

"""Return the given topic, formatted as a string."""
return ' + '.join(['%.3f*%s' % v for v in self.show_topic(topicid, time, num_words)])
if num_words is not None: # deprecated num_words is used
logger.warning("The parameter num_words for print_topic(() would be deprecated in the updated version.")
Copy link
Owner

@piskvorky piskvorky May 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtto.

Also, too many opening brackets (().

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants