
In multi-aspect context, allow Main model to be chained [Issue #1846] #2002

Merged
merged 1 commit on Jun 13, 2024

Conversation

ddicato (Contributor) commented on May 22, 2024:

No description provided.

    topics = self.representation_model["Main"].extract_topics(self, documents, c_tf_idf, topics)
elif isinstance(self.representation_model["Main"], list):
    for tuner in self.representation_model["Main"]:
        topics = tuner.extract_topics(self, documents, c_tf_idf, topics)
topics = {label: values[:self.top_n_words] for label, values in topics.items()}
MaartenGr (Owner) commented:

Although it would be nice to have chained main models, it does bring a number of difficulties.

First, when you chain the Main representation, any additional aspects you create are then built on top of that chained representation. This defeats the purpose of multiple aspects: with a chained Main representation, every additional aspect shares the same chained base rather than providing an independent view of the topics.

Second, when you create a chained representation as the Main model, top_n_words is still applied afterwards, which might not be appropriate depending on the last model in the chain.
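The second concern can be made concrete with a minimal sketch. The functions below are toy stand-ins, not BERTopic's actual classes: `chain` mimics applying a list of representation models in sequence, and `keep_top_words` mimics the `top_n_words` truncation that is applied afterwards. If the last model in the chain returns something other than a ranked word list (here, a single summary string), slicing with `top_n_words` no longer makes sense.

```python
top_n_words = 10

def keep_top_words(topics, n):
    # Mimics the top_n_words truncation applied after the Main representation.
    return {label: values[:n] for label, values in topics.items()}

def chain(topics, tuners):
    # Apply each representation model ("tuner") in sequence.
    for tuner in tuners:
        topics = tuner(topics)
    return topics

# Two toy "tuners": the second rewrites each topic into one summary string,
# for which a top_n_words slice is meaningless.
def rerank(topics):
    return {label: sorted(values, key=len) for label, values in topics.items()}

def summarize(topics):
    return {label: [" ".join(values[:3])] for label, values in topics.items()}

topics = {0: ["cats", "dogs", "pets", "veterinary", "animal"]}
chained = chain(topics, [rerank, summarize])
# The chain leaves one summary string per topic; truncating that list to
# top_n_words is a no-op here, but for other chained outputs it could
# silently discard content that is not a ranked word list at all.
final = keep_top_words(chained, top_n_words)
print(final)  # {0: ['cats dogs pets']}
```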

ddicato (Contributor, Author) commented on May 23, 2024:

Interesting, thanks for explaining that. I clearly wasn't aware of the full implications of this change.

For the first issue: What is the desired behavior here? If I'm creating multiple aspects, I would think that each chain would be independent, not concatenated onto "Main". If that's the intended behavior, I think I can fix that.

For the second issue: It seems like this is a problem with chained main representations in general, whether or not we have multiple aspects. What is the intended behavior of top_n_words? Do we wish to automatically apply it after "Main" and before every non-Main aspect? Or do we wish to apply it after every non-Main aspect? Or do we prefer allowing the user to control whether/when top_n_words is applied to each aspect?

MaartenGr (Owner) commented:

> For the first issue: What is the desired behavior here? If I'm creating multiple aspects, I would think that each chain would be independent, not concatenated onto "Main". If that's the intended behavior, I think I can fix that.

The desired behavior would be that any additional aspect is built on top of the default "Main" representation which is the c-TF-IDF representation. If it would be built on top of the chained "Main" representation, then that would create strange results, especially from a user-experience perspective.
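A toy illustration (not BERTopic code) of that desired behavior: each aspect should start from the base c-TF-IDF topics, not from the output of a chained Main representation.

```python
# Base c-TF-IDF topics (toy data).
base_ctfidf_topics = {0: ["cats", "dogs", "pets"]}

def main_chain(topics):
    # A chained Main representation that rewrites topics into single labels.
    return {label: [" / ".join(values)] for label, values in topics.items()}

def aspect_model(topics):
    # An additional aspect, e.g. uppercased keywords.
    return {label: [v.upper() for v in values] for label, values in topics.items()}

main_topics = main_chain(base_ctfidf_topics)

# Desired: the aspect is built on the base c-TF-IDF representation...
aspect_good = aspect_model(base_ctfidf_topics)
# ...not on the chained Main output, which would give surprising results:
aspect_bad = aspect_model(main_topics)

print(aspect_good)  # {0: ['CATS', 'DOGS', 'PETS']}
print(aspect_bad)   # {0: ['CATS / DOGS / PETS']}
```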

> For the second issue: It seems like this is a problem with chained main representations in general, whether or not we have multiple aspects. What is the intended behavior of top_n_words? Do we wish to automatically apply it after "Main" and before every non-Main aspect? Or do we wish to apply it after every non-Main aspect? Or do we prefer allowing the user to control whether/when top_n_words is applied to each aspect?

top_n_words is used mainly for c-TF-IDF, i.e. the default representation. Other representation models might have their own top_n_words parameter.

ddicato (Contributor, Author) commented:

Thanks for the feedback. I've pushed a new version where I attempted the following:

For the Main representation:

  • If the Main representation is missing, None, or otherwise falsy, use the default c-TF-IDF + top_n_words
  • Avoid applying top_n_words in all other cases
  • Process BaseRepresentation, list[BaseRepresentation], and dict[str, BaseRepresentation | list[BaseRepresentation]]
  • Raise TypeError in all other cases

For other aspects:

  • Copy the c-TF-IDF topics directly to use as input
  • If aspect_model is falsy, use the default c-TF-IDF + top_n_words
  • Avoid applying top_n_words in all other cases
  • Process BaseRepresentation and list[BaseRepresentation]
  • Raise TypeError in all other cases
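The dispatch described above can be sketched as follows. This is a hypothetical stand-alone version, not the actual PR code: `BaseRepresentation` and `extract_topics` mirror BERTopic's names but are stubbed out here so the sketch is self-contained.

```python
class BaseRepresentation:
    """Stand-in for bertopic.representation.BaseRepresentation."""
    def extract_topics(self, topics):
        # A real model would fine-tune the topic words; the stub passes through.
        return topics

def default_ctfidf(topics, top_n_words):
    # Default c-TF-IDF behavior: truncate each topic to top_n_words.
    return {label: values[:top_n_words] for label, values in topics.items()}

def apply_main(representation_model, topics, top_n_words):
    # Missing / None / falsy -> default c-TF-IDF + top_n_words.
    if not representation_model:
        return default_ctfidf(topics, top_n_words)
    # dict form: look up the "Main" key; a missing/falsy entry also
    # falls back to the default representation.
    if isinstance(representation_model, dict):
        representation_model = representation_model.get("Main")
        if not representation_model:
            return default_ctfidf(topics, top_n_words)
    # Single model: apply it once, without re-applying top_n_words.
    if isinstance(representation_model, BaseRepresentation):
        return representation_model.extract_topics(topics)
    # List of models: chain them in order, again without top_n_words.
    if isinstance(representation_model, list):
        for tuner in representation_model:
            topics = tuner.extract_topics(topics)
        return topics
    raise TypeError(
        "representation_model must be falsy, a BaseRepresentation, "
        "a list of them, or a dict with a 'Main' key"
    )
```

The aspect path would follow the same shape, minus the dict case, starting from a copy of the c-TF-IDF topics.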

MaartenGr (Owner) commented:

@ddicato Apologies for the late reply and thank you for the changes! This is exactly the intended behavior, and it is nicely and elegantly coded.

@MaartenGr MaartenGr merged commit df7116d into MaartenGr:master Jun 13, 2024
5 checks passed