In multi-aspect context, allow Main model to be chained [Issue #1846] #2002
Conversation
bertopic/_bertopic.py (Outdated):

        topics = self.representation_model["Main"].extract_topics(self, documents, c_tf_idf, topics)
    elif isinstance(self.representation_model["Main"], list):
        for tuner in self.representation_model["Main"]:
            topics = tuner.extract_topics(self, documents, c_tf_idf, topics)
    topics = {label: values[:self.top_n_words] for label, values in topics.items()}
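The chaining behavior in the diff above can be sketched in isolation. The classes below are hypothetical stand-ins for BERTopic's representation models, not the real BaseRepresentation interface (which also receives the topic model, documents, and c-TF-IDF matrix):

```python
# Toy stand-ins for representation models (hypothetical; real models
# implement BERTopic's BaseRepresentation interface).
class DoubleScoreTuner:
    """Doubles every word's score."""

    def extract_topics(self, topics):
        return {label: [(word, score * 2) for word, score in values]
                for label, values in topics.items()}


class DropLowScoreTuner:
    """Keeps only words scoring above 0.5."""

    def extract_topics(self, topics):
        return {label: [(w, s) for w, s in values if s > 0.5]
                for label, values in topics.items()}


# A chained "Main" representation is simply applied left to right,
# each model refining the previous model's output:
chain = [DoubleScoreTuner(), DropLowScoreTuner()]
topics = {0: [("apple", 0.4), ("banana", 0.2)]}
for tuner in chain:
    topics = tuner.extract_topics(topics)
```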
Although it would be nice to have chained main models, it does bring a number of difficulties.
First, whenever you chain the main representation, any additional aspects that you want to create are then created on top of that chained representation. This inhibits the functionality of creating multiple aspects. Thus, whenever you use a chained Main representation, you cannot really use additional aspects as they will use the same chained base as the Main representation.
Second, whenever you create a chained representation as the Main model, it will still apply top_n_words, which might not be appropriate depending on the last model in the chain.
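A hypothetical illustration of that second point: BERTopic slices every topic to top_n_words after the Main model(s) run (the last line of the diff above), so a chained Main representation can have its output silently truncated. The values below are invented for the sketch:

```python
top_n_words = 3  # assumed global setting for this sketch

# Suppose the last model in the chain deliberately returned 5 keyphrases:
topics = {0: [("neural network", 0.9), ("deep learning", 0.8),
              ("transformer", 0.7), ("attention", 0.6), ("embedding", 0.5)]}

# The slicing applied after the chain (mirrors the diff above) drops
# the last two keyphrases regardless of what the chain intended:
topics = {label: values[:top_n_words] for label, values in topics.items()}
```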
Interesting, thanks for explaining that. I clearly wasn't aware of the full implications of this change.
For the first issue: What is the desired behavior here? If I'm creating multiple aspects, I would think that each chain would be independent, not concatenated onto "Main". If that's the intended behavior, I think I can fix that.
For the second issue: It seems like this is a problem with chained main representations in general, whether or not we have multiple aspects. What is the intended behavior of top_n_words? Do we wish to automatically apply it after "Main" and before every non-Main aspect? Or do we wish to apply it after every non-Main aspect? Or do we prefer allowing the user to control whether/when top_n_words is applied to each aspect?
For the first issue: What is the desired behavior here? If I'm creating multiple aspects, I would think that each chain would be independent, not concatenated onto "Main". If that's the intended behavior, I think I can fix that.
The desired behavior would be that any additional aspect is built on top of the default "Main" representation which is the c-TF-IDF representation. If it would be built on top of the chained "Main" representation, then that would create strange results, especially from a user-experience perspective.
For the second issue: It seems like this is a problem with chained main representations in general, whether or not we have multiple aspects. What is the intended behavior of top_n_words? Do we wish to automatically apply it after "Main" and before every non-Main aspect? Or do we wish to apply it after every non-Main aspect? Or do we prefer allowing the user to control whether/when top_n_words is applied to each aspect?
top_n_words is used mainly for c-TF-IDF, i.e., the default representation. Other representation models might have their own top_n_words parameter.
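A hypothetical stand-in showing what a model-local parameter like that can look like: the model carries its own top_n limit, independent of BERTopic's global top_n_words (which governs the default c-TF-IDF output). The class and parameter names here are invented for the sketch:

```python
class TopNRepresentation:
    """Toy representation model with its own, model-local top_n."""

    def __init__(self, top_n=10):
        self.top_n = top_n  # local limit, not BERTopic's global top_n_words

    def extract_topics(self, topics):
        # Truncate each topic using the model's own setting.
        return {label: values[:self.top_n]
                for label, values in topics.items()}


model = TopNRepresentation(top_n=2)
topics = {0: [("a", 0.3), ("b", 0.2), ("c", 0.1)]}
result = model.extract_topics(topics)
```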
Thanks for the feedback. I've pushed a new version where I attempted the following:
For the Main representation:
- If the Main representation is missing, None, or otherwise falsy, use the default c-TF-IDF + top_n_words
- Avoid applying top_n_words in all other cases
- Process BaseRepresentation, list[BaseRepresentation], and dict[str, BaseRepresentation | list[BaseRepresentation]]
- Raise TypeError in all other cases
For other aspects:
- Copy the c-TF-IDF topics directly to use as input
- If aspect_model is falsy, use the default c-TF-IDF + top_n_words
- Avoid applying top_n_words in all other cases
- Process BaseRepresentation and list[BaseRepresentation]
- Raise TypeError in all other cases
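The dispatch described for the Main representation can be sketched roughly as follows. The function name and structure are hypothetical simplifications, not the actual code in bertopic/_bertopic.py, and BaseRepresentation is reduced to a stub here:

```python
class BaseRepresentation:
    """Stub of BERTopic's representation-model base class."""

    def extract_topics(self, topics):
        return topics


def resolve_main(representation_model):
    """Normalize the Main representation to a list of models to chain.

    An empty list signals the default c-TF-IDF + top_n_words path.
    """
    if not representation_model:  # missing, None, or otherwise falsy
        return []
    if isinstance(representation_model, BaseRepresentation):
        return [representation_model]
    if isinstance(representation_model, list):
        return representation_model
    if isinstance(representation_model, dict):
        # Only the "Main" entry matters here; other keys are aspects.
        return resolve_main(representation_model.get("Main"))
    raise TypeError(
        f"Unsupported representation model type: {type(representation_model)!r}"
    )
```

Usage: resolve_main(None) yields the default path, resolve_main({"Main": some_model}) yields a one-element chain, and anything else (e.g. an int) raises TypeError.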
@ddicato Apologies for the late reply and thank you for the changes! This is exactly the behavior as intended and it is nicely and elegantly coded.