Replies: 2 comments 1 reply
-
@kitsamho, I think this is a very interesting idea. Could you elaborate a bit more on how you envision it working? Doing the topic extraction alone does not seem too complicated, but in that case the model might produce similar but not identical topics, which in extreme cases might result in n_clusters = n_samples. So maybe the
Of course, an alternative could be to pack all samples into a single prompt and specify the constraint on the number of topics right away, but this will only work for relatively small datasets. What do you think? Or maybe you have a specific dataset in mind on which you could demonstrate the desired result?
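To make the n_clusters = n_samples concern concrete, here is a minimal sketch of one possible mitigation: merging near-duplicate topic labels after extraction. Nothing here is part of the library. `merge_similar_topics`, the 0.8 threshold, and the sample labels are all assumptions for illustration, and character-level similarity from the standard library is only a stand-in for whatever matching would actually be used.

```python
from difflib import SequenceMatcher


def merge_similar_topics(topics, threshold=0.8):
    """Map each raw topic label to a canonical label (hypothetical helper).

    A label is merged into the first earlier canonical label whose
    character-level similarity ratio is at least `threshold`.
    """
    canonical = []  # representative labels seen so far
    assigned = []   # canonical label chosen for each input, in order
    for topic in topics:
        normalized = topic.strip().lower()
        for rep in canonical:
            if SequenceMatcher(None, normalized, rep).ratio() >= threshold:
                assigned.append(rep)
                break
        else:
            # No close match found: this label starts a new topic.
            canonical.append(normalized)
            assigned.append(normalized)
    return assigned


# Made-up labels standing in for per-sample model output.
raw_topics = ["Shipping delay", "shipping delays", "Refund request", "Shipping Delay"]
merged = merge_similar_topics(raw_topics)
print(len(set(raw_topics)), "raw topics ->", len(set(merged)), "merged topics")
```

Without the merge step every sample above would end up in its own cluster, which is the extreme case described in this comment. The threshold is arbitrary and would need tuning per dataset.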
-
Maybe a good idea is to use the GPTSummarizer model with max_words=3 (or 2, or 1) in order to get "topics", i.e., a specific word or phrase that summarizes the idea of the text in a few words.
-
It would be super cool to see some form of sklearn clustering / topic modelling implementation where the inputs are texts and outputs are clusters/topics in the data.
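
A rough sketch of what such an interface could look like, assuming a scikit-learn-style `fit_predict` that takes texts and returns integer cluster labels. `TopicClusterer` and `keyword_topic` are hypothetical names, not existing classes, and the keyword function is only a toy stand-in for an actual model call.

```python
class TopicClusterer:
    """Hypothetical estimator: texts in, integer cluster labels out."""

    def __init__(self, extract_topic):
        # `extract_topic` stands in for a model call that returns
        # a short topic string for a single text.
        self.extract_topic = extract_topic

    def fit_predict(self, texts):
        self.topics_ = []  # distinct topics, in order of first appearance
        index = {}         # topic string -> cluster id
        labels = []
        for text in texts:
            topic = self.extract_topic(text).strip().lower()
            if topic not in index:
                index[topic] = len(self.topics_)
                self.topics_.append(topic)
            labels.append(index[topic])
        self.labels_ = labels
        return labels


def keyword_topic(text):
    # Toy replacement for the model, used only to keep the sketch runnable.
    return "billing" if "invoice" in text.lower() else "other"


texts = [
    "My invoice is wrong",
    "The app crashes on login",
    "Please resend the invoice",
]
clusterer = TopicClusterer(keyword_topic)
labels = clusterer.fit_predict(texts)
print(labels, clusterer.topics_)
```

The trailing-underscore attributes (`topics_`, `labels_`) follow the scikit-learn convention for values learned during fitting, so the cluster ids can be mapped back to readable topic names.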