I have added flsamodel, which includes the topic modeling algorithms FLSA, FLSA-W and FLSA-E. In experimental results on various open datasets, FLSA-W has outperformed other state-of-the-art algorithms (e.g. LDA, LSI, NMF, ProdLDA).
Motivation:
Gensim features various state-of-the-art topic modeling algorithms, and my group's algorithms outperform them in terms of coherence, diversity and interpretability scores, so we believe they should be featured in Gensim too. Previously, I created a wrapper function that depended on FuzzyTM. In this PR, the algorithms are trained within Gensim.
This code can be used in much the same way as LdaModel. For the 'corpus', the following datatypes are allowed:
The algorithms have been featured in various scientific publications. See the links below:
FLSA-W:
Rijcken, E., Scheepers, F., Mosteiro, P., Zervanou, K., Spruit, M., & Kaymak, U. (2021, December). A comparative study of fuzzy topic models and LDA in terms of interpretability. In 2021 IEEE Symposium Series on Computational Intelligence (SSCI) (pp. 1-8). IEEE.
FLSA-E:
Rijcken, E., Zervanou, K., Spruit, M., Mosteiro, P., Scheepers, F., & Kaymak, U. (2022). Exploring Embedding Spaces for more Coherent Topic Modeling in Electronic Health Records. In IEEE International Conference on Systems, Man, and Cybernetics.
FLSA:
Karami, A., et al. (2018). Fuzzy approach topic discovery in health and medical corpora. International Journal of Fuzzy Systems, 20(4), 1334-1345.
These algorithms are featured in the FuzzyTM package:
Rijcken, E., Mosteiro, P., Zervanou, K., Spruit, M., Scheepers, F., & Kaymak, U. (2022, July). FuzzyTM: a software package for fuzzy topic modeling. In 2022 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (pp. 1-8). IEEE.
Experimental results:
Topic quality (open datasets):
https://research.tue.nl/en/publications/a-performance-evaluation-of-topic-models-based-on-fuzzy-latent-se
Predictive power vs. topic quality (private dataset):
Rijcken, E., Kaymak, U., Scheepers, F., Mosteiro, P., Zervanou, K., & Spruit, M. (2022). Topic Modeling for Interpretable Text Classification From EHRs. Frontiers in Big Data, 5.