- This is the author code of "TaxoCom: Topic Taxonomy Completion with Hierarchical Discovery of Novel Topic Clusters (WWW 2022)".
- This code is implemented based on the author code of "TaxoGen: Unsupervised Topic Taxonomy Construction by Adaptive Term Embedding and Clustering (KDD 2018)" at this repository.
Overview of the TaxoCom framework, which discovers the complete topic taxonomy by recursively expanding the given topic hierarchy. Starting from the root node, it performs (1) locally discriminative embedding and (2) novelty adaptive clustering to selectively assign the terms of each node to one of its child nodes.
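The recursive, top-down assignment described above can be illustrated with a toy sketch. This is not the TaxoCom implementation: it swaps the locally discriminative embedding for fixed, pre-computed term vectors and swaps novelty adaptive clustering for a plain cosine-similarity threshold. The node layout (`name`, `center`, `children`), the `expand` function, the `threshold` value, and the `/novel` bucket name are all hypothetical and exist only for illustration.

```python
import numpy as np


def unit(x):
    """L2-normalize rows so that dot products equal cosine similarities."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def expand(node, terms, vecs, threshold=0.7, out=None):
    """Recursively push `terms` down the hierarchy rooted at `node`.

    node: {"name": str, "center": vector, "children": [node, ...]}
          (hypothetical layout; the root needs no "center").
    Terms whose best similarity to every child falls below `threshold`
    are kept aside as "novel" instead of being forced into a child.
    """
    if out is None:
        out = {}
    out[node["name"]] = list(terms)
    children = node.get("children", [])
    if not children or not terms:
        return out

    centers = unit([c["center"] for c in children])
    sims = unit([vecs[t] for t in terms]) @ centers.T
    best_child = sims.argmax(axis=1)
    best_sim = sims.max(axis=1)

    buckets = [[] for _ in children]
    novel = []
    for term, k, s in zip(terms, best_child, best_sim):
        if s >= threshold:
            buckets[k].append(term)
        else:
            novel.append(term)

    if novel:
        out[node["name"] + "/novel"] = novel
    for child, bucket in zip(children, buckets):
        expand(child, bucket, vecs, threshold, out)
    return out


# Toy 2-D vectors (made up for this example, not taken from any dataset).
vecs = {
    "soccer": [1.0, 0.1],
    "tennis": [0.9, 0.2],
    "election": [0.1, 1.0],
    "senate": [0.2, 0.9],
    "recipe": [-1.0, -1.0],
}
taxo = {
    "name": "root",
    "children": [
        {"name": "sports", "center": [1.0, 0.0]},
        {"name": "politics", "center": [0.0, 1.0]},
    ],
}
result = expand(taxo, list(vecs), vecs)
print(result)
```

In this toy run, `recipe` is dissimilar to both child centers, so it ends up in the novel bucket of the root rather than under `sports` or `politics`. The actual framework re-learns the embedding at each node and clusters the novel terms into new topic nodes, which this sketch does not attempt.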
- `python`
- `numpy`, `scipy`
- `spherecluster`
- `sklearn` 0.21 (for compatibility with `spherecluster`)
- Download the datasets from the following links, then place them in `./data/nyt` and `./data/arxiv`, respectively.
- Run the code using the following commands:
cd code
bash run_taxocom.sh <dataset-name> <seed-taxo-name>
- For example, the downloaded `nyt` directory can be used directly by running
bash run_taxocom.sh nyt seed_taxo
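
Before running the commands above, it can help to confirm that the datasets were placed where the steps expect them. The following is a minimal sketch, not part of the repository: `missing_dirs` is a hypothetical helper, and it assumes the script is started from the directory that contains `data` (assumed here to be the repository root).

```python
from pathlib import Path


def missing_dirs(data_root="./data", names=("nyt", "arxiv")):
    """Return the dataset names whose directory is absent under data_root."""
    root = Path(data_root)
    return [name for name in names if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs()
    print("missing dataset directories:", missing or "none")
```

An empty list means both `./data/nyt` and `./data/arxiv` exist. The check only looks at directory presence and says nothing about the files inside them.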