RuiBai1999/HiMatch

HiMatch

Code for the ACL 2021 long paper "Hierarchy-aware Label Semantics Matching Network for Hierarchical Text Classification".

Dependency

PyTorch > 1.1, scikit-learn (imported as sklearn), tqdm  

Dataset

RCV1-V2
WOS
EURLEX-57K
GloVe (glove.6B.300d.txt)

Preprocess

Dataset Preprocess

Convert your dataset to a JSON file in which each sample has the form {'token': List[str], 'label': List[str]}.
You can refer to data_modules/preprocess.py; the preprocessed WOS dataset is available on Google Drive.
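
As an illustration of the target format, here is a minimal sketch that serializes samples with the two fields named above. The one-object-per-line layout, the helper name `to_json_lines`, and the toy tokens and labels are assumptions made for this example; check data_modules/preprocess.py for the layout the code actually expects.

```python
import json

# Hypothetical toy samples; real tokens and labels come from your dataset.
samples = [
    {"token": ["deep", "learning", "for", "vision"], "label": ["CS", "Computer vision"]},
    {"token": ["protein", "folding", "dynamics"], "label": ["biochemistry", "Molecular biology"]},
]

def to_json_lines(records):
    """Serialize each record as one JSON object per line (an assumed layout)."""
    lines = []
    for rec in records:
        # Keep only the two fields named in the README.
        lines.append(json.dumps({"token": list(rec["token"]), "label": list(rec["label"])}))
    return "\n".join(lines)

serialized = to_json_lines(samples)
print(serialized)
```

Writing `serialized` to a file gives one sample per line, which is easy to stream during training.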

Label Prior Probability (Label Structure)

Prepare the taxonomy files in the expected format (data/wos.taxnomy and data/wos_prob_child_parent.json).
Extract the label prior probability:

python helper/hierarchy_tree_statistic.py config/wos.json  
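
To make the idea of a child-given-parent prior concrete, the sketch below estimates it from label counts by normalizing each child's frequency over its siblings. This is one plausible estimate and not necessarily what helper/hierarchy_tree_statistic.py computes; the hierarchy, the label sets, and the function name `child_given_parent` are hypothetical.

```python
from collections import Counter

# Hypothetical hierarchy (parent -> children); not the real WOS taxonomy.
hierarchy = {"Root": ["CS", "Medical"], "CS": ["ML", "Vision"], "Medical": ["Cancer"]}

# Hypothetical label sets of four training samples.
train_labels = [["CS", "ML"], ["CS", "ML"], ["CS", "Vision"], ["Medical", "Cancer"]]

def child_given_parent(hierarchy, label_sets):
    """Estimate P(child | parent) as the child's count over all sibling counts."""
    counts = Counter(label for labels in label_sets for label in labels)
    prob = {}
    for parent, children in hierarchy.items():
        total = sum(counts[c] for c in children)
        # Guard against parents whose children never occur in the data.
        prob[parent] = {c: (counts[c] / total if total else 0.0) for c in children}
    return prob

prior = child_given_parent(hierarchy, train_labels)
print(prior["CS"])
```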

Label Description

We use classic TF-IDF to extract the representative words for each label.

python construct_label_desc.py  
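
The following is a minimal standard-library sketch of TF-IDF keyword extraction per label, offered only to illustrate the idea. It treats all text of a label as one document and uses a smoothed IDF; construct_label_desc.py may weight and tokenize differently. The corpus, the function name `top_words_per_label`, and the parameter `k` are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy corpus: label -> tokenized documents carrying that label.
docs_by_label = {
    "ML": [["neural", "network", "training", "data"], ["gradient", "training", "model"]],
    "Cancer": [["tumor", "cell", "data"], ["tumor", "therapy", "model"]],
}

def top_words_per_label(docs_by_label, k=2):
    """Rank words per label by TF-IDF, treating each label's text as one document."""
    tf = {label: Counter(tok for doc in docs for tok in doc)
          for label, docs in docs_by_label.items()}
    n_labels = len(tf)
    # Document frequency: in how many labels does each word occur?
    df = Counter(tok for counts in tf.values() for tok in counts)
    result = {}
    for label, counts in tf.items():
        total = sum(counts.values())
        scores = {
            tok: (c / total) * (math.log((1 + n_labels) / (1 + df[tok])) + 1.0)
            for tok, c in counts.items()
        }
        # Sort by descending score, then alphabetically to break ties deterministically.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result[label] = ranked[:k]
    return result

label_desc = top_words_per_label(docs_by_label)
print(label_desc)
```

Words shared across labels (here "data" and "model") receive a lower IDF, so label-specific words rank first.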

For RCV1-V2, the label descriptions are available here.
In follow-up experiments, we found that introducing richer label representations brings further improvement.

Train

Modify the training settings in config/wos.json.

python train.py config/wos.json  

Hyperparameter Description

sample_num: 2. The average number of labels per sample in WOS is 2. Every positive label is used as a positive label index to construct matching pairs.  
negative_ratio: 3. For each positive label we take three kinds of contrast labels: the coarse-grained label, a wrong sibling label, and another wrong label.  
total_sample_num: sample_num * negative_ratio = 2*3 = 6.  
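
The settings above can be illustrated with a small sketch that builds, for each positive label, the three kinds of contrast labels and ends up with 2*3 = 6 pairs. This is not the repository's sampling code: the hierarchy, the function names, and the restriction to positive labels that have a parent are assumptions made for the example.

```python
import random

# Hypothetical toy hierarchy (child -> parent); not the real WOS taxonomy.
parent_of = {
    "ML": "CS", "Vision": "CS",
    "Cancer": "Medical", "Diabetes": "Medical",
    "Optics": "Physics", "Acoustics": "Physics",
}
all_labels = sorted(set(parent_of) | set(parent_of.values()))

def build_contrast_labels(positive, gold, rng):
    """Return the coarse-grained parent, a wrong sibling, and another wrong label."""
    parent = parent_of[positive]
    # Gold labels together with their parents are never sampled as "wrong".
    closure = set(gold) | {parent_of[g] for g in gold}
    siblings = [l for l in parent_of if parent_of[l] == parent and l not in closure]
    others = [l for l in all_labels if l not in closure and l not in siblings]
    return [parent, rng.choice(siblings), rng.choice(others)]

def build_matching_samples(gold, sample_num=2, seed=0):
    """Pair each of the first sample_num positives with its three contrast labels."""
    rng = random.Random(seed)
    pairs = []
    for pos in list(gold)[:sample_num]:
        pairs.extend((pos, other) for other in build_contrast_labels(pos, gold, rng))
    return pairs

pairs = build_matching_samples(["ML", "Cancer"])
print(len(pairs))
```

Each positive contributes negative_ratio = 3 pairs, so sample_num = 2 positives yield total_sample_num = 6.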

Other Experimental Settings

The experimental settings on EURLEX-57K: KAMG
The experimental settings on BERT: Bert-Multi-Label-Text-Classification

Cite

@inproceedings{chen-etal-2021-hierarchy,
    title = "Hierarchy-aware Label Semantics Matching Network for Hierarchical Text Classification",
    author = "Chen, Haibin  and Ma, Qianli  and Lin, Zhenxi  and Yan, Jiangyue",
    booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers)",
    year = "2021",
    url = "https://aclanthology.org/2021.acl-long.337",
    pages = "4370--4379"
}
