GitHub - morningmoni/HiLAP: Code for paper "Hierarchical Text Classification with Reinforced Label Assignment" EMNLP 2019

This repo provides the code for the paper "Hierarchical Text Classification with Reinforced Label Assignment" (EMNLP 2019).

Abstract

While existing hierarchical text classification (HTC) methods attempt to capture label hierarchies for model training, they either make local decisions regarding each label or completely ignore the hierarchy information during inference. To solve the mismatch between training and inference as well as modeling label dependencies in a more principled way, we formulate HTC as a Markov decision process and propose to learn a Label Assignment Policy via deep reinforcement learning to determine where to place an object and when to stop the assignment process. The proposed method, HiLAP, explores the hierarchy during both training and inference time in a consistent manner and makes inter-dependent decisions. As a general framework, HiLAP can incorporate different neural encoders as base models for end-to-end training. Experiments on five public datasets and four base models show that HiLAP yields an average improvement of 33.4% in Macro-F1 over flat classifiers and outperforms state-of-the-art HTC methods by a large margin.
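To make the Markov decision process framing more concrete, here is a toy sketch of a single label assignment episode. It is not the code in model.py: the hierarchy, the `toy_scores` table (standing in for a learned policy), and every function name are hypothetical, and the sketch picks actions greedily instead of learning a policy with reinforcement learning.

```python
# Toy illustration only: hierarchy, scores, and function names are hypothetical.
STOP = "<stop>"

# parent -> children; "root" is a virtual node where every object starts
HIERARCHY = {
    "root": ["sports", "business"],
    "sports": ["soccer", "tennis"],
    "business": ["markets"],
}


def candidate_actions(assigned):
    """Children of already-placed labels that are not yet assigned, plus STOP."""
    actions = [
        child
        for label in assigned
        for child in HIERARCHY.get(label, [])
        if child not in assigned
    ]
    return actions + [STOP]


def assign_labels(score, max_steps=10):
    """Run one episode: repeatedly take the best-scoring action until STOP."""
    assigned = ["root"]
    for _ in range(max_steps):
        action = max(candidate_actions(assigned), key=score)
        if action == STOP:
            break
        assigned.append(action)
    return assigned[1:]  # drop the virtual root


# A fixed lookup table standing in for a policy's action scores. A real policy
# would condition on the document and on the labels assigned so far.
toy_scores = {"sports": 0.9, "soccer": 0.8, STOP: 0.5,
              "business": 0.2, "tennis": 0.1, "markets": 0.05}
labels = assign_labels(lambda action: toy_scores[action])
print(labels)
```

In this toy run the object is placed at "sports", then "soccer", and the episode ends once the stop action outscores the remaining candidates, which mirrors the "where to place" and "when to stop" decisions described in the abstract.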

Model

model.py: The main model of HiLAP.

TextCNN.py: Our implementation of "Convolutional Neural Networks for Sentence Classification" EMNLP 2014.

OHCNN(_fast).py: Our implementation of "Effective Use of Word Order for Text Categorization with Convolutional Neural Networks" NAACL 2015.

HAN.py: Our implementation of "Hierarchical Attention Networks for Document Classification" NAACL 2016.

HMCN.py: Our implementation of "Hierarchical Multi-Label Classification Networks" ICML 2018.

Requirements

Python 3

PyTorch 0.3

Data

Due to copyright issues, we can't directly release the datasets used in our experiments. Instead, we provide the links to the five data sources (the first two may require license):

RCV1 original release, text data (update: download the text data and convert it to docs.txt in the format "docid content")
NYT
Yelp (update: the latest release differs from the version we used; please send an email if you need our version)
FunGO

Please check readData_*.py to see how to use our scripts to process and generate the datasets from the original data.
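As a rough illustration of the docs.txt conversion mentioned for RCV1, the sketch below writes one document per line as "docid content". The single-space separator, the one-document-per-line layout, the helper name `write_docs_txt`, and the sample documents are all assumptions made for this example; check readData_rcv1.py for what the scripts actually expect.

```python
import tempfile
from pathlib import Path


def write_docs_txt(docs, out_path):
    """Write each document on its own line as '<docid> <content>'."""
    with open(out_path, "w", encoding="utf-8") as f:
        for doc_id, text in docs.items():
            # Collapse newlines and repeated spaces so a document stays on one line.
            content = " ".join(text.split())
            f.write(f"{doc_id} {content}\n")


# Made-up documents, used only to demonstrate the layout.
sample_docs = {
    "1001": "First example document,\nsplit over two lines.",
    "1002": "Second   example document.",
}
out_file = Path(tempfile.mkdtemp()) / "docs.txt"
write_docs_txt(sample_docs, out_file)
lines = out_file.read_text(encoding="utf-8").splitlines()
print(lines)
```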

Run

All the parameters in conf.py have default values. Change the parameters mode, base_model, and dataset, then run main.py to train or test under different settings. To test a model, set load_model=model_file and is_Train=False in conf.py and run main.py.
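The switch from training to testing could look roughly like the sketch below. Only the parameter names (mode, base_model, dataset, load_model, is_Train) come from the text above. The `SimpleNamespace` stand-in for conf.py, the helper `to_test_settings`, the placeholder values, and the checkpoint path are assumptions for illustration, so consult conf.py for the options it actually accepts.

```python
from types import SimpleNamespace

# Stand-in for conf.py; every value is a placeholder, not one of the repo's defaults.
conf = SimpleNamespace(
    mode="<mode>",
    base_model="<base_model>",
    dataset="<dataset>",
    load_model=None,
    is_Train=True,
)


def to_test_settings(conf, model_file):
    """Return a copy of the settings switched from training to testing."""
    settings = SimpleNamespace(**vars(conf))
    settings.load_model = model_file  # checkpoint to evaluate
    settings.is_Train = False         # skip training, run evaluation only
    return settings


test_conf = to_test_settings(conf, "path/to/model_file")
print(test_conf.load_model, test_conf.is_Train)
```

In the repository itself the equivalent step is editing these two values directly in conf.py and then running main.py.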

Cite

@inproceedings{mao-etal-2019-hierarchical,
    title = "Hierarchical Text Classification with Reinforced Label Assignment",
    author = "Mao, Yuning  and
      Tian, Jingjing  and
      Han, Jiawei  and
      Ren, Xiang",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
    url = "https://www.aclweb.org/anthology/D19-1042",
    doi = "10.18653/v1/D19-1042",
    pages = "445--455",
}

