Keyphrase Extraction Algorithm

Keyphrase Extraction Algorithm based on Topic PageRank（TextRank， TPR， Single TPR， Salience Rank）

Chinese Keyphrase Extraction: https://github.com/JackHCC/Chinese-Keyphrase-Extraction

Introduction

Algorithm	Intro	ref
TextRank	Default PageRank	paper
TPR	Integrating topic into PageRank calculation for the first time	paper
Single TPR	Topic PageRank of single iteration calculation	paper
Salience Rank	PageRank with Salience， S(w) = (1 − α)CS(w) + αTS(w)	paper

Dependencies

nltk 3.6.1
matplotlib 3.3.4
networkx 2.5
numpy 1.20.1

Files

runner.py: executes the main function
ranks.py: implementation of various key phrase extraction algorithms
tagger.py: POS tagging infrastructure
utils.py: various utilities functions
process.py: infrastructure for dataset processing

DataSet

English

data: contains the two standard datasets Inspec (Hulth. 2003. Improved automatic keyword extraction given more linguistic knowledge) and 500N (Marujo et al. 2013. Supervised topical key phrase extraction of news stories using crowdsourcing, light filtering and co-reference normalization).
lda: The TPR and DR algorithms rely on two LDA output files (which can be obtained with any standard LDA implementation).
- Each line of lda-topicsXvocab*.txt contains the topic distribution over the vocabulary for each document (documents are sorted alphabetically by filename).
- Each line of lda-docxXtopics*.txt contains the proportion of each topic for each document (documents are sorted alphabetically by filename).
results: the results for the two datasets are output here after executing runner.py

Usage

python runner.py

Reference

Text Rank: Mihalcea and Tarau. 2004. Textrank: Bringing order into texts.
TPR: Liu et al. 2010. Automatic keyphrase extraction via topic decomposition.
Single TPR: Sterckx et al. 2015. Topical word importance for fast keyphrase extraction.
Salience Rank: Nedelina et al . 2017.Salience Rank: Efficient Keyphrase Extraction with Topic Modeling.
https://github.com/zhengfeitian/saliencerank

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Keyphrase Extraction Algorithm

Introduction

Dependencies

Files

DataSet

English

Usage

Reference

About

Releases

Packages

Contributors 2

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
lda		lda
.gitignore		.gitignore
README.md		README.md
process.py		process.py
ranks.py		ranks.py
runner.py		runner.py
tagger.py		tagger.py
utils.py		utils.py

JackHCC/Keyphrase-Extraction

Folders and files

Latest commit

History

Repository files navigation

Keyphrase Extraction Algorithm

Introduction

Dependencies

Files

DataSet

English

Usage

Reference

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Languages

Packages