LMRank: Utilizing Pre-Trained Language Models and Dependency Parsing for Keyphrase Extraction


Python-Versions Software-License Open In Colab License: CC BY 4.0

LMRank

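This repository hosts code for the paper "LMRank: Utilizing Pre-Trained Language Models and Dependency Parsing for Keyphrase Extraction" (Giarelis and Karacapilidis, IEEE Access, 2023).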

About

LMRank is a keyphrase extraction approach that builds on recent advances in keyphrase extraction and deep learning. Specifically, it uses dependency parsing to form more coherent candidate keyphrases, and highly accurate sentence-transformers models to compare those candidates semantically with the text and extract the most relevant ones.
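The semantic comparison step described above can be illustrated with a minimal sketch. This is not LMRank's actual implementation: `toy_embed`, `cosine` and `rank_candidates` are hypothetical helpers, and the bag-of-words vector is only a stand-in for a sentence-transformers embedding. The candidate list is also written by hand here, whereas LMRank derives candidates through dependency parsing.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for a sentence-transformers model:
    a bag-of-words count vector over lowercased tokens."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(document, candidates, top_n=3):
    """Score each candidate against the whole document and keep the best."""
    doc_vec = toy_embed(document)
    scored = [(c, cosine(toy_embed(c), doc_vec)) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


document = "machine learning algorithms build a model from training data"
# Hand-written candidates (assumption); LMRank extracts these automatically.
candidates = ["machine learning", "training data", "speech recognition"]
ranked = rank_candidates(document, candidates)
print(ranked)
```

Candidates that share no content with the document score zero and fall to the bottom of the ranking. A real embedding model would also capture similarity between phrases that do not share exact words.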

If you have any practical or research questions, take a quick look at the FAQ. As shown there, LMRank currently supports 14 languages, including English and Greek.

Installation

pip install git+https://github.com/NC0DER/LMRank/

Example

from LMRank.model import LMRank

text = """
      Machine learning (ML) is a field of inquiry devoted to understanding and building 
      methods that 'learn', that is, methods that leverage data to improve performance 
      on some set of tasks.[1]  It is seen as a part of artificial intelligence. Machine 
      learning algorithms build a model based on sample data, known as training data, 
      in order to make predictions or decisions without being explicitly programmed 
      to do so.[2] Machine learning algorithms are used in a wide variety of 
      applications, such as in medicine, email filtering, speech recognition, agriculture, 
      and computer vision, where it is difficult or unfeasible to develop conventional 
      algorithms to perform the needed tasks.[3][4] A subset of machine learning is closely 
      related to computational statistics, which focuses on making predictions using computers.
 """
model = LMRank()
results = model.extract_keyphrases(text, language_code='en', top_n=10)

print(results)

Results

[('conventional algorithms', 0.03220074744562463),
 ('machine learning', 0.0320379078219184),
 ('training data', 0.02651275416153127),
 ('artificial intelligence', 0.023564133570545886),
 ('computational statistics', 0.018363250279455255),
 ('speech recognition', 0.017827318362436336),
 ('computer vision', 0.017721180700768415),
 ('data', 0.01647833767159313),
 ('sample data', 0.014187748325602852),
 ('predictions', 0.014133139194664955)]

For a list of supported languages and their codes, see the FAQ.
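Since the output is a plain list of `(keyphrase, score)` tuples, it can be post-processed with ordinary Python. The sketch below uses a few entries copied from the Results listing, so it runs without the model. `multiword_only` and `normalise` are hypothetical helper names for illustration, not part of the LMRank API.

```python
# A few entries copied from the Results listing (no model call needed).
results = [
    ('conventional algorithms', 0.03220074744562463),
    ('machine learning', 0.0320379078219184),
    ('training data', 0.02651275416153127),
    ('data', 0.01647833767159313),
]


def multiword_only(pairs):
    """Keep only keyphrases that consist of more than one word."""
    return [(phrase, score) for phrase, score in pairs if len(phrase.split()) > 1]


def normalise(pairs):
    """Rescale scores so that they sum to 1, preserving the order."""
    total = sum(score for _, score in pairs)
    if not total:
        return []
    return [(phrase, score / total) for phrase, score in pairs]


filtered = multiword_only(results)
for phrase, share in normalise(filtered):
    print(f"{phrase}: {share:.3f}")
```

Filtering before normalising means the shares are relative to the kept keyphrases only. Swap the order if shares relative to the full list are preferred.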

Citation

Please use the following BibTeX entry to cite LMRank if you use it in your research work:

@article{giarelis2023lmrank,
  title={LMRank: Utilizing pre-trained language models and dependency parsing for keyphrase extraction},
  author={Giarelis, Nikolaos and Karacapilidis, Nikos},
  journal={IEEE Access},
  year={2023},
  publisher={IEEE}
}
