Library to download PubMed abstracts with metadata. Originally created to obtain the DrugProt (BioCreative VII) background set

tonifuc3m/pubmed-parser

Requirements

Usage

The script has two modes:

  • get_pmids mode. This mode is intended for the case where we have a set of PubMed queries and want to extract the PMIDs that match them. It returns a list (or several lists) of PMIDs.
  • fetch mode. This mode receives a list of PMIDs and downloads the PubMed titles and abstracts together with their metadata. It returns a tab-separated file (or several) with the PMID, title, abstract, PMC ID, MeSH terms and language of each article. It also stores the complete object downloaded from PubMed in a JSON file (or several files). Titles and abstracts are UTF-8 encoded and NFKC-normalized, and their whitespace is normalized, so they contain no tabs, newlines or similar characters.
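The text cleanup described for fetch mode can be sketched with the Python standard library alone. This is a minimal illustration, not the script's actual code: it assumes that "whitespace normalization" means collapsing every run of whitespace into a single space, and the helper names (normalize_text, to_tsv_row) and the "|" separator for MeSH terms are hypothetical choices.

```python
import re
import unicodedata

# Matches any run of whitespace characters (spaces, tabs, newlines, etc.)
_WHITESPACE_RUN = re.compile(r"\s+")


def normalize_text(text: str) -> str:
    """Apply NFKC normalization, then collapse each whitespace run to one space."""
    text = unicodedata.normalize("NFKC", text)
    return _WHITESPACE_RUN.sub(" ", text).strip()


def to_tsv_row(pmid, title, abstract, pmc_id, mesh_terms, language):
    """Build one tab-separated line in the column order listed above.

    Hypothetical: joining MeSH terms with '|' is an illustrative choice,
    not necessarily what get_background.py does.
    """
    fields = [
        pmid,
        normalize_text(title),
        normalize_text(abstract),
        pmc_id,
        "|".join(mesh_terms),
        language,
    ]
    return "\t".join(fields) + "\n"
```

NFKC folds compatibility characters (the ligature "ﬁ" becomes "fi", a non-breaking space becomes an ordinary space), and removing tabs and newlines from the text fields is what keeps each record on one line of the TSV, safe to split on tabs.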

To change the execution mode, edit line 241 of get_background.py.

Usage in the get_pmids mode

  1. Go to line 241 and set:
    mode = 'get_pmids'
  2. Run the script:
python get_background.py -i /path/ -o toy-data/queries-example --logfile ~/outfolder/log.log

Script Arguments

  • -i: directory containing the PubMed queries file. The output is also written to this directory.
  • -o: name of the file containing the PubMed queries.
  • --logfile: path to the log file.
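For get_pmids mode, one way to resolve a PubMed query into PMIDs is NCBI's E-utilities ESearch endpoint. The sketch below assumes that approach; the README does not say which client get_background.py uses (it might, for example, go through Biopython's Entrez module instead), and the function names here are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query: str, retmax: int = 10000) -> str:
    """Build an ESearch URL that returns the PMIDs matching a query as JSON."""
    params = {"db": "pubmed", "term": query, "retmode": "json", "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


def parse_esearch_json(payload: str) -> list[str]:
    """Extract the PMID list from an ESearch JSON response body."""
    return json.loads(payload)["esearchresult"]["idlist"]


def get_pmids(query: str) -> list[str]:
    """Run one query against PubMed (requires network access)."""
    with urlopen(build_esearch_url(query)) as response:
        return parse_esearch_json(response.read().decode("utf-8"))
```

ESearch returns only 20 IDs unless retmax is raised, and NCBI asks clients without an API key to stay under 3 requests per second, so a loop over many queries should pause between calls. Very broad queries may also exceed what a single ESearch call returns for PubMed and need to be split.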

Usage in the fetch mode

  1. Go to line 241 and set:
    mode = 'fetch'
  2. Run the script:
python get_background.py --input toy-data/pmids-example --output ~/outfolder --logfile ~/outfolder/log.log

Script Arguments

  • --input: text file with the list of PMIDs to download, one per line.
  • --output: folder where the output will be stored.
  • --logfile: path to the log file.
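For fetch mode, every column listed for the output TSV is present in the PubMed XML returned by the E-utilities EFetch endpoint. The sketch below assumes that source and the element paths of the standard PubMed XML format; the helpers (read_pmids, build_efetch_url, parse_pubmed_xml) are hypothetical and are not taken from get_background.py.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def read_pmids(path: str) -> list[str]:
    """Read the --input file: one PMID per line, blank lines ignored."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def build_efetch_url(pmids: list[str]) -> str:
    """Build an EFetch URL that returns PubMed XML for a batch of PMIDs."""
    params = {"db": "pubmed", "id": ",".join(pmids), "retmode": "xml"}
    return f"{EFETCH_URL}?{urlencode(params)}"


def parse_pubmed_xml(xml_text: str) -> list[dict]:
    """Extract PMID, title, abstract, PMC ID, MeSH terms and language per article."""
    records = []
    for article in ET.fromstring(xml_text).iter("PubmedArticle"):
        citation = article.find("MedlineCitation")
        # itertext() keeps text nested inside inline markup such as <i> or <sup>
        title_el = citation.find("Article/ArticleTitle")
        title = "".join(title_el.itertext()) if title_el is not None else ""
        # Structured abstracts have several AbstractText sections; join them
        abstract = " ".join(
            "".join(part.itertext())
            for part in citation.findall("Article/Abstract/AbstractText")
        )
        pmc_id = article.findtext(
            "PubmedData/ArticleIdList/ArticleId[@IdType='pmc']", default=""
        )
        mesh_terms = [
            d.text for d in citation.findall("MeshHeadingList/MeshHeading/DescriptorName")
        ]
        records.append({
            "pmid": citation.findtext("PMID", default=""),
            "title": title,
            "abstract": abstract,
            "pmc_id": pmc_id,
            "mesh_terms": mesh_terms,
            "language": citation.findtext("Article/Language", default=""),
        })
    return records
```

Titles and abstracts extracted this way would still need the NFKC and whitespace normalization described above before being written out. For long PMID lists, NCBI's documentation recommends HTTP POST once a request carries more than about 200 IDs, so splitting the input into batches is a sensible default.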
