Skip to content

tatsuya-takahashi/PubMed-API-Script

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

3 Commits
 
 
 
 
 
 
 
 

Repository files navigation

PubMed-API-Script

API script with flexible query and simple EDA and analysis

What's this?

All code

You can see all code and viz at below link.
https://nbviewer.jupyter.org/github/tatsuya-takahashi/PubMed-API-Script/blob/master/PubMed.ipynb

How to use

  • You can set your own parameter to fetch PubMed data.
  • You can analyze data as you like. This sample indicates simple EDA.
# const
BASEURL_INFO = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi'
BASEURL_SRCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'
BASEURL_FTCH = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'

# parameters
SOURCE_DB    = 'pubmed'     #pubmed, nuccore, nucest, nucgss, popset, protein
TERM         = 'cancer'     # Entrez text query. 
DATE_TYPE    = 'pdat'       # Type of date used to limit a search. The allowed values vary between Entrez databases, but common values are 'mdat' (modification date), 'pdat' (publication date) and 'edat' (Entrez date). Generally an Entrez database will have only two allowed values for datetype.
MIN_DATE     = '2018/01/01' # yyyy/mm/dd
MAX_DATE     = '2019/12/31' # yyyy/mm/dd
SEP          = '|'
BATCH_NUM    = 1000

# seed
RANDOM_STATE = 42
np.random.seed(RANDOM_STATE)

What's output?

pubmed_article.csv

PMID
JournalTitle
Title
doi
Abstract
Language
Year_A
Month_A
Day_A
Year_PM
Month_PM
Day_PM
Status
MeSH
MeSH_UI KeywordI
Keyword

pubmed_author.csv

authorId
PMID
name
identifier
identifierSource

pubmed_affiliation.csv

authorId
affiliation

About

API script with flexible query and simple EDA and analysis

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published