Data files (available at data.world)
- Raw text: br08302.keg
  - Downloaded directly from KEGG (click "Download htext" in the upper left-hand corner of the page); no scraping was needed.
- Tidy CSV: usp_drug_classification.csv
- Tidying script: usp_drug_classification_tidying_script.py
- string: a sequence of characters
Name | Type | Description |
---|---|---|
usp_category | string | USP Category |
usp_class | string | USP Class |
usp_drug | string | The general drug (e.g. Naproxen, Ibuprofen) |
drug_example | string | The specific drug (e.g. Naproxen sodium, Ibuprofen arginine salt) |
kegg_id_drug | string | The KEGG identifier for the usp_drug (e.g. DG00245) |
kegg_id_drug_example | string | The KEGG identifier for the drug_example (e.g. D01122) |
nomenclature | string | (Unparsed) nomenclature description (e.g. '(JP17/USP/INN)' ) |
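
As a quick illustration of the column layout above, the tidy CSV can be read with Python's standard csv module. This is a minimal sketch: it assumes the file has a header row using the column names from the table, and the sample row is hand-written for illustration (the pairing of names and KEGG identifiers has not been checked against KEGG).

```python
import csv
import io

# Hypothetical one-row sample in the column layout described above.
# Assumption: the real CSV has a header row with these column names.
SAMPLE_CSV = """\
usp_category,usp_class,usp_drug,drug_example,kegg_id_drug,kegg_id_drug_example,nomenclature
Analgesics,Nonsteroidal Anti-inflammatory Drugs,Naproxen,Naproxen sodium,DG00245,D01122,(USP)
"""


def load_rows(handle):
    """Read the tidy CSV into a list of dicts; every value stays a string."""
    return list(csv.DictReader(handle))


rows = load_rows(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["usp_drug"], "->", row["drug_example"], row["nomenclature"])
```

To read the real file, replace the io.StringIO sample with open("usp_drug_classification.csv", newline="", encoding="utf-8").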
From the US Pharmacopeial Convention's website: the USP Drug Classification system (USP DC) is an independent drug classification system currently under development by the USP Healthcare Quality Expert Committee. The USP DC is designed to address stakeholder needs emerging from the extended use of the USP Medicare Model Guidelines (USP MMG) beyond the Medicare Part D benefit.
The USP DC is intended to be complementary to the USP MMG and is developed with similar guiding principles, taxonomy, and structure of the USP Categories and Classes.
The raw data was downloaded from the KEGG website: http://www.genome.jp/kegg-bin/get_htext?htext=br08302.keg
br08302.keg is a text file containing the hierarchical USP drug classification. Lines beginning with A are USP Categories; the lines beginning with B that follow are the USP Classes in that category; lines beginning with C are the drugs (i.e. the general drug compound); and lines beginning with D are the drug examples (i.e. the medication or formulation you would buy) of that drug.
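
The A/B/C/D hierarchy described above can be flattened into one row per drug example. The following is a minimal sketch, not the actual tidying script: the exact line layout (a bold-tagged category name, a KEGG identifier followed by the name on C and D lines, an optional trailing parenthesised nomenclature) is an assumption, and the sample text is hand-written rather than copied from br08302.keg.

```python
import re

# Hand-written sample in the ASSUMED htext layout; not copied from br08302.keg.
SAMPLE_KEG = """\
!
A<b>Analgesics</b>
B  Nonsteroidal Anti-inflammatory Drugs
C    DG00245  Naproxen
D      D01122  Naproxen sodium (USP)
!
"""

# Optional trailing parenthesised group, e.g. "(JP17/USP/INN)", kept unparsed.
NOMENCLATURE_RE = re.compile(r"^(.*?)\s*(\([^()]*\))?$")


def split_id(body):
    """Split 'DG00245  Naproxen' into ('DG00245', 'Naproxen')."""
    kegg_id, _, name = body.partition(" ")
    return kegg_id, name.strip()


def parse_keg(text):
    """Return one dict per D line, carrying down the enclosing A/B/C values."""
    rows = []
    category = usp_class = drug = drug_id = ""
    for line in text.splitlines():
        level, body = line[:1], line[1:].strip()
        # Skip header/separator lines and empty hierarchy lines.
        if level not in ("A", "B", "C", "D") or not body:
            continue
        if level == "A":
            category = re.sub(r"</?b>", "", body)  # drop assumed <b> markup
            usp_class = drug = drug_id = ""
        elif level == "B":
            usp_class = body
            drug = drug_id = ""
        elif level == "C":
            drug_id, drug = split_id(body)
        else:  # level == "D"
            example_id, rest = split_id(body)
            name, nomenclature = NOMENCLATURE_RE.match(rest).groups()
            rows.append({
                "usp_category": category,
                "usp_class": usp_class,
                "usp_drug": drug,
                "drug_example": name,
                "kegg_id_drug": drug_id,
                "kegg_id_drug_example": example_id,
                "nomenclature": nomenclature or "",
            })
    return rows


parsed = parse_keg(SAMPLE_KEG)
print(parsed[0])
```

The keys of each row match the column names of the tidy CSV, so the result could be written out with csv.DictWriter.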
According to the [guidelines](http://www.usp.org/sites/default/files/usp_pdf/EN/healthcareProfessionals/2016_usp_mmg_guiding_principles.pdf), a USP Category is the broadest classification which provides a high level formulary structure designed to include all potential therapeutic agents for diseases and conditions. A USP Class is a more granular classification, occurring within a specific USP Category in the USP Drug Classifications, which provides for therapeutic or pharmacologic groupings of FDA approved medications, consistent with current U.S. healthcare practices and standards of care.
From what I understand, the nomenclature string indicates which official nomenclature system(s) the name belongs to. For example, "a British Approved Name (BAN) is the official non-proprietary or generic name given to a pharmaceutical substance, as defined in the British Pharmacopoeia (BP)" whereas "United States Adopted Names are unique nonproprietary names assigned to pharmaceuticals marketed in the United States."
I'm not sure of the best way to store this information (or how useful it will be), so for now the nomenclature strings are kept unparsed in the tidy data.
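
One possible way to parse the string later is to split it into a list of codes. This is only a sketch under the assumption that every nomenclature string looks like the '(JP17/USP/INN)' example, i.e. slash-separated codes wrapped in a single pair of parentheses; the function name split_nomenclature is hypothetical, and strings that do not follow the assumed pattern may need different handling.

```python
def split_nomenclature(raw):
    """Split '(JP17/USP/INN)' into ['JP17', 'USP', 'INN'].

    Assumes slash-separated codes inside one pair of parentheses.
    Returns an empty list for an empty string.
    """
    inner = raw.strip()
    if inner.startswith("(") and inner.endswith(")"):
        inner = inner[1:-1]
    return [code.strip() for code in inner.split("/") if code.strip()]


print(split_nomenclature("(JP17/USP/INN)"))
```

A list like this could be stored as one row per (drug example, code) pair in a separate table, which keeps the main CSV unchanged.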
- TBD whether this data includes medications covered by Part D Medicare or if it is only complementary to that data.
- Note that the individual KEGG pages (e.g. D00903) for these drugs have a wealth of information, including product and generic names, chemical formula, additional classes, ATC codes, biochemical information, other classifications, and links to the compound in other databases (e.g. PubChem, DrugBank, etc).