OnSIDES v2.0.0
Second major release of the OnSIDES code for extracting adverse reactions and boxed warnings from FDA structured product labels (SPLs). This version contains significant model improvements as well as updated labels. All labels available for download from DailyMed as of November 10, 2022 were processed in this analysis. In total, 2.8 million adverse reactions were extracted from over 45,000 labels for just under 2,000 drug products (single agents or combinations).
OnSIDES was created using the PubMedBERT language model and 200 manually curated labels available from Demner-Fushman et al. The model achieves an F1 score of 0.90, AUROC of 0.92, and AUPR of 0.95 at extracting effects from the ADVERSE REACTIONS section of the label. This constitutes an absolute increase of 4% in each of the performance metrics over v1.0.0. For the BOXED WARNINGS section, the model achieves an F1 score of 0.71, AUROC of 0.85, and AUPR of 0.72. This constitutes an absolute increase of 10-17% in the performance metrics over v1.0.0. Compared against the TAC reference standard using the official evaluation script, the model achieves a Micro-F1 score of 0.87 and a Macro-F1 score of 0.85.
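For readers unfamiliar with the difference between the Micro-F1 and Macro-F1 figures quoted above, the following is a minimal sketch of how the two averages differ. It is not the official TAC evaluation script. It assumes that macro averaging is taken over per-label F1 scores, which should be checked against that script, and the names `Counts`, `micro_f1`, `macro_f1`, and the toy counts are hypothetical.

```python
from collections import namedtuple

# Hypothetical container for per-label match counts (not part of OnSIDES).
Counts = namedtuple("Counts", ["tp", "fp", "fn"])


def f1(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_label):
    """Pool the counts across all labels, then compute a single F1."""
    tp = sum(c.tp for c in per_label.values())
    fp = sum(c.fp for c in per_label.values())
    fn = sum(c.fn for c in per_label.values())
    return f1(tp, fp, fn)


def macro_f1(per_label):
    """Compute F1 for each label separately, then take the unweighted mean."""
    scores = [f1(c.tp, c.fp, c.fn) for c in per_label.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Toy counts for two made-up labels, for illustration only.
toy = {
    "label_a": Counts(tp=8, fp=2, fn=2),
    "label_b": Counts(tp=1, fp=1, fn=3),
}

print(f"micro-F1 = {micro_f1(toy):.3f}")
print(f"macro-F1 = {macro_f1(toy):.3f}")
```

Micro-F1 weights every extracted term equally, so labels with many adverse reactions dominate the score. Macro-F1 weights every label equally, so poor performance on short labels pulls it down more.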
The model checkpoints for the Adverse Reactions (AR) and Boxed Warnings (BW) sections are also provided as part of this release as .pth files.
For the data, check the latest "Data Release" under Releases.