COVID-19 lineage names can be confusing to navigate: a single lineage can appear under several aliases, and if you want to catch them all for further examination, it helps to have some additional tools.
{pangoRo} is an R package for working with PANGO lineage information. Its core functionality was inspired by pango_aliasor, a similar package for Python created by Cornelius Roemer.
You can install {pangoRo} from GitHub:
remotes::install_github('al-obrien/pangoRo')
The basic usage of {pangoRo} is to expand, collapse, and sort COVID-19 lineages. Start by creating the pangoro object, which links to the latest (or cached) PANGO reference. This object is then passed to subsequent operations as a reference.
library(pangoRo)
# Create pangoro object
my_pangoro <- pangoro()
#> Loading alias table from PANGO webiste...
Given a vector of PANGO lineages, collapse_pangoro() returns each name collapsed as far as possible.
# Vector of COVID-19 lineages to collapse
cov_lin <- c('B.1.617.2', 'BL.2', 'B.1.1.529.2.75.1.2', 'BA.2.75.1.2', 'XD.1')
# Collapse lineage names as far as possible
collapse_pangoro(my_pangoro, cov_lin)
#> [1] "B.1.617.2" "BL.2" "BL.2" "BL.2" "XD.1"
You can also define how far to collapse each input with the max_level argument.
collapse_pangoro(my_pangoro, cov_lin, max_level = 1)
#> [1] "B.1.617.2" "BL.2" "BA.2.75.1.2" "BL.2" "XD.1"
# Vector of COVID-19 lineages to expand
cov_lin <- c('B.1.617.2', 'B.1.617.2.6', 'AY.4', 'AY.39', 'BL.2', 'BA.1', 'AY.2', 'XD.1')
# Expand lineage names as far as possible
exp_lin <- expand_pangoro(my_pangoro, cov_lin)
exp_lin
#> B.1.617.2 B.1.617.2.6 AY.4
#> "B.1.617.2" "B.1.617.2.6" "B.1.617.2.4"
#> AY.39 BL.2 BA.1
#> "B.1.617.2.39" "B.1.1.529.2.75.1.2" "B.1.1.529.1"
#> AY.2 XD.1
#> "B.1.617.2.2" "XD.1"
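To make the idea of expansion concrete, here is a minimal base R sketch of prefix substitution. It is not how {pangoRo} is implemented: `toy_alias` and `expand_toy()` are hypothetical names, and the three table entries are hand-copied from the outputs shown above rather than taken from the full PANGO reference.

```r
# Toy alias table inferred from the example outputs above (not the full PANGO reference)
toy_alias <- c(AY = "B.1.617.2", BA = "B.1.1.529", BL = "B.1.1.529.2.75.1")

# Hypothetical helper: swap a known alias prefix for its full path
expand_toy <- function(lineage, alias = toy_alias) {
  vapply(lineage, function(x) {
    parts  <- strsplit(x, ".", fixed = TRUE)[[1]]
    prefix <- parts[1]
    # Prefixes missing from the toy table (e.g. "B", "XD") are returned unchanged
    if (!prefix %in% names(alias)) return(x)
    paste(c(alias[[prefix]], parts[-1]), collapse = ".")
  }, character(1), USE.NAMES = FALSE)
}

expand_toy(c("AY.4", "BL.2", "BA.1", "B.1.617.2", "XD.1"))
#> [1] "B.1.617.2.4"        "B.1.1.529.2.75.1.2" "B.1.1.529.1"
#> [4] "B.1.617.2"          "XD.1"
```

For real work, expand_pangoro() should be preferred, since it uses the complete and current alias table.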
Perform a pseudo-sort on the lineage names.
# Sort lineages
sort_pangoro(my_pangoro, exp_lin)
#> BA.1 BL.2 B.1.617.2
#> "B.1.1.529.1" "B.1.1.529.2.75.1.2" "B.1.617.2"
#> AY.2 AY.4 B.1.617.2.6
#> "B.1.617.2.2" "B.1.617.2.4" "B.1.617.2.6"
#> AY.39 XD.1
#> "B.1.617.2.39" "XD.1"
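A plain character sort is not enough here because it compares digits one character at a time, so "39" lands before "4". The base R sketch below shows one possible workaround: zero-pad each numeric component before ordering. `sort_toy()` is a hypothetical helper that assumes already-expanded names; it is not the package's algorithm.

```r
# Hypothetical helper: order lineages by zero-padded numeric components
sort_toy <- function(lineage) {
  key <- vapply(strsplit(lineage, ".", fixed = TRUE), function(parts) {
    is_num <- grepl("^[0-9]+$", parts)
    # Pad numbers to a fixed width so "4" sorts before "39"
    parts[is_num] <- sprintf("%06d", as.integer(parts[is_num]))
    paste(parts, collapse = ".")
  }, character(1))
  # method = "radix" gives a locale-independent ordering of the keys
  lineage[order(key, method = "radix")]
}

x <- c("B.1.617.2.39", "B.1.617.2.4", "B.1.1.529.1", "B.1.617.2")

sort(x, method = "radix")  # character sort: .39 comes before .4
#> [1] "B.1.1.529.1"  "B.1.617.2"    "B.1.617.2.39" "B.1.617.2.4"

sort_toy(x)                # numeric-aware: .4 comes before .39
#> [1] "B.1.1.529.1"  "B.1.617.2"    "B.1.617.2.4"  "B.1.617.2.39"
```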
Split the lineages by their lowest alias codes and sort within each group.
collapsed_full <- collapse_pangoro(my_pangoro, cov_lin, aliase_parent = TRUE)
grps <- split(collapsed_full, sapply(strsplit(collapsed_full, split = '\\.'), `[[`, 1))
lapply(grps, function(x) sort_pangoro(my_pangoro, x))
#> $AY
#> [1] "AY" "AY.2" "AY.4" "AY.6" "AY.39"
#>
#> $BA
#> [1] "BA.1"
#>
#> $BL
#> [1] "BL.2"
#>
#> $XD
#> [1] "XD.1"
Although initial recombinant variants are typically obvious based upon their X prefix, their children may not be (e.g. EG.1).
is_recombinant(my_pangoro,
c('EG.1', 'EC.1', 'BA.1', 'XBB.1.9.1.1.5.1', 'B.1.529.1'))
#> [1] TRUE FALSE FALSE TRUE FALSE
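The reason a name like EG.1 can hide a recombinant origin is that its alias stands in for a path rooted at an X lineage. The base R sketch below illustrates that idea under stated assumptions: `toy_alias` and `is_recombinant_toy()` are hypothetical, the table holds only two hand-written entries, and each alias is assumed to map directly to its full path. It is not the logic used by is_recombinant().

```r
# Toy alias table with two hand-written entries (not the full PANGO reference)
toy_alias <- c(BA = "B.1.1.529", EG = "XBB.1.9.2")

# Hypothetical helper: resolve the root of each lineage and check for an X prefix
is_recombinant_toy <- function(lineage, alias = toy_alias) {
  prefix <- sub("\\..*$", "", lineage)   # text before the first dot
  known  <- prefix %in% names(alias)
  root   <- prefix
  # For aliased names, take the root of the path the alias stands for
  root[known] <- sub("\\..*$", "", alias[prefix[known]])
  startsWith(root, "X")
}

is_recombinant_toy(c("EG.1", "BA.1", "XBB.1.9.1.1.5.1"))
#> [1]  TRUE FALSE  TRUE
```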