Adding USDA genotype and lineage #80

Open
jameshadfield opened this issue Jul 26, 2024 · 0 comments
Labels
enhancement New feature or request

jameshadfield commented Jul 26, 2024

Here are some notes after trying out USDA's GenoFLU tool to assign lineages (per-segment) and genotypes (per-genome):

This tool uses BLAST to identify North American H5NX genomes in the 2.3.4.4b clade from a curated database. Pre-defined genotypes are cross-referenced with the top segment identifications, and a genotype is assigned.

The tool is slightly inconvenient to use in the context of our pipeline because it requires a single FASTA file per strain (genome). We'd have to either (a) create these files on the fly, run GenoFLU, extract the results, and delete the temporary files, or (b) modify the tool to be more ergonomic for our usage. I couldn't install it via conda, so I used the provided Docker image.

Example usage

mkdir -p results/genoflu
cd results/genoflu

# create a FASTA file for a specific strain
echo 'A/muteswan/Austria/23169070001/2023' > id.txt
SEGMENTS=("pb2" "pb1" "pa" "ha" "np" "na" "mp" "ns")
: > data.fasta  # start from an empty file (a bare `echo >` would write a blank first line)
for s in "${SEGMENTS[@]}"; do
    # extract this strain's segment and append "/<segment>" to its header
    seqkit grep -nf id.txt "../../data/gisaid/sequences_${s}.fasta" | seqkit replace -p '$' -r "/${s}" >> data.fasta
done

# run GenoFLU, bind-mounting the repository root (two levels up from results/genoflu)
docker container run --rm -it --mount type=bind,src="$(cd ../.. && pwd)",target=/avian-flu \
    quay.io/biocontainers/genoflu:1.03--hdfd78af_0 \
    bash -c "cd /avian-flu/results/genoflu && genoflu.py -f data.fasta"
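
To run this for many strains, the single-strain steps above could be wrapped in a loop that builds each FASTA in a temporary directory and removes it afterwards. The following is only a rough sketch: the function names (`build_strain_fasta`, `run_genoflu`, `genotype_strains`) and the `/work` mount point are made up for illustration, `awk` stands in for `seqkit` so the sketch has no extra dependencies, and it assumes each FASTA header is exactly `>` followed by the strain name. Collecting GenoFLU's results before the cleanup step is left to the caller.

```shell
# Sketch only: names and layout below are illustrative assumptions.
SEGMENTS="pb2 pb1 pa ha np na mp ns"

# Write one strain's segments to $3, appending "/<segment>" to each header.
# Mirrors the seqkit grep | seqkit replace step, but uses awk instead.
build_strain_fasta() {
    strain=$1; data_dir=$2; out=$3
    : > "$out"
    for s in $SEGMENTS; do
        [ -f "${data_dir}/sequences_${s}.fasta" ] || continue
        awk -v id=">${strain}" -v seg="$s" '
            /^>/ { keep = ($0 == id); if (keep) print $0 "/" seg; next }
            keep { print }
        ' "${data_dir}/sequences_${s}.fasta" >> "$out"
    done
}

# Hypothetical wrapper: run GenoFLU on data.fasta inside directory $1,
# using the same biocontainers image as the example above.
run_genoflu() {
    docker container run --rm --mount "type=bind,src=$1,target=/work" \
        quay.io/biocontainers/genoflu:1.03--hdfd78af_0 \
        bash -c "cd /work && genoflu.py -f data.fasta"
}

# For each strain listed in $1 (one per line): build a temporary FASTA,
# run GenoFLU on it, then delete the temporary directory.
genotype_strains() {
    strains_file=$1; data_dir=$2
    while IFS= read -r strain; do
        [ -n "$strain" ] || continue
        tmp=$(mktemp -d)
        build_strain_fasta "$strain" "$data_dir" "${tmp}/data.fasta"
        # Redirect stdin so the command cannot consume the strain list.
        run_genoflu "$tmp" < /dev/null
        rm -rf "$tmp"
    done < "$strains_file"
}
```

With a hypothetical `strains.txt` holding one strain name per line, this would be called from the repository root as `genotype_strains strains.txt data/gisaid`.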

Sample results:

A/muteswan/Austria/23169070001/2023
Genotype: Not assigned: Only 4 segments >98% match found of total 8 segments in input file
Lineages: PB1:ea3, HA:ea3, NP:ea6, MP:ea3

A/carrioncrow/Hokkaido/B081/2024/HA
Genotype: A3
Lineages: PB2:ea3, PB1:ea3, PA:ea3, HA:ea3, NP:ea3, NA:ea3, MP:ea3, NS:ea3

A/Dairycattle/Kansas/5/202 (NCBI)
Genotype: B3.13
Lineages: PB2:am2.2, PB1:am4, PA:ea1, HA:ea1, NP:am8, NA:ea1, MP:ea1, NS:am1.1
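
If the results are to be joined back onto our metadata, the three-line summaries above could be flattened into a tab-separated table. This is a minimal sketch that assumes the layout exactly as shown in these sample results (strain line, then `Genotype:`, then `Lineages:`); the files GenoFLU actually writes may be structured differently, and the function name `parse_genoflu_summary` is made up for illustration.

```shell
# Sketch only: parses the strain / Genotype: / Lineages: layout shown above.
# Reads the summaries on stdin and prints: strain <TAB> genotype <TAB> lineages
parse_genoflu_summary() {
    awk '
        /^Genotype:/ { sub(/^Genotype: */, ""); genotype = $0; next }
        /^Lineages:/ { sub(/^Lineages: */, ""); print strain "\t" genotype "\t" $0; next }
        NF           { strain = $0; genotype = "" }   # any other non-blank line starts a record
    '
}
```

For example, `parse_genoflu_summary < summaries.txt > genoflu.tsv`, where `summaries.txt` is a hypothetical file holding the text above. Unassigned genomes keep the full "Not assigned: ..." message in the genotype column.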