Missing documentation #16

Open
susheelbhanu opened this issue Jul 19, 2024 · 1 comment

Comments

@susheelbhanu

Hi,

I'm trying to use this tool, but the documentation doesn't make clear where the names and nodes files come from.

$ metaDMG config raw_data/alignment.sorted.bam \
    --names raw_data/names-mdmg.dmp \
    --nodes raw_data/nodes-mdmg.dmp \
    --acc2tax raw_data/acc2taxid.map.gz \
    --custom-database

Does the tool provide or download it automatically?

Thanks,
Susheel

@FranckLejzerowicz

Hi Susheel,

From following the Tutorial, I gathered that the files distributed with metaDMG cannot necessarily be used for your own data.

The tutorial is not very explicit about this, but it helps to peek into the files downloaded with metaDMG get-data --output-dir raw_data:

In raw_data/acc2taxid.map.gz, you can see that what metaDMG-cpp uses to look up taxids and calculate the LCA is likely the "chr" (reference name) listed as the accession. These accessions are also the targets onto which the reads were mapped.

For example, if you download these files using metaDMG get-data, you can see that they match: there is a taxid for a given mapped-onto genome.

$ samtools view alignment.sorted.bam | grep GCA_000007325.1 | wc -l
581
$ zcat acc2taxid.map.gz | grep GCA_000007325.1
GCA_000007325.1	GCA_000007325.1	21768	2
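
The same check can be run for every reference at once. What follows is only a sketch under assumptions: `missing_accessions` and `refs.txt` are hypothetical names, and it assumes the name that has to match the BAM reference is the second tab-separated column of the map (in the example above, the first two columns are identical).

```shell
# Print every reference name that has no entry in the acc2taxid map.
#   $1: text file with one reference name per line (hypothetical refs.txt)
#   $2: gzip-compressed, tab-separated acc2taxid map
# Assumption: the accession to match is in column 2 of the map.
missing_accessions() {
    gzip -dc "$2" | awk -F '\t' '
        pass != 2 { known[$2] = 1; next }   # first input (stdin): the map
        !($1 in known)                      # second input: reference names
    ' - pass=2 "$1"
}

# One possible way to build refs.txt from the BAM header (needs samtools):
# samtools view -H raw_data/alignment.sorted.bam \
#     | awk -F '\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' \
#     > refs.txt
# missing_accessions refs.txt raw_data/acc2taxid.map.gz
```

An empty output would mean every reference in the BAM has a line in the map; any name printed is a reference for which no taxid can be looked up.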

The thing is, if you use a custom database rather than NCBI genomes, you have to make sure that the contents of the files passed to --names, --nodes and --acc2tax match each other. For each of my metagenomes' contigs, I'll be pulling taxids from the GOs of the majority of genes annotated using eggnog-mapper, and making sure those taxids are themselves referenced in an NCBI taxdump (https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/).
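
For the taxid side of that consistency, a rough sanity check could look like the sketch below. It is not part of metaDMG: `unknown_taxids` is a hypothetical helper, and it assumes the map keeps the taxid in its third tab-separated column (as in the line shown above) and that the file passed to --nodes follows the NCBI taxdump layout, where the first field of each line is the tax_id.

```shell
# Print each taxid used in the acc2taxid map that is absent from nodes.dmp.
#   $1: gzip-compressed, tab-separated acc2taxid map (taxid assumed in column 3)
#   $2: nodes.dmp in NCBI taxdump layout (fields separated by "\t|\t")
unknown_taxids() {
    gzip -dc "$1" | awk -F '\t' '
        pass != 2 { known[$1] = 1; next }   # first input: tax_id column of nodes.dmp
        # second input (stdin): the map; skip a header or other non-numeric
        # taxid, and report each unknown taxid only once
        $3 ~ /^[0-9]+$/ && !($3 in known) && !reported[$3]++ { print $3 }
    ' "$2" pass=2 -
}

# Example call with the file names used in this thread:
# unknown_taxids raw_data/acc2taxid.map.gz raw_data/nodes-mdmg.dmp
```

Any taxid printed here is assigned to a reference but has no node in the taxonomy, so it would be worth fixing before running metaDMG config.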

Hope this helps!
