Sequence depth normalization after woltka classification #211

Open
UniAlberta opened this issue Oct 7, 2024 · 4 comments

Comments

@UniAlberta

Hi there, I'm using woltka classify on metagenomic short reads. The output is in RPK, which gives me the copy number of each gene in each sample. However, I'd also like to normalize for sequencing depth in each sample. Is there a command for normalizing sequencing depth on the RPK output table?

Thanks

@qiyunzhu
Owner

qiyunzhu commented Oct 7, 2024

Hello @UniAlberta, yes, you can do that. See the instructions. Basically, you need:

woltka normalize -i rpk.biom --scale 1M -o tpm.biom
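As a rough illustration of the arithmetic behind converting RPK to TPM (not woltka's actual implementation), each value is divided by its sample's column total and multiplied by one million. The sketch below assumes a tab-separated table with one header line and feature IDs in the first column; the file name `toy_rpk.tsv`, the function name `rpk_to_tpm`, and the numbers are made up for the example.

```shell
# Hypothetical toy RPK table: feature IDs in column 1, one column per sample.
printf '#FeatureID\tS1\tS2\nK00001\t2\t10\nK00002\t6\t30\nK00003\t2\t60\n' > toy_rpk.tsv

# Illustration only: pass 1 sums each sample column,
# pass 2 rescales every value to "per million" of its column total.
rpk_to_tpm() {
  awk -F'\t' -v OFS='\t' '
    NR == FNR { if (FNR > 1) for (i = 2; i <= NF; i++) total[i] += $i; next }
    FNR == 1  { print; next }   # keep the header line unchanged
    {
      for (i = 2; i <= NF; i++)
        $i = (total[i] > 0) ? sprintf("%.0f", $i / total[i] * 1000000) : 0
      print
    }
  ' "$1" "$1"
}

rpk_to_tpm toy_rpk.tsv
```

In this toy table S1 sums to 10 and S2 sums to 100, so after rescaling both columns sum to 1,000,000 and samples with different sequencing depths become comparable.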

@UniAlberta
Author

UniAlberta commented Oct 8, 2024

Hi and thanks for your reply. So you are suggesting to convert RPK to TPM?
I use this workflow to create RPK outputs in tsv format.
woltka classify \
  --input $SAM_INPUT_DIR/ \
  --coords $DB_DIR/proteins/coords.txt \
  --map $DB_DIR/function/uniref/uniref.map.xz \
  --names $DB_DIR/function/uniref/uniref.name.xz \
  --names $DB_DIR/function/kegg/ko.name \
  --map $DB_DIR/function/kegg/ko.map.xz \
  --rank uniref,ko \
  --sizes . \
  --scale 1k \
  --digits 3 \
  --to-tsv \
  --output $OUTPUT_DIR

How can I add "woltka normalize -i rpk.biom --scale 1M -o tpm.biom" to this workflow so that the final output is normalized by both gene length and sequence depth? I appreciate your help.

@qiyunzhu
Owner

qiyunzhu commented Oct 8, 2024

@UniAlberta After running the command you posted, you will get two output files: uniref.tsv and ko.tsv. You can then run the command I suggested on each of them, for example:

woltka normalize -i uniref.tsv --scale 1M -o uniref.norm.tsv
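To apply the same step to both tables in one go, a small loop along these lines might work. It is a sketch that assumes the per-rank files (uniref.tsv, ko.tsv) sit directly inside the classify output directory; the function name `normalize_tables` and the `DRY_RUN` switch are hypothetical helpers added here, and only the `woltka normalize` options already shown in this thread are used.

```shell
# Normalize each per-rank table written by `woltka classify`.
# By default the commands are only printed; set DRY_RUN=0 to run woltka.
normalize_tables() {
  outdir="$1"; shift
  for rank in "$@"; do
    in="$outdir/$rank.tsv"          # assumed location of the RPK table
    out="$outdir/$rank.norm.tsv"    # depth-normalized result
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "woltka normalize -i $in --scale 1M -o $out"
    else
      woltka normalize -i "$in" --scale 1M -o "$out"
    fi
  done
}

# Print the commands for the two ranks requested with --rank uniref,ko.
normalize_tables "${OUTPUT_DIR:-.}" uniref ko
```

Checking the printed commands first and then rerunning with `DRY_RUN=0` keeps the RPK tables untouched, since each normalized table is written to a separate `.norm.tsv` file.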

@UniAlberta
Author

Thank you very much!
