No filtering was done! Exiting... #2

mictadlo · 2021-06-11T23:36:22Z

Hi,
Thank you for sharing your cafe pipeline. However, I ran into the following problems:

> awk -F'\t' '{print "(null)\t"$0}' Orthogroups.GeneCount.tsv > tmp.tsv
> sed -i.bak 's|(null)|Desc|g' tmp.tsv

> head tmp.tsv
Desc	Orthogroup	Atha	Cann	Csat	Mgut	Natt	Nlab	Nobt	Noto	Nqld	Nsyl	Ntab	Ntom	Ptri	Slyc	Smel	Stub	Vvin	Total
Desc	OG0000000	0	0	0	0	0	3	0	0	982	0	0	1	0	0	0	986
Desc	OG0000001	0	9	0	0	5	23	8	59	3	31	267	14	0	6	113	540

> python2.7 ~/scripts/cafe/cafetutorial_clade_and_size_filter.py -i tmp.tsv -s -o cafe.input.tsv
No filtering was done! Exiting...

> conda activate cafe
> cafe
# load -i tmp.tsv reports/log_run1.txt
Failed to load file

What did I miss?

Thank you in advance

The text was updated successfully, but these errors were encountered:

harish0201 · 2021-06-12T04:37:21Z

Ah! It's been some time that I've looked at this, so apologies for the confusion ;)

In the tmp.tsv, as per CAFE's format, the first column should be the (null) with the header being Desc. So what you've shared should look like:

head tmp.tsv
Desc	Orthogroup	Atha	Cann	Csat	Mgut	Natt	Nlab	Nobt	Noto	Nqld	Nsyl	Ntab	Ntom	Ptri	Slyc	Smel	Stub	Vvin	Total
(null)	OG0000000	0	0	0	0	0	3	0	0	982	0	0	1	0	0	0	986
(null)	OG0000001	0	9	0	0	5	23	8	59	3	31	267	14	0	6	113	540

IIRC, this is what the awk portion actually does, only the "first column header" should be the "Desc".

Once this is done, you can run the cafetutorial_clade_and_size_filter.py script that CAFE has.

In any case, what it does is, it's going to remove the OGs/Groups if any of the species have more than 100 proteins in them.

Once this is done, you'd need to create a rooted tree. I recommend using Timetree.org or other age approaches (ape package in R is a good started)

Unfortunately, the way I've arranged the CAFE thing is essentially a guideline as to how I do it, rather than a complete pipeline because I kept running into issues :(

I might automate it some day, hopefully, but the use case for me is fairly too low to justify the time sunk into it.

mictadlo · 2021-06-12T23:21:44Z

Thank you. The new file looks like this:

> head tmp.tsv
Desc	Orthogroup	Atha	Cann	Csat	Mgut	Natt	Nlab	Nobt	Noto	Nqld	Nsyl	Ntab	Ntom	Ptri	Slyc	Smel	Stub	Vvin	Total
(null)	OG0000000	0	0	0	0	0	3	0	0	982	0	0	1	0	0	0	0	0	986
(null)	OG0000001	0	9	0	0	5	23	8	59	3	31	267	14	0	6	113	2	0	540
(null)	OG0000002	0	128	3	20	21	21	8	42	21	36	70	37	3	25	38	39	22	534
(null)	OG0000003	1	66	12	20	29	17	12	42	21	30	60	21	5	19	36	84	42	517

The age (28 MYA) has been provided to me by a collaborator. Unfortunately, I got a new error:

> make_ultrametric.py -r 28 SpeciesTree_rooted.txt
> head SpeciesTree_rooted.txt.ultrametric.tre
(((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819);

python2.7 ~/scripts/cafe/cafetutorial_clade_and_size_filter.py -i tmp.tsv -s -o cafe.input.tsv
> cafe
# load -i cafe.input.tsv
-----------------------------------------------------------
Family information: cafe.input.tsv
Log: stdout
The number of families is 35104
Root Family size : 1 ~ 124
Family size : 0 ~ 149
P-value: 0.01
Num of Threads: 1
Num of Random: 1000
# tree (((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819);
(((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819)
No species 'total' was found in the tree

What did I miss?

Thank you in advance,

Michal

harish0201 · 2021-06-15T16:23:41Z

Hey!

You have to remove the "Total" column, that won't be needed. The error specifies that :)

mictadlo · 2021-06-15T22:56:06Z

Thank you,, I have forgotten to do awk -F'\t' '{$NF=""; print $0}' tmp.tsv | rev | sed 's/^\s*//g' | rev | tr ' ' '\t' > mod.tsv. Unfortunately, I got now Tree must be binary:

> awk -F'\t' '{$NF=""; print $0}' tmp.tsv | rev | sed 's/^\s*//g' | rev | tr ' ' '\t' > mod.tsv
> python2.7 ~/scripts/cafe/cafetutorial_clade_and_size_filter.py -i mod.tsv -s -o cafe.input.tsv  
>  cafe
# load -i cafe.input.tsv
-----------------------------------------------------------
Family information: cafe.input.tsv
Log: stdout
The number of families is 27493
Root Family size : 1 ~ 124
Family size : 0 ~ 149
P-value: 0.01
Num of Threads: 1
Num of Random: 1000
# tree (((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819);
(((Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727):3.92728,(Mgut:23.5518,((Cann:4.94278,((Stub:1.91372,Slyc:1.91372):1.99847,Smel:3.91219):1.0306):2.54854,((Natt:2.59231,((Nsyl:1.44399,Ntab:1.44399):0.826096,(Nlab:0.508663,Nqld:0.508663):1.76142):0.322218):0.530061,((Ntom:1.26249,Noto:1.26249):1.11482,Nobt:2.37732):0.745047):4.36896):16.0605):4.44819)
# lambda -s -t ((1,1)1,(1,1):1,1)
Tree must be binary

What did I miss?

Thank you in advance,

Michal

harish0201 · 2021-06-16T01:43:28Z

The lambda tree that you've pasted should look like the tree that you've generated (Ptri.....)

or rather taking a small portion of your ultrametric tree: (Ptri:21.5258,(Csat:19.4873,Atha:19.4873):2.03847):2.54693,Vvin:24.0727)

the lambda tree would look like: (1,(2,2)),1) assuming Csat is your species of interest. My recommendation is go through the CAFE tutorial once though, because it covers this in more detail.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

No filtering was done! Exiting... #2

No filtering was done! Exiting... #2

mictadlo commented Jun 11, 2021

harish0201 commented Jun 12, 2021

mictadlo commented Jun 12, 2021

harish0201 commented Jun 15, 2021

mictadlo commented Jun 15, 2021

harish0201 commented Jun 16, 2021 •

edited

Loading

No filtering was done! Exiting... #2

No filtering was done! Exiting... #2

Comments

mictadlo commented Jun 11, 2021

harish0201 commented Jun 12, 2021

mictadlo commented Jun 12, 2021

harish0201 commented Jun 15, 2021

mictadlo commented Jun 15, 2021

harish0201 commented Jun 16, 2021 • edited Loading

harish0201 commented Jun 16, 2021 •

edited

Loading