rnazCluster.pl clusters windows with non-similar structures #12

Open
halabikeren opened this issue Mar 16, 2022 · 3 comments

@halabikeren

Hi dear team,

I ran RNAz predictions on an alignment of viral genomes from a specific viral species. To do this, I first divided the alignment into windows using rnazWindow.pl, then ran RNAz on the windows, and then clustered the significant structures of overlapping windows using rnazCluster.pl. I expected only similar structures of overlapping windows to be clustered, but when I plot them with RNAplot, they look quite different, at least visually.
Please find an attached example: rnazCluster_example.zip

Additional info: I am using:
RNAz version 2.1
RNAplot 2.5.0

Am I doing something wrong or expecting rnazCluster.pl to do something it is not supposed to?

Many thanks!

@svenderheld svenderheld self-assigned this Mar 16, 2022
@svenderheld
Collaborator

Dear halabikeren,

Here is part of the description you get when you call 'rnazCluster.pl --man':

"rnazCluster.pl" reads RNAz output files and combines hits in
overlapping windows to ``loci". It prints a summary of the windows
and/or loci as a tabulator delimited text to the standard output. An
explanation of the fields can be found below. See the user manual for a
more detailed meaning of these values.

Hence, it does not take any structural features into account and simply combines overlapping windows by coordinates into larger loci. If you want to get the overall structure of a locus, you could, for instance, run RNAalifold on an alignment of the complete locus. Such an alignment is not part of the RNAz output and might need additional work. Please be aware that structure prediction becomes less accurate the longer the input is.
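
To illustrate what "combining overlapping windows by coordinates" means, here is a minimal Python sketch of coordinate-based merging. It is not the rnazCluster.pl implementation: the `Window` tuple, the `merge_windows` function, and the example coordinates are hypothetical, and parsing of the RNAz output is left out entirely. Note that no structure information enters the merge at any point.

```python
from typing import NamedTuple


class Window(NamedTuple):
    seq_id: str  # reference sequence the window lies on
    start: int   # inclusive start coordinate
    end: int     # exclusive end coordinate


def merge_windows(windows):
    """Merge windows on the same sequence whose coordinates overlap."""
    loci = []
    # Sorting orders the tuples by (seq_id, start, end).
    for w in sorted(windows):
        last = loci[-1] if loci else None
        if last and last.seq_id == w.seq_id and w.start < last.end:
            # Overlap: extend the current locus instead of starting a new one.
            loci[-1] = Window(last.seq_id, last.start, max(last.end, w.end))
        else:
            loci.append(w)
    return loci


# Hypothetical hits: the first two windows overlap, the third stands alone.
windows = [
    Window("genome", 0, 120),
    Window("genome", 40, 160),
    Window("genome", 400, 520),
]
loci = merge_windows(windows)
print(loci)
```

With these example windows, the first two are merged into a single locus spanning 0 to 160, regardless of whether their predicted structures resemble each other.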

Hope that helps?!

Best,
Sven

@svenderheld svenderheld added the "waiting" (waiting for response) label Mar 16, 2022
@halabikeren
Author

Thank you Sven!

In this case, if I want to obtain the set of secondary structures within a genome, would you recommend that I work directly with RNAalifold, or is there a more accurate alternative (e.g., using non-overlapping windows of varying sizes with RNAz and filtering the results with a strict cutoff on the RNAz functional-structure class probability)?

Thanks again!
Keren

@svenderheld
Collaborator

Dear Keren,

If you know (or think you know) the boundaries of your transcript, I recommend using blat to find homologs and building the full alignment based on that. Of course, you could also use the boundaries of the RNAz loci. During processing in the RNAz framework, sequences originally in the MAF alignment might be filtered out and therefore be missing from the finally scored window, so a full alignment might also give you a more complete picture. Depending on the sequences at hand, one alignment approach or another might be superior. If you assume you are looking at a set of conserved structured RNAs, you could use a sequence-structure alignment approach, e.g. from the group in Freiburg [1], or use, for instance, rcoffee [2].

I actually do not recommend playing around too much with the window sizes, because RNAz was trained on alignments of size 120 (if I'm not mistaken) and, to the best of my knowledge, the effect of changing the window size has not been tested so far.
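
As a rough illustration of keeping the window size fixed, the following Python sketch generates overlapping window coordinates over an alignment. The window size of 120 follows the figure mentioned above; the step of 40, the function name `sliding_windows`, and the handling of the final window are assumptions chosen for illustration and are not taken from rnazWindow.pl.

```python
def sliding_windows(length, size=120, step=40):
    """Yield half-open (start, end) windows over an alignment of `length` columns."""
    if length <= size:
        # Alignment shorter than one window: use it as a single window.
        yield (0, length)
        return
    start = 0
    while start + size < length:
        yield (start, start + size)
        start += step
    # Last window is placed flush with the end so every window has the same size.
    yield (length - size, length)


# Hypothetical alignment of 200 columns.
coords = list(sliding_windows(200))
print(coords)
```

Every window produced this way has the same length, so only the step between windows varies if you change the parameters.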

Sorry for the delay but I hope it still helps.

Best,
Sven

[1] https://rna.informatik.uni-freiburg.de/
[2] https://tcoffee.org/Projects/rcoffee/index.html
