strand information for MoSEA not using SUPPA2 events #6

JamalEH · 2020-03-25T12:16:00Z

Dear team,
I hope this message finds you all well.

I have a question about using MoSEA with coordinates from a tool other than SUPPA2.
Should I provide the strand information when extracting the sequences? My BED file looks like the following:
chr18 63127035 63128759 BCL2_E1
chr18 63123346 63127034 BCL2_E2
chr18 63126835 63127035 BCL2_U1
chr18 63126834 63127034 BCL2_U2

Providing the above file, I got the sequences. When I scan the sequences for occurrences of the RBP binding motif, I get something like:
#pattern name  sequence name  start  stop  strand  score    p-value   q-value  matched sequence
HNRNPL_00091   BCL2_E1        794    800   +       8.13415  0.000743           ACACAAT
HNRNPL_00091   BCL2_E1        1260   1266  +       10.0671  7.11e-05           ACACGAA
HNRNPL_00091   BCL2_E2        87     93    +       10.0549  0.000159           ACACAAA
HNRNPL_00091   BCL2_E2        1122   1128  +       9.96951  0.000413           ACACAAG
HNRNPL_00091   BCL2_E2        1426   1432  +       8.2378   0.000536           ACACCAC
HNRNPL_00091   BCL2_E2        1877   1883  +       7.56098  0.000996           ACACAGA

Looking at the strand column, the tool reports the matches on the positive strand, while my gene, BCL2, is on the reverse strand ("ensembl location: Chromosome 18: 63,123,346-63,320,128 reverse strand").
I'm using the hg38 genome assembly to extract sequences.

I will be very thankful if you can help me to fix this issue.
Thank you so much in advance!
Kind regards,
Jamal.

EduEyras · 2020-03-25T12:19:58Z

Hi Jamal, Yes, please do include the strand, so that the sequence is correctly extracted. It may be assuming that strand is positive if you do not specify it. All fine here so far. I hope you too. Stay safe! Thanks E.
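
To illustrate why the strand matters here, below is a minimal Python sketch of strand-aware extraction. It is not MoSEA's code: `extract`, `reverse_complement` and the toy sequence are made up for illustration, and coordinates are assumed to be 0-based and half-open, as in BED.

```python
# Illustrative only: hypothetical helpers, not part of MoSEA.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract(genome_seq: str, start: int, end: int, strand: str = "+") -> str:
    """Extract a 0-based, half-open interval.

    The slice is reverse-complemented only when strand is '-'.
    With no strand given, the interval is read as '+'.
    """
    piece = genome_seq[start:end]
    return reverse_complement(piece) if strand == "-" else piece


if __name__ == "__main__":
    toy = "TTGTGTAAA"  # made-up forward-strand sequence
    print(extract(toy, 0, 6))       # read as '+': TTGTGT
    print(extract(toy, 0, 6, "-"))  # read as '-': ACACAA
```

The same interval gives a different sequence depending on the strand, so a motif scan on a minus-strand gene that was extracted as "+" looks at the wrong strand.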

-- Prof. E Eyras EMBL Australia Group Leader The John Curtin School of Medical Research - Australian National University https://github.com/comprna http://scholar.google.com/citations?user=LiojlGoAAAAJ

JamalEH · 2020-03-25T12:25:43Z

Dear Eduardo,
Thank you so much for your prompt reply!

When including the strand information like this:
chr18 63127035 63128759 - BCL2_E1
chr18 63123346 63127034 - BCL2_E2
chr18 63126835 63127035 - BCL2_U1
chr18 63126834 63127034 - BCL2_U2

When reporting the sequences, the tool treats the strand column as the name of the coordinates. It reports something like:

TATCAACCACAGCATTAAACATTGAACAGAGTACATTCCAAAGTTAATACAGATAAATGGTATATAATGCAATAATGCCACAGAGTTATTCCATCAATGTTTCAAGGCTGATTCTAAACTGGAAGAAAAAAAAATTTCCTAGTTTATTTGCTGAAGATGTCACTTCTTTTGTTACTTCTTTATAGTTCCCCACCATTGAT

ATATCAACCACAGCATTAAACATTGAACAGAGTACATTCCAAAGTTAATACAGATAAATGGTATATAATGCAATAATGCCACAGAGTTATTCCATCAATGTTTCAAGGCTGATTCTAAACTGGAAGAAAAAAAAATTTCCTAGTTTATTTGCTGAAGATGTCACTTCTTTTGTTACTTCTTTATAGTTCCCCACCATTGA

Should I add a header and then adjust the getfasta command so that it takes the file's header into account? Or should the column with the strand information be placed in a different position?

The getfasta command is run with the -s option, which reverse-complements the sequence if it is located on the negative strand.

Thank you again!
Kind regards,
Jamal.

JamalEH · 2020-03-25T12:26:57Z

The dots are just negative signs ("-"). They changed after I sent the message. Sorry for that.

EduEyras · 2020-03-25T12:42:53Z

Hi Jamal, I am not sure about that technical detail; that's a question for Babita. I would assume that the format should be one of the standards, probably BED format, described here: https://genome.ucsc.edu/FAQ/FAQformat.html#format1 E.
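
For reference, the BED description linked above puts the name in column 4, a score in column 5 and the strand in column 6, which would explain why a "-" in column 4 is read as the name. The following is a minimal Python sketch of converting the four-column rows from the first message to six columns. It assumes every event lies on the minus strand (as for BCL2) and uses 0 as a placeholder score; `to_bed6` is a hypothetical helper, not part of MoSEA.

```python
def to_bed6(line: str, strand: str, score: str = "0") -> str:
    """Turn 'chrom start end name' into tab-separated
    'chrom start end name score strand' (standard BED column order)."""
    fields = line.split()
    if len(fields) != 4:
        raise ValueError(f"expected 4 columns, got {len(fields)}: {line!r}")
    if strand not in {"+", "-"}:
        raise ValueError(f"strand must be '+' or '-', got {strand!r}")
    chrom, start, end, name = fields
    return "\t".join([chrom, start, end, name, score, strand])


if __name__ == "__main__":
    # Rows copied from the first message in this thread.
    bed4 = [
        "chr18 63127035 63128759 BCL2_E1",
        "chr18 63123346 63127034 BCL2_E2",
        "chr18 63126835 63127035 BCL2_U1",
        "chr18 63126834 63127034 BCL2_U2",
    ]
    for row in bed4:
        # Assumption: all four events are on the minus strand.
        print(to_bed6(row, "-"))
```

Whether MoSEA's getfasta step expects exactly this layout is an assumption here; it would need confirming against the tool itself.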
