Seqlet calling issues #24
For
Will be available in the next release, hopefully next week.
I've tried making a reproducible example.
Generate 50 random sequences of length 100 with random attributions in the interval [-1,1)
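The setup code itself is not included in the thread. Below is a minimal sketch of data matching that description. It assumes NumPy, one-hot sequences of shape `(n, 4, length)` and a separate `(n, length)` attribution array; both shapes and the seed are assumptions, not the reporter's code.

```python
import numpy as np

# Assumed setup: 50 random DNA sequences of length 100 (one-hot, shape (n, 4, length))
# plus per-position attributions. Shapes and seed are illustrative only.
rng = np.random.default_rng(0)
n_seqs, seq_len = 50, 100

# Random nucleotide indices 0..3, converted to a one-hot encoding.
idx = rng.integers(0, 4, size=(n_seqs, seq_len))
X_onehot = np.zeros((n_seqs, 4, seq_len), dtype=np.float32)
X_onehot[np.arange(n_seqs)[:, None], idx, np.arange(seq_len)[None, :]] = 1.0

# Attributions in [-1, 1): Generator.uniform samples the half-open interval [low, high).
attr = rng.uniform(-1.0, 1.0, size=(n_seqs, seq_len))

# Attributions reversed along the length axis, for the reversal comparison.
attr_rev = attr[:, ::-1].copy()
```

How these arrays are then passed to `recursive_seqlets` is not shown here; its docstring is the place to check the expected input shape.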
Call recursive seqlets:
The start and end are supposed to be BED-formatted, meaning the end is not included, and the output indeed agrees with this. I would expect the results to be invariant to reversing the attributions along the length of the sequence:
However, one seqlet is found in a different sequence, and the one in sequence 9 is shifted by 1.
This hints at some left-right asymmetry.
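To compare the two runs, seqlets called on the reversed attributions have to be mapped back to forward coordinates. For a half-open (BED) interval `[start, end)` on a sequence of length `L`, position `i` maps to `L - 1 - i`, so the interval becomes `[L - end, L - start)`. A minimal sketch, assuming each seqlet is an `(example_idx, start, end)` tuple; the helper names and example coordinates are hypothetical.

```python
def mirror_interval(start, end, length):
    """Map a half-open interval [start, end) onto the reversed sequence."""
    # Covered positions start..end-1 become length-end..length-1-start,
    # i.e. the half-open interval [length - end, length - start).
    return length - end, length - start


def compare_seqlets(forward, reverse, length):
    """Return (only_in_forward, only_in_reverse) after mirroring the reverse calls back.

    Each seqlet is an (example_idx, start, end) tuple; an empty pair of sets
    means the calls are invariant to reversal.
    """
    fwd = set(forward)
    back = {(ex, *mirror_interval(s, e, length)) for ex, s, e in reverse}
    return fwd - back, back - fwd


# Hypothetical calls that differ only by a one-position shift.
print(compare_seqlets([(9, 16, 27)], [(9, 72, 83)], 100))
# ({(9, 16, 27)}, {(9, 17, 28)})
```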
The peak of the 'motif' is at positions 19 and 20. We thus have 3 positions to the left of the peak and 6 positions to the right of the peak.
When we plot it, the seqlet even appears to extend 4 positions further to the right than to the left. I think the extra position in the plot comes from `plot_logo` including the end position, which is misleading. Still, the algorithm appears to prefer extending the motif to the right. Another artefact is that increasing `min_seqlet_len` can produce more seqlets, which is counterintuitive. This probably happens because only subseqlets down to `min_seqlet_len` need to have a p-value below the threshold.
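The off-by-one can be checked with plain arithmetic. The coordinates below are hypothetical, chosen only so that the half-open interval reproduces the 3 left / 6 right counts around the peak at positions 19 and 20; an end-inclusive plot then shows 7 positions on the right, i.e. 4 more than on the left.

```python
def flank_counts(positions, peak_lo, peak_hi):
    """Count positions strictly left of peak_lo and strictly right of peak_hi."""
    left = sum(1 for p in positions if p < peak_lo)
    right = sum(1 for p in positions if p > peak_hi)
    return left, right


start, end = 16, 27        # hypothetical BED coordinates; end is excluded
peak_lo, peak_hi = 19, 20  # peak positions quoted above

half_open = range(start, end)      # positions 16..26, what the seqlet covers
inclusive = range(start, end + 1)  # positions 16..27, what an end-inclusive plot shows

print(flank_counts(half_open, peak_lo, peak_hi))  # (3, 6)
print(flank_counts(inclusive, peak_lo, peak_hi))  # (3, 7)
```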
Thank you for adding functionality for calling seqlets, this is really useful. I encountered a few issues when experimenting with the new functions.
With `tfmodisco_seqlets`:
- When setting `flank=0`, no seqlets are returned. I think it's because of this line, which sets all entries to negative infinity: `X_sum[:, -flank:] = -numpy.inf`
- The `device` argument lacks an explanation. With the default `device='cuda'`, I got an error because some tensors were on the CPU and some on CUDA. Also, it's not possible to put the input tensor on CUDA, so the function seems to work only with `device='cpu'`.
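The `flank=0` case comes from NumPy slicing: `-0` is just `0`, so `X_sum[:, -0:]` selects every column rather than none. A minimal sketch with a toy array; `mask_trailing` is a hypothetical helper showing one possible guard, not the library's code.

```python
import numpy as np

X_sum = np.arange(12, dtype=float).reshape(2, 6)

flank = 0
masked = X_sum.copy()
masked[:, -flank:] = -np.inf  # -0 == 0, so this is masked[:, 0:], i.e. every column
print(np.isneginf(masked).all())  # True


def mask_trailing(X, flank):
    """Hypothetical guard: set the last `flank` columns to -inf, no-op for flank == 0."""
    X = X.copy()
    if flank > 0:
        X[:, -flank:] = -np.inf
    return X
```

An equivalent alternative is `X[:, X.shape[1] - flank:]`, which yields an empty slice when `flank` is 0.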
With `recursive_seqlets`:
- The returned seqlets seem skewed to the right, i.e. they often extend beyond the core 'motif' on the right side but not on the left. This is most apparent with a rather large `min_seqlet_len`, such as 10. It could be related to this line, where `min_seqlet_len` is added to `end`: `end = min(end + min_seqlet_len + additional_flanks - 1, l)`
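Only the `end` update is quoted, so the sketch below copies that line and pairs it with an assumed left-side update, `start = max(start - additional_flanks, 0)`, which is hypothetical. Under that assumption the right side gains `min_seqlet_len - 1` positions that the left side never gets. Whether this is actually a bug depends on what `end` means at that point: if it indexes the first position of the last window of length `min_seqlet_len`, an addition of roughly that size is needed to cover the window.

```python
def extend_bounds(start, end, l, min_seqlet_len, additional_flanks):
    """Hypothetical reconstruction of the bound update around a seqlet."""
    # Right side: the line quoted in the issue.
    new_end = min(end + min_seqlet_len + additional_flanks - 1, l)
    # Left side: ASSUMED to add only the flank; the real start update is not shown.
    new_start = max(start - additional_flanks, 0)
    return new_start, new_end


s, e = extend_bounds(40, 45, 100, min_seqlet_len=10, additional_flanks=0)
print(40 - s, e - 45)  # 0 9: nothing added on the left, 9 on the right
```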
With `plot_logo`:
- `plot_logo` fails with seqlets if there are more annotations than `n_tracks` and `show_extra` is true, since `motif = row.values[0]` is a float and so has no length.
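The failure reduces to calling `len()` on a number. A minimal sketch of one possible guard, assuming the first annotation column can be numeric for seqlet tables; `motif_label` and the row values are hypothetical and not part of `plot_logo`.

```python
def motif_label(row_values):
    """Hypothetical guard: return a printable label for the first annotation column.

    Seqlet tables may have a numeric first column, so len() cannot be applied
    to it directly; converting to str first avoids the TypeError.
    """
    motif = row_values[0]
    if not isinstance(motif, str):
        motif = str(motif)
    return motif


row_values = [3.0, 16, 27]  # made-up numeric row
try:
    len(row_values[0])
except TypeError as err:
    print(type(err).__name__)  # TypeError

label = motif_label(row_values)
print(label, len(label))  # 3.0 3
```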