During a sanity check of this data I noticed that quite a lot of the training examples have identical sequences but different PSSMs and entropies. The coordinates of these duplicates are not identical either, even after accounting for translation and rotation. In the one example I actually plotted, after superimposing the coordinates under translation and rotation, the coordinates were close to identical but deviated in a few places.
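For anyone who wants to repeat the coordinate comparison, here is a minimal sketch of a rigid-body superposition using the Kabsch algorithm. It is not necessarily the exact procedure used for the plot mentioned above. The function name `superposed_deviations` is hypothetical, and it assumes both inputs are `(N, 3)` arrays covering the same atoms in the same order, with any unresolved residues already filtered out.

```python
import numpy as np

def superposed_deviations(P, Q):
    """Per-atom distances between P and Q after optimally superposing P onto Q.

    P, Q: (N, 3) coordinate arrays for the same atoms in the same order
    (assumption: missing or masked residues were removed beforehand).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centering both point sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # Kabsch: the SVD of the covariance matrix gives the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip one axis if needed so the result is a proper rotation, not a reflection.
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = P_c @ R.T
    # Distance per atom; large values mark where the two structures disagree.
    return np.linalg.norm(P_rot - Q_c, axis=1)

def rmsd(deviations):
    """Root-mean-square deviation from per-atom distances."""
    return float(np.sqrt(np.mean(deviations ** 2)))
```

Two entries that differ only by translation and rotation give an RMSD of essentially zero, so any nonzero per-atom deviations point to the positions where the duplicated structures actually differ.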
This problem is not limited to the training_100 data; it also extends to the training_95 data. I find this very surprising, since I would have expected the clustering to at the very least group or remove identical sequences.
See the attached example (it was too long to paste in here)
identical_sequences.zip
Other training examples were repeated 6 times in the data.
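A minimal sketch of how such repeats can be counted, assuming the records have already been parsed into dictionaries. The keys `"id"` and `"primary"`, the function name `find_duplicate_sequences`, and the example IDs are hypothetical placeholders, not taken from the dataset or its tooling.

```python
from collections import defaultdict

def find_duplicate_sequences(records):
    """Map each sequence that occurs more than once to the IDs that share it.

    records: iterable of dicts with hypothetical keys "id" and "primary"
    (the amino-acid sequence as a string).
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["primary"]].append(rec["id"])
    # Keep only sequences shared by at least two records.
    return {seq: ids for seq, ids in groups.items() if len(ids) > 1}

# Made-up records for illustration only.
example = [
    {"id": "entry_a", "primary": "MKTAYIAK"},
    {"id": "entry_b", "primary": "MKTAYIAK"},
    {"id": "entry_c", "primary": "GSHMLEDP"},
]
duplicates = find_duplicate_sequences(example)
```

Sorting the result by `len(ids)` makes it easy to spot the sequences with the most copies.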
Is there any good reason for this or is this an error?