Get LCS string from LCSseq method #270
Comments
This implementation is based on the paper. You should be able to retrieve the longest common subsequence string using the editops/opcodes functions:
blocks has the same format as the matching blocks in difflib. It can e.g. be used to get the subsequence strings:
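The snippet that originally followed is not preserved here. As a rough illustration of the idea only, the sketch below builds matching blocks in the difflib layout (`a`, `b`, `size`, plus a trailing zero-size sentinel) with a plain dynamic-programming LCS and then joins the blocks into the subsequence string. The names `Block`, `lcs_matching_blocks` and `lcs_string` are hypothetical helpers made up for this example; they are not part of the library, and the quadratic table is not how the library computes its result.

```python
from typing import List, NamedTuple


class Block(NamedTuple):
    """Same layout as difflib's Match: s1[a:a+size] == s2[b:b+size]."""
    a: int
    b: int
    size: int


def lcs_matching_blocks(s1: str, s2: str) -> List[Block]:
    """Hypothetical helper: matching blocks of one LCS, via a plain DP table."""
    n, m = len(s1), len(s2)
    # dp[i][j] = LCS length of the suffixes s1[i:] and s2[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s1[i] == s2[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])

    blocks: List[Block] = []
    i = j = 0
    while i < n and j < m:
        if s1[i] == s2[j]:
            last = blocks[-1] if blocks else None
            if last and last.a + last.size == i and last.b + last.size == j:
                # The match is adjacent in both strings: grow the last block.
                blocks[-1] = Block(last.a, last.b, last.size + 1)
            else:
                blocks.append(Block(i, j, 1))
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    blocks.append(Block(n, m, 0))  # zero-size sentinel, as in difflib
    return blocks


def lcs_string(s1: str, s2: str) -> str:
    """Join the matched slices of s1 into the subsequence string."""
    return "".join(s1[b.a:b.a + b.size] for b in lcs_matching_blocks(s1, s2))
```

The joining step in `lcs_string` is the part that carries over: it works unchanged on any list of blocks with `a` and `size` fields, wherever the blocks come from.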
Thank you @maxbachmann for your quick reply and providing info about the algorithm.
There is no such option yet, but it is planned to add this feature to Levenshtein/DamerauLevenshtein/OSA: #241. Note that when using different weights, it will need to use an
One way is to treat the weight as '1' for all elements and get the LCS string using Hirschberg’s recursive algorithm, then compensate the total length by subtracting weights. This may not give the optimal LCS score, but it gives an approximate solution with faster results.
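For reference, a minimal sketch of Hirschberg's divide-and-conquer recovery of an LCS string with unit weights, using only linear extra memory per recursion level. The function names `lcs_last_row` and `hirschberg_lcs` are hypothetical and chosen for this example; the weight compensation step described above is not covered.

```python
from typing import List


def lcs_last_row(a: str, b: str) -> List[int]:
    """Row r with r[j] = LCS length of a and b[:j], keeping one row in memory."""
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b):
            if x == y:
                cur.append(prev[j] + 1)
            else:
                cur.append(max(prev[j + 1], cur[j]))
        prev = cur
    return prev


def hirschberg_lcs(a: str, b: str) -> str:
    """Return one longest common subsequence of a and b (unit weights)."""
    if not a or not b:
        return ""
    if len(a) == 1:
        return a if a in b else ""
    mid = len(a) // 2
    # Forward scores for the first half of a, backward scores for the second.
    left = lcs_last_row(a[:mid], b)
    right = lcs_last_row(a[mid:][::-1], b[::-1])
    # Pick the split of b that maximizes the combined LCS length.
    k = max(range(len(b) + 1), key=lambda j: left[j] + right[len(b) - j])
    return hirschberg_lcs(a[:mid], b[:k]) + hirschberg_lcs(a[mid:], b[k:])
```

The runtime stays quadratic; the gain is that no full table has to be stored in order to recover the string.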
This is still a planned optimization to reduce the runtime and memory consumption of the editops implementation. I already use it when calculating the editops for the Levenshtein distance. It is generally worth it for very long sequences (e.g. 100k characters). In this case it makes sense to use Hirschberg until the individual sequences are smaller than a couple thousand characters and then use the bitparallel algorithm.
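To illustrate what a bit-parallel LCS computation can look like, here is a sketch of one commonly cited formulation that keeps a single bit vector per step. It computes only the LCS length, and it relies on Python's arbitrary-precision integers where an optimized implementation would use machine words. The name `lcs_length_bitparallel` is hypothetical, and this is an assumption-laden illustration rather than the library's code.

```python
def lcs_length_bitparallel(s1: str, s2: str) -> int:
    """LCS length using one big integer as the bit vector over s1."""
    # masks[ch] has bit i set wherever s1[i] == ch
    masks = {}
    for i, ch in enumerate(s1):
        masks[ch] = masks.get(ch, 0) | (1 << i)

    full = (1 << len(s1)) - 1  # keep only len(s1) bits after each step
    state = full               # all ones: nothing matched yet
    for ch in s2:
        u = state & masks.get(ch, 0)
        state = ((state + u) | (state - u)) & full

    # Every zero bit in the final state corresponds to one matched character.
    return len(s1) - bin(state).count("1")
```

Each character of `s2` costs a handful of word operations per block of `s1`, which is why this style of algorithm suits the small subproblems left over after splitting.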
Yes, even though it would be a bit unfortunate to have suboptimal results. That said, I am actually unsure how to implement this correctly in all cases. I am only aware of the implementation in https://github.com/infoscout/weighted-levenshtein, which:
Are you aware of any implementation that generally works? It would help to have a relatively simple implementation as fallback and for tests and then to specialize whatever possible.
Hi @maxbachmann, the O(N) time complexity implementation of this LCSseq method is very cool. Can you please provide any info about this algorithm, and also how to get the longest common subsequence string? Thank you.