You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Maybe I missed something, but I do not understand what the advantage of using the tie dictionary is over using a hashing dictionary is. Can you tell me what I am missing?
Let K denote the number of keywords in the corpus and Q the number of query words for which you request the check for.
Regarding build the respective data structures, flashtext should probably take something like Q(K * log(K). The first K is because you need to insert K words for which you need to query the tries data structure. However, since the trie is small at the beginning I put the second K into log. In opposite, the hash table only requires O(1), so we get O(K) in total for inserting the K words.
Assuming I am right: A way to make flashtext useful could be if you build two tries: One for the query words and one for the dictionary. Building the tries would take O(Q * log(Q) + K * log(K)), which would be slower than O(K) for the hash table. But querying would take about O(log(K) * log(Q)), which would be better the O(Q) if Q is large enough and and K is small enough.
Another application for the principles might be for finding longest common substrings.
These two ideas for improvement would of course need a bit more implementation.
The text was updated successfully, but these errors were encountered:
Maybe I missed something, but I do not understand what the advantage of using the tie dictionary is over using a hashing dictionary is. Can you tell me what I am missing?
Let K denote the number of keywords in the corpus and Q the number of query words for which you request the check for.
I am guessing that https://www.freecodecamp.org/news/regex-was-taking-5-days-flashtext-does-it-in-15-minutes-55f04411025f/ reflects the time per query. So we have query time complexity of O(Q * K). I actually think this is weird - instinctually I would have expected O(Q * log(K)). Anyhow, just building a hash table would take O(Q).
Regarding build the respective data structures, flashtext should probably take something like Q(K * log(K). The first K is because you need to insert K words for which you need to query the tries data structure. However, since the trie is small at the beginning I put the second K into log. In opposite, the hash table only requires O(1), so we get O(K) in total for inserting the K words.
Assuming I am right: A way to make flashtext useful could be if you build two tries: One for the query words and one for the dictionary. Building the tries would take O(Q * log(Q) + K * log(K)), which would be slower than O(K) for the hash table. But querying would take about O(log(K) * log(Q)), which would be better the O(Q) if Q is large enough and and K is small enough.
Another application for the principles might be for finding longest common substrings.
These two ideas for improvement would of course need a bit more implementation.
The text was updated successfully, but these errors were encountered: