Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow iteration over keys with #each (and include Enumerable) #19

Open
roryokane opened this issue Apr 5, 2015 · 4 comments
Open

Allow iteration over keys with #each (and include Enumerable) #19

roryokane opened this issue Apr 5, 2015 · 4 comments

Comments

@roryokane
Copy link

I would like to be able to iterate over all keys in a Trie, or all keys that start with a certain prefix (keys that are under a certain TrieNode.) Right now, there is no way to tell what keys are in the trie without trying to access every possible letter from every possible trie node.

You could support this by implementing #each, which should enumerate all objects in the trie, in order. Implementing #each will also allow you to include Enumerable, which provides many convenient methods.

Without this functionality, I cannot use this library to implement the word game GHOST. With my current array implementation, I print a random sample of words that start with a certain prefix:

starting_word_offset = Kernel::rand(wordlist.length - NUM_OF_WORDS_TO_SHOW)
wordlist_subset = wordlist[starting_word_offset, NUM_OF_WORDS_TO_SHOW]

And I also use Enumerable methods to filter the words in the trie:

return wordlist.reject { |word| word.length < MINIMUM_WORD_LENGTH }

If providing this functionality would make Trie operations unacceptably slower, you can provide two trie classes with your library: a SimpleTrie and a Trie, where a Trie provides more methods, but is slightly slower.

@roryokane roryokane changed the title Allow iteration over keys with #each (and include Enumerable) Allow iteration over keys with #each (and include Enumerable) Apr 5, 2015
@tyler
Copy link
Owner

tyler commented Apr 5, 2015

The inability to enumerate all objects in the trie is a limitation of the underlying data structure implementation and is an intentional trade-off. This is using a dual-array trie, which has the advantage of being highly compact. The downside is that the only way to tell whether or not a particular branch exists is by attempting to traverse it.

Without completely changing the underlying implementation there is no way around this. There are other trie implementations that do not have that limitation which would probably be a better fit for your use case.

@tyler tyler closed this as completed Apr 5, 2015
roryokane added a commit to roryokane/trie that referenced this issue Apr 6, 2015
By the time I found out that this implementation does not provide `each`, I had already put this library in my `Gemfile` and started rewriting some methods to use this library. This disclaimer should prevent others from spending their time integrating with a trie implementation that does not fit their needs.

This disclaimer is based on Tyler’s comment in issue tyler#19.
@hickford
Copy link
Collaborator

hickford commented Apr 6, 2015

Newer versions of the upstream C library libdatrie reportedly support breadth-first traversal (and some other kind?), so progress might be possible

See https://github.com/kmike/datrie/blob/master/libdatrie/ChangeLog#L376 and https://github.com/kmike/datrie/blob/master/libdatrie/ChangeLog#L342

@tyler
Copy link
Owner

tyler commented Apr 6, 2015

They added the TrieIterator stuff a while ago, but it's implemented in the way I mentioned above. (See: https://github.com/kmike/datrie/blob/9e794aac1f68ebf920bd46e977825f374a64dc47/libdatrie/datrie/darray.c#L825-L830) I'd actually forgotten that I'd implemented the same thing really early on in order to support the Trie#children method. (See: https://github.com/tyler/trie/blob/master/ext/trie/trie.c#L227-L228) I was hesitant to add that because of the performance implications... I prefer not to add things which are egregiously inefficient. Unless you're familiar with the code it can be very easy to shoot yourself in the foot. (The difference between Array#push and Array#shift in Ruby is my favorite example of this.)

But given that it already exists, is implemented in the upstream library, and would apparently be useful to people... Maybe I should reconsider.

@tyler tyler reopened this Apr 6, 2015
@jensljungblad
Copy link

@tyler What other trie implementations would you suggest if I need this functionality?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants