This provides a simple REST API that provides fast searches for anagrams with some subset of the English language. The service maintains a mutable set of words (a corpus) from which the anagrams are determined. Words must first be added to the corpus before they are to be considered by the anagram service. Only valid English words can be added.
- App: JDK
- Tests:
ruby
- Clone this repository and run
./mvnw spring-boot:run
- A few additional tests were added in a separate ruby test file along with the provided tests
find src/test/ruby -type f -name '*test*rb' -exec ruby {} \;
The core optimization for identifying anagrams relies on the fact that anagrams are equal after being
lexicographically sorted. For example, the words "read" and "dear" both sort to "ader". Therefore, we can
separate each word into its respective anagram bucket immediately when it is added to the corpus by
sorting the word to find the bucket key
for constant time lookups.
Anagram results are returned in sorted order. Buckets are SortedSet
s so that sorting only occurs
when a bucket is modified, not for each search API call.
- The application is thread-safe. Access to the data store is only synchronized when creating new buckets
to avoid overwriting a new bucket when simultaneous requests attempt to create a new one for the same
key
-
POST /words.json
: Takes a JSON array of English-language words and adds them to the corpus (data store). -
GET /anagrams/:word.json
:- Returns a JSON array of English-language words that are anagrams of the word passed in the URL.
- Supports an optional query param (
limit
) that indicates the maximum number of results to return.
-
DELETE /words/:word.json
: Deletes a single word from the data store. -
DELETE /words.json
: Deletes all contents of the data store.
- Supports an optional query param (
exclude_proper_nouns
) for whether or not to include proper nouns in the list of anagrams -
POST /anagrams/comparison.json
Endpoint that takes a set of words and returns whether or not they are all anagrams of each other -
DELETE /anagrams/:word.json
Endpoint to delete a word and all of its anagrams
- Generate API documentation
- Version API endpoints
- Add metrics and transaction IDs to request logs with
MDC.Closeable
- Keep the dictionary resource file compressed in the repo to reduce size, then unzip it when loading the app
Some of the unimplemented endpoints would require us to efficiently track buckets by size. This could
probably be done by using an iterable TreeMultiMap
instead of a plain unsorted Map
. The TreeMultiMap
would be sorted by bucket.size
so that the buckets can be easily iterated in order of bucket size.
- Endpoint to return all anagram groups of size >= x
- Endpoint that identifies words with the most anagrams
- Endpoint that returns a count of words in the corpus and min/max/median/average word length
- Average: This can be tracked and updated in constant time by tracking the total corpus size and recomputing
when a word is added or removed. However, this would then require that these operations be
synchronized
to avoid corrupting the values due to race conditions. - Min/Max
- Adding words: These can be tracked in constant time by comparing new words to the current min/max
- Removing words: This can't be easily recomputed with the current data structure if the mix/max word is removed. If this was a high volume API then we could create a second corpus data structure that keeps words ordered by size.
- Average: This can be tracked and updated in constant time by tracking the total corpus size and recomputing
when a word is added or removed. However, this would then require that these operations be
s
= word length, b
= bucket size, n
= corpus size
- Searching for anagrams
- Time:
log(s) + n
for building the key, copying the bucket and removing the input word - Space:
b
for the cloned bucket
- Time:
- Adding words to the corpus
- Time:
log(s) + log(b)
for building the key and adding the word to aSortedSet
bucket - Space:
n
because each word is stored just once
- Time:
- Removing words from the corpus
- Time:
log(s) + log(b)
for building the key and removing the word from theSortedSet
bucket
- Time:
- Clearing the corpus
- Time:
n
for clearing each entry in thekey -> bucket
map. Reinitializing theMap
may be faster by allowing the GC to reclaim everything in one fell swoop.
- Time:
- Kotlin - I implemented this in Kotlin because I read about it when it was first released and wanted to give it a try
- Spring Boot
- Maven
- Ruby (test)