Anagram API

This provides a simple REST API that provides fast searches for anagrams with some subset of the English language. The service maintains a mutable set of words (a corpus) from which the anagrams are determined. Words must first be added to the corpus before they are to be considered by the anagram service. Only valid English words can be added.

Prerequisites

App: JDK
Tests: ruby

Getting Started

Running the app

Clone this repository and run ./mvnw spring-boot:run

Running the tests

A few additional tests were added in a separate ruby test file along with the provided tests
find src/test/ruby -type f -name '*test*rb' -exec ruby {} \;

Backend Design

The core optimization for identifying anagrams relies on the fact that anagrams are equal after being lexicographically sorted. For example, the words "read" and "dear" both sort to "ader". Therefore, we can separate each word into its respective anagram bucket immediately when it is added to the corpus by sorting the word to find the bucket key for constant time lookups.

Anagram results are returned in sorted order. Buckets are SortedSets so that sorting only occurs when a bucket is modified, not for each search API call.

Thread Safety

The application is thread-safe. Access to the data store is only synchronized when creating new buckets to avoid overwriting a new bucket when simultaneous requests attempt to create a new one for the same key

API

Core

POST /words.json: Takes a JSON array of English-language words and adds them to the corpus (data store).
GET /anagrams/:word.json:
- Returns a JSON array of English-language words that are anagrams of the word passed in the URL.
- Supports an optional query param (limit) that indicates the maximum number of results to return.
DELETE /words/:word.json: Deletes a single word from the data store.
DELETE /words.json: Deletes all contents of the data store.

Optional

Supports an optional query param (exclude_proper_nouns) for whether or not to include proper nouns in the list of anagrams
POST /anagrams/comparison.json Endpoint that takes a set of words and returns whether or not they are all anagrams of each other
DELETE /anagrams/:word.json Endpoint to delete a word and all of its anagrams

TODO

Generate API documentation
Version API endpoints
Add metrics and transaction IDs to request logs with MDC.Closeable
Keep the dictionary resource file compressed in the repo to reduce size, then unzip it when loading the app

Some of the unimplemented endpoints would require us to efficiently track buckets by size. This could probably be done by using an iterable TreeMultiMap instead of a plain unsorted Map. The TreeMultiMap would be sorted by bucket.size so that the buckets can be easily iterated in order of bucket size.

Endpoint to return all anagram groups of size >= x
Endpoint that identifies words with the most anagrams
Endpoint that returns a count of words in the corpus and min/max/median/average word length
- Average: This can be tracked and updated in constant time by tracking the total corpus size and recomputing when a word is added or removed. However, this would then require that these operations be synchronized to avoid corrupting the values due to race conditions.
- Min/Max
  - Adding words: These can be tracked in constant time by comparing new words to the current min/max
  - Removing words: This can't be easily recomputed with the current data structure if the mix/max word is removed. If this was a high volume API then we could create a second corpus data structure that keeps words ordered by size.

Time and Memory Complexity

s = word length, b = bucket size, n = corpus size

Searching for anagrams
- Time: log(s) + n for building the key, copying the bucket and removing the input word
- Space: b for the cloned bucket
Adding words to the corpus
- Time: log(s) + log(b) for building the key and adding the word to a SortedSet bucket
- Space: n because each word is stored just once
Removing words from the corpus
- Time: log(s) + log(b) for building the key and removing the word from the SortedSet bucket
Clearing the corpus
- Time: n for clearing each entry in the key -> bucket map. Reinitializing the Map may be faster by allowing the GC to reclaim everything in one fell swoop.

Stack

Kotlin - I implemented this in Kotlin because I read about it when it was first released and wanted to give it a try
Spring Boot
Maven
Ruby (test)

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
.mvn/wrapper		.mvn/wrapper
src		src
.gitignore		.gitignore
README.md		README.md
mvnw		mvnw
mvnw.cmd		mvnw.cmd
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Anagram API

Prerequisites

Getting Started

Running the app

Running the tests

Backend Design

Thread Safety

API

Core

Optional

TODO

Time and Memory Complexity

Stack

About

Releases

Packages

Languages

gavingolden/anagram-api

Folders and files

Latest commit

History

Repository files navigation

Anagram API

Prerequisites

Getting Started

Running the app

Running the tests

Backend Design

Thread Safety

API

Core

Optional

TODO

Time and Memory Complexity

Stack

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages