This program was written for DeAnza's CS301 Data Structures class in 2019 as a final project. Constraints required utilization of an in-house implementation of template data structures used in the development of the application. The application is a CLI interface for interacting with a database of movies using the database's search engine.
- Singly Linked List
- Stack
- HashTable (Quadratic Probing)
- Binary Search Tree
- load each row denoting a movie entry and its attributes
- create the movie entry and add into the list collection
- Use movie ID attribute as the key
- Preprocess the ID cleaning out any unwanted characters
- Add the movie into the database using the preprocessed key and the movie as the value
- Using the list of movies, create a map of bsts.
- NOTE Index of bst was maintained lexically.
- BST entries containing keywords associated with a movie such that that the first character of each tokenized keyword pertaining to a movie signifies a key in the dictionary
- NOTE: tokenized words are the movie attributes such as the movie name, genre, year released, runtime, and id
- more attributes could have included the cast as well as production -- at the bare minimum there are atleast 5 attributes resulting in 5 keywords
- this was the largest computational cost when spinning up the application since it uses a nested loop resulting in an average O(n^2) insertion time
- NOTE: tokenized words are the movie attributes such as the movie name, genre, year released, runtime, and id
Process | Time |
---|---|
List of Movies Generation Time | 0.069353s |
Movie Load From List to Table Time | 0.0349641s |
Tabled BST Search Engine Time | 2.73182s |
Metrics | Values |
---|---|
Occupancy | 507 |
Capacity | 1277 |
Load Factor | 39.7% |
Number of Collisions | 166 |
Metrics | Values |
---|---|
Occupancy | 41 |
Capacity | 521 |
Load Factor | 7.869% |
Number of Collisions | 0 |
Reading movies based on relevance of a user's keyword search
- Preprocess the keyword(s) search entry into tokenized list
- Loop LV 1: For each starting character of a tokenized user entry get the associated BST
at O(1) and search for the associated keyword BST node O(logn)
- NOTE user keyword searches are usually short, this would have many issues if there were too many keywords
- Loop LV 2: For each keyword-node containing a list of movies, create a tally value of 1 for each movie associated keyword MovieID incrementing the tally for each existing movieID O(n^2)
- Sort movies based upon their tallied keyword relevance ie
"how many keyword hits did each movie have?"
and return list of movies sorted by relevance measured by the set of tokenized keywords of a user search entry
- Context Index Searching: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.9686&rep=rep1&type=pdf
- IMDB Dataset: https://www.imdb.com/interfaces/