Context Indexing Movie Search Engine

This program was written for DeAnza's CS301 Data Structures class in 2019 as a final project. Constraints required utilization of an in-house implementation of template data structures used in the development of the application. The application is a CLI interface for interacting with a database of movies using the database's search engine.

Data Structures Used

Singly Linked List
Stack
HashTable (Quadratic Probing)
Binary Search Tree

How It Works

Create

(1) Build the list of movies

load each row denoting a movie entry and its attributes
create the movie entry and add into the list collection

(2) Create the hashtabe of movies

Use movie ID attribute as the key
Preprocess the ID cleaning out any unwanted characters
Add the movie into the database using the preprocessed key and the movie as the value

(3) Create the search engine

Using the list of movies, create a map of bsts.
- NOTE Index of bst was maintained lexically.
BST entries containing keywords associated with a movie such that that the first character of each tokenized keyword pertaining to a movie signifies a key in the dictionary
- NOTE: tokenized words are the movie attributes such as the movie name, genre, year released, runtime, and id
  - more attributes could have included the cast as well as production -- at the bare minimum there are atleast 5 attributes resulting in 5 keywords
  - this was the largest computational cost when spinning up the application since it uses a nested loop resulting in an average O(n^2) insertion time

Start Up Statistics (Small Dataset)

Process	Time
List of Movies Generation Time	0.069353s
Movie Load From List to Table Time	0.0349641s
Tabled BST Search Engine Time	2.73182s

Movie Table Statistics

Metrics	Values
Occupancy	507
Capacity	1277
Load Factor	39.7%
Number of Collisions	166

Tabled BST Search Engine Statistics

Metrics	Values
Occupancy	41
Capacity	521
Load Factor	7.869%
Number of Collisions	0

Read

Reading movies based on relevance of a user's keyword search

Preprocess the keyword(s) search entry into tokenized list
Loop LV 1: For each starting character of a tokenized user entry get the associated BST at O(1) and search for the associated keyword BST node O(logn)
- NOTE user keyword searches are usually short, this would have many issues if there were too many keywords
Loop LV 2: For each keyword-node containing a list of movies, create a tally value of 1 for each movie associated keyword MovieID incrementing the tally for each existing movieID O(n^2)
Sort movies based upon their tallied keyword relevance ie "how many keyword hits did each movie have?" and return list of movies sorted by relevance measured by the set of tokenized keywords of a user search entry

References:

Context Index Searching: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.309.9686&rep=rep1&type=pdf
IMDB Dataset: https://www.imdb.com/interfaces/

Name		Name	Last commit message	Last commit date
Latest commit History 210 Commits
.vs		.vs
cis22c_f19_project		cis22c_f19_project
docs		docs
images		images
.gitattributes		.gitattributes
.gitignore		.gitignore
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Context Indexing Movie Search Engine

Data Structures Used

How It Works

Create

(1) Build the list of movies

(2) Create the hashtabe of movies

(3) Create the search engine

Start Up Statistics (Small Dataset)

Movie Table Statistics

Tabled BST Search Engine Statistics

Read

References:

About

Releases

Packages

Contributors 3

Languages

henrylao/myMDB

Folders and files

Latest commit

History

Repository files navigation

Context Indexing Movie Search Engine

Data Structures Used

How It Works

Create

(1) Build the list of movies

(2) Create the hashtabe of movies

(3) Create the search engine

Start Up Statistics (Small Dataset)

Movie Table Statistics

Tabled BST Search Engine Statistics

Read

References:

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages