This project aims to develop a basic search engine demonstrating web crawling, indexing, ranking, and query processing using Java.
To run the crawler:
make crawler
To run the indexer:
make indexer
To run the query test:
make query-test
Crawler:
- Collects documents starting from a set of seed URLs.
- Follows crawling etiquette and fetches pages with multiple threads.
- Collects 6000 pages for the project.
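
A minimal sketch of the crawl loop described above, not the project's actual crawler: it starts from seed URLs, fetches each frontier level with a fixed thread pool, and stops at a page limit. The class name `CrawlerSketch` and the `fetchLinks` function (a stand-in for "download a page and extract its outgoing links") are assumptions made for illustration. Crawling etiquette such as robots.txt checks and per-host delays is left out to keep the example short.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

class CrawlerSketch {
    // Crawls breadth-first from the seeds and returns the pages fetched, in fetch order.
    // fetchLinks is a hypothetical hook: given a URL, it returns the URLs that page links to.
    static Set<String> crawl(List<String> seeds, Function<String, List<String>> fetchLinks,
                             int maxPages, int threads) {
        Set<String> seen = new LinkedHashSet<>(seeds);   // every URL discovered so far
        Set<String> collected = new LinkedHashSet<>();   // URLs fetched successfully
        List<String> frontier = new ArrayList<>(seen);   // URLs waiting to be fetched
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            while (!frontier.isEmpty() && collected.size() < maxPages) {
                // Never fetch more pages than the remaining budget allows.
                int room = maxPages - collected.size();
                List<String> batch = frontier.subList(0, Math.min(room, frontier.size()));
                List<Callable<List<String>>> tasks = new ArrayList<>();
                for (String url : batch) {
                    tasks.add(() -> fetchLinks.apply(url));
                }
                // invokeAll runs the batch on the pool and waits for every task to finish.
                List<Future<List<String>>> results = pool.invokeAll(tasks);
                // URLs that did not fit in this batch are carried over to the next round.
                List<String> next = new ArrayList<>(frontier.subList(batch.size(), frontier.size()));
                for (int i = 0; i < batch.size(); i++) {
                    try {
                        List<String> links = results.get(i).get();
                        collected.add(batch.get(i));
                        for (String link : links) {
                            if (seen.add(link)) {
                                next.add(link);          // first time this URL is seen
                            }
                        }
                    } catch (ExecutionException e) {
                        // The fetch failed; skip this page and continue with the rest.
                    }
                }
                frontier = next;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();          // stop early, keep what was collected
        } finally {
            pool.shutdown();
        }
        return collected;
    }
}
```

With a page limit of 6000, a call would look like `CrawlerSketch.crawl(seeds, fetchLinks, 6000, 8)`, where the thread count of 8 is an arbitrary choice. Processing the frontier level by level keeps the shared sets on a single thread, so only the fetches themselves run concurrently.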
Indexer:
- Indexes the crawled documents for fast retrieval.
- Stores the index in secondary storage.
- Supports incremental updates.
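
One way to picture these three points is a small inverted index that can be updated one document at a time and written to disk. The sketch below is an assumption about how such an index could look, not the project's indexer: the class `InvertedIndexSketch`, its method names, and the tab-separated file layout are all hypothetical, and document ids are assumed to contain no spaces or tabs.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeMap;

class InvertedIndexSketch {
    // term -> (document id -> term frequency); TreeMap keeps the saved file in a stable order.
    private final Map<String, Map<String, Integer>> postings = new TreeMap<>();

    // Adds a document, or replaces its old postings if the id was already indexed.
    void addOrUpdate(String docId, String text) {
        remove(docId);
        for (String token : text.toLowerCase().split("[^a-z0-9]+")) {
            if (token.isEmpty()) continue;
            postings.computeIfAbsent(token, t -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Drops every posting of one document and any term left without documents.
    void remove(String docId) {
        postings.values().forEach(docs -> docs.remove(docId));
        postings.values().removeIf(Map::isEmpty);
    }

    // Returns the ids of the documents that contain the term.
    Set<String> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Map.of()).keySet();
    }

    // Writes one line per term: term<TAB>doc=tf doc=tf ...
    void save(Path file) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, Map<String, Integer>> e : postings.entrySet()) {
            StringJoiner docs = new StringJoiner(" ");
            e.getValue().forEach((doc, tf) -> docs.add(doc + "=" + tf));
            lines.add(e.getKey() + "\t" + docs);
        }
        try {
            Files.write(file, lines);
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Rebuilds an index from a file produced by save.
    static InvertedIndexSketch load(Path file) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        try {
            for (String line : Files.readAllLines(file)) {
                String[] parts = line.split("\t", 2);
                Map<String, Integer> docs = new TreeMap<>();
                for (String entry : parts[1].split(" ")) {
                    int eq = entry.lastIndexOf('=');
                    docs.put(entry.substring(0, eq), Integer.parseInt(entry.substring(eq + 1)));
                }
                index.postings.put(parts[0], docs);
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
        return index;
    }
}
```

Re-indexing a changed page is a single `addOrUpdate` call with the same id, so the index does not have to be rebuilt from scratch. A real index of several thousand pages would likely need a more compact on-disk format or a database; the flat file here only illustrates the idea.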
Query processor:
- Receives and processes user queries.
- Supports stem matching.
- Supports phrase searching with quotation marks.
- Returns only results that preserve the phrase's word order.
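
The query behaviour listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's query processor: `QuerySketch` and its methods are hypothetical names, the stemmer is a deliberately naive suffix stripper standing in for a real one such as the Porter stemmer, and quoted phrases are matched on exact lowercase tokens without stemming.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuerySketch {
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]*)\"");

    // Naive stemming: strips one common suffix if enough of the word remains.
    static String stem(String word) {
        String w = word.toLowerCase();
        for (String suffix : new String[] {"ing", "ed", "s"}) {
            if (w.length() > suffix.length() + 2 && w.endsWith(suffix)) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Splits text into lowercase alphanumeric tokens.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase().split("[^a-z0-9]+")) {
            if (!t.isEmpty()) tokens.add(t);
        }
        return tokens;
    }

    // True if the document contains every quoted phrase (words adjacent and in order)
    // and, for every unquoted term, some word with the same stem.
    static boolean matches(String query, String document) {
        List<String> docTokens = tokenize(document);

        Matcher m = QUOTED.matcher(query);
        while (m.find()) {
            List<String> phrase = tokenize(m.group(1));
            // indexOfSubList finds the phrase only as a contiguous, ordered run of tokens.
            if (!phrase.isEmpty() && Collections.indexOfSubList(docTokens, phrase) < 0) {
                return false;
            }
        }

        Set<String> docStems = new HashSet<>();
        for (String t : docTokens) docStems.add(stem(t));

        // Remove the quoted parts, then stem-match whatever terms remain.
        String unquoted = QUOTED.matcher(query).replaceAll(" ");
        for (String term : tokenize(unquoted)) {
            if (!docStems.contains(stem(term))) return false;
        }
        return true;
    }
}
```

For example, the query `crawling` matches a document containing "crawls" because both reduce to the stem "crawl", while `"engine search"` does not match a document that only contains "search engine".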
Ranker:
- Ranks documents by relevance to the query and by page popularity.
- Considers various relevance calculation methods.
- Uses algorithms such as PageRank to measure popularity.
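
For the popularity part, a minimal PageRank sketch using power iteration is shown below. It is an assumption about how the score could be computed, not the project's ranker: the class `PageRankSketch`, the link-map input, and the parameter choices are hypothetical. Pages without outgoing links spread their rank evenly over all pages, which is one common convention.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PageRankSketch {
    // links maps each page to the pages it links to; returns page -> rank, summing to 1.
    static Map<String, Double> compute(Map<String, List<String>> links,
                                       double damping, int iterations) {
        // Collect every page that appears as a link source or a link target.
        Set<String> pages = new TreeSet<>(links.keySet());
        links.values().forEach(pages::addAll);
        int n = pages.size();

        // Start from a uniform distribution.
        Map<String, Double> rank = new HashMap<>();
        for (String p : pages) rank.put(p, 1.0 / n);

        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new HashMap<>();
            // Every page receives the "random jump" share.
            for (String p : pages) next.put(p, (1.0 - damping) / n);

            double danglingMass = 0.0;
            for (String p : pages) {
                List<String> out = links.getOrDefault(p, List.of());
                if (out.isEmpty()) {
                    danglingMass += rank.get(p);      // no outlinks: redistribute below
                    continue;
                }
                // A page splits its damped rank equally among its outgoing links.
                double share = damping * rank.get(p) / out.size();
                for (String q : out) next.merge(q, share, Double::sum);
            }
            // Rank held by dangling pages is shared evenly across all pages.
            for (String p : pages) next.merge(p, damping * danglingMass / n, Double::sum);
            rank = next;
        }
        return rank;
    }
}
```

A typical call would be `PageRankSketch.compute(links, 0.85, 50)`; the damping factor 0.85 is the value commonly quoted for PageRank, and 50 iterations is an arbitrary choice for illustration. The resulting popularity score would then be combined with a query-dependent relevance score to produce the final ordering.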