- Crawling and parsing HTML documents and web pages.
- Working with nltk and beautifulsoup.
- Assessing information retrieval quality using Mean Average Precision (MAP), 11-point interpolated average, Normalized Discounted Cumulative Gain (NDCG), and pFound.
- Building a simple search engine using the Cranfield dataset.
- Building a geospatial index for a dataset of points of interest.
- Working with Geocoding APIs and caching results.
- Nearest neighbor search with sklearn BallTree and an Annoy index.
- Using pygtrie on the AOL user session dataset to generate search suggestions.
- Evaluating suggestion performance.
- Adding spellcheck to the suggestion algorithm.
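
As a rough illustration of the retrieval-quality metrics listed above, here is a minimal sketch of Average Precision, MAP, and NDCG in plain Python. The function names and input shapes are assumptions made for this example, not taken from the course materials. Note that this version of average precision normalizes by the number of relevant documents found in the ranked list; some definitions divide by the total number of relevant documents in the collection instead.

```python
import math


def average_precision(relevance):
    """Average precision for one ranked list of 0/1 relevance flags.

    Assumption: normalizes by the relevant items present in the list,
    not by all relevant items in the collection.
    """
    hits = 0
    total = 0.0
    for rank, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def mean_average_precision(runs):
    """Mean of average precision over several queries."""
    if not runs:
        return 0.0
    return sum(average_precision(r) for r in runs) / len(runs)


def ndcg(gains, k=None):
    """NDCG with linear gains and a log2 position discount."""
    k = len(gains) if k is None else k

    def dcg(values):
        return sum(g / math.log2(i + 2) for i, g in enumerate(values[:k]))

    ideal = dcg(sorted(gains, reverse=True))  # best possible ordering
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

For example, `average_precision([1, 0, 1])` averages the precision at ranks 1 and 3, and `ndcg([3, 2, 1])` equals 1.0 because the list is already in ideal order.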