This project is structured as a series of classes that bootstrap your data-mining and web-crawling knowledge. Some of the topics covered here:
- Anatomy of a Crawler (Policies and Behaviors)
- Understanding HTTP Requests
- Scraping / Parsing data out of HTML pages
- Tooling (Frameworks and custom-made libraries)
- Finding your public source of data
- Modeling your objects
- Storing your results
- Scaling up your crawler
Ideally, at the end of this "Course" you should be able to write your own Crawler / Scraper in C#.
Keep this project's Wiki open at all times, since most of the text and references are there for you to read as you advance through the chapters/classes of this project.
Start each chapter by going to the Wiki first, and only after reading its text, proceed to the code.
Take your time, read the code comments, run it, modify it and run it again to understand the impact of each change.
My name is Marcello Lins. I am a 24-year-old developer from Brazil, currently working on Big Data and data mining products.
0.0.5