Web Crawling 101 - Ongoing Project

This project is structured as a series of classes that bootstrap your data-mining and web-crawling knowledge. Topics covered include:

Anatomy of a Crawler (Policies and Behaviors)
Understanding HTTP Requests
Scraping / Parsing data out of HTML pages
Tooling (Frameworks and custom-made libraries)
Finding your public source of data
Modeling your objects
Storing your results
Scaling up your crawler

Ideally, by the end of this course you should be able to write your own crawler / scraper in C#.

How Do I Start?

Keep this project's Wiki open at all times: most of the text and references live there, for you to read as you advance through the chapters/classes of this project.

Start each chapter by going to the Wiki first, and only after reading its text, proceed to the code.

Take your time: read the code comments, run the code, modify it, and run it again to understand the impact of each change.

About me

My name is Marcello Lins. I am a 24-year-old developer from Brazil, currently working on Big Data and data-mining products.

http://about.me/marcellolins

Version

0.0.5

MarcelloLins/WebCrawling101-DEPRECATED
