Final Project Questions

Decision Process

The first question to ask is whether the data is too large to fit on one computer. If it is, Hadoop might be a good idea. If all the data fits on one computer, you don't need Hadoop; just code up a solution without it.

The second question to ask is whether the problems you want to solve are amenable to a MapReduce solution. Not all problems are. Familiarity with MapReduce design patterns will help you answer this question. As a side note, always formulate the questions you want answered or the problems you want solved first. Only after doing this should you decide on a particular technology or technique such as Hadoop/MapReduce.

Search Functionality

There are at least three ways to improve the index:

Don't include HTML tags.
Don't include entries that aren't words. That is, don't include gobbledygook.
Don't include common English words, such as "a", "the", and so on.

I have implemented a solution that has these improvements:

https://github.com/paul-reiners/udacity-intro-hadoop-mapreduce/tree/master/code/index
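
The three improvements can be illustrated with a minimal sketch of the mapper-side filtering. This is not the linked implementation. The function names, the `node_id` and `body` parameters, the short stop-word list, and the rule that a "word" is a purely alphabetic token are all assumptions made for illustration.

```python
import re

# Assumption: a tiny illustrative stop-word list; a real index would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it"}

# Matches HTML tags such as <p> or </b>.
TAG_RE = re.compile(r"<[^>]+>")

# Assumption: a "word" is alphabetic, optionally with one apostrophe part (e.g. don't).
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def index_terms(body):
    """Return the set of lowercase terms worth indexing from one text body."""
    # 1. Drop HTML tags, replacing each with a space so adjacent words stay separate.
    text = TAG_RE.sub(" ", body).lower()
    terms = set()
    for token in text.split():
        # Trim surrounding punctuation before testing the token.
        token = token.strip(".,;:!?\"()[]")
        # 2. Skip tokens that are not words; 3. skip common English words.
        if WORD_RE.fullmatch(token) and token not in STOP_WORDS:
            terms.add(token)
    return terms


def map_line(node_id, body):
    """Return sorted (term, node_id) pairs, as a streaming mapper might emit them."""
    return sorted((term, node_id) for term in index_terms(body))
```

For example, `map_line("42", "<b>Hadoop</b> is the tool")` yields one pair for `hadoop` and one for `tool`; the tags and the stop words `is` and `the` produce no index entries.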
