Hadoop Mapper and Reducer Scripts for Python

This repository contains solutions to common mapper and reducer problems in Hadoop using Python. Most online Hadoop resources target Java, so this repository provides Python solutions that run through Hadoop Streaming. Hadoop Streaming lets any program that reads from standard input and writes to standard output act as a mapper or reducer.

Hadoop Installation:

Windows:

Watch this video for Hadoop installation on Windows.

Ubuntu:

Follow this video for Hadoop installation on Ubuntu.

Basic Hadoop Commands:

Format Namenode:
```
hdfs namenode -format
```
Start Hadoop Services:
```
start-all.sh
```
Create Input Directory in HDFS:
```
hdfs dfs -mkdir /input
```

Upload Input File to HDFS:
```
hdfs dfs -put /path/to/input.txt /input/input.txt
```

Run Hadoop Streaming:
```
hadoop jar /path/to/hadoop-streaming.jar \
-input /input/input.txt \
-output /output \
-file "/path/to/mapper.py" \
-mapper "python3 mapper.py" \
-file "/path/to/reducer.py" \
-reducer "python3 reducer.py"
```

Copy Output from HDFS to Local File:
```
hdfs dfs -text /output/* > /path/to/outputfile.txt
```

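Remove Output and Input Directories from HDFS:

Hadoop will not write to an output directory that already exists. Remove /output before you re-run a job.
```
hadoop fs -rm -r /output
hadoop fs -rm -r /input
```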

Testing Mapper and Reducer Scripts:

You can test the mapper and reducer scripts separately to ensure they work correctly:

Test Mapper Script:
```
cat /path/to/input.txt | python3 /path/to/mapper.py
```

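Test Reducer Script:

Hadoop sorts the mapper output by key before it reaches the reducer. Sort the mapper output the same way when you test locally. The second command below runs the whole pipeline in one step:
```
sort /path/to/mapper_output.txt | python3 /path/to/reducer.py
cat /path/to/input.txt | python3 /path/to/mapper.py | sort | python3 /path/to/reducer.py
```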

Algorithm Explanations:

Recommendation System:

Mapper: Preprocesses user-item ratings.
Reducer: Generates recommendations based on similarity measures between users.
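The following is a minimal sketch of user-based collaborative filtering in this mapper/reducer split. It is not necessarily how the repository's own scripts work. It makes two assumptions:

- Input lines have the hypothetical form `user,item,rating`.
- Both roles live in a hypothetical single file `recommend.py`, which runs as the mapper with the `map` argument and as the reducer with `reduce`.

The mapper keys each rating by user. The reducer then builds every user's rating profile and scores the other users with cosine similarity, counting unrated items as 0. Finally, it recommends the unseen items of the single most similar user. The reducer keeps all profiles in memory and must see every user, so this sketch only suits small datasets run with one reducer.

```python
#!/usr/bin/env python3
"""User-based recommendations for Hadoop Streaming (sketch).

Assumed input (hypothetical): one `user,item,rating` record per line.
Local test: python3 recommend.py map < ratings.csv | sort | python3 recommend.py reduce
"""
import math
import sys


def mapper(lines):
    # Key every rating by user: "user<TAB>item:rating".
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue  # skip blank or malformed rows
        user, item, rating = parts
        try:
            float(rating)
        except ValueError:
            continue  # skip a header row or a missing rating
        yield f"{user}\t{item}:{rating}"


def cosine(a, b):
    # Cosine similarity of two {item: rating} dicts; unrated items count as 0.
    dot = sum(a[item] * b[item] for item in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def reducer(lines):
    # Build user -> {item: rating} profiles (assumes one reducer sees all users).
    profiles = {}
    for line in lines:
        if not line.strip():
            continue
        user, value = line.rstrip("\n").split("\t", 1)
        item, rating = value.rsplit(":", 1)
        profiles.setdefault(user, {})[item] = float(rating)

    for user, ratings in sorted(profiles.items()):
        others = [other for other in profiles if other != user]
        # Nearest neighbour = the other user with the highest similarity.
        best = max(others, key=lambda o: cosine(ratings, profiles[o]), default=None)
        candidates = [] if best is None else [
            item for item in profiles[best] if item not in ratings
        ]
        # Recommend the neighbour's unseen items, highest-rated first.
        candidates.sort(key=lambda item: (-profiles[best][item], item))
        yield f"{user}\t{','.join(candidates)}"


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

On a cluster, add `-numReduceTasks 1` to the streaming command so that a single reducer receives every user's ratings.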

Page Rank:

Mapper: Prepares graph data with nodes and edges.
Reducer: Computes PageRank scores to determine node importance in the graph.
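One common way to structure a single PageRank iteration for Hadoop Streaming is sketched below. It rests on these assumptions, all hypothetical or chosen for illustration:

- Each input line has the form `node<TAB>rank<TAB>comma-separated out-links`.
- The damping factor is the conventional 0.85.
- The total number of nodes is passed to the reducer as an argument.
- The file is named `pagerank.py`.

The mapper passes each node's adjacency list through and splits the node's rank evenly across its out-links. The reducer then sums the incoming shares and applies the damping formula. Nodes without out-links do not redistribute their rank in this sketch, so total rank can leak away.

```python
#!/usr/bin/env python3
"""One PageRank iteration for Hadoop Streaming (sketch).

Assumed line format (hypothetical): node<TAB>rank<TAB>out-links, e.g. A<TAB>0.25<TAB>B,C
Local test: python3 pagerank.py map < graph.txt | sort | python3 pagerank.py reduce NUM_NODES
"""
import sys
from itertools import groupby

DAMPING = 0.85  # conventional damping factor (assumption)


def mapper(lines):
    for line in lines:
        if not line.strip():
            continue
        node, rank, links = (line.rstrip("\n").split("\t") + [""])[:3]
        targets = [t for t in links.split(",") if t]
        # Pass the adjacency list through so the reducer can rebuild the graph.
        yield f"{node}\tLINKS\t{','.join(targets)}"
        # Split this node's rank evenly across its out-links.
        for target in targets:
            yield f"{target}\tRANK\t{float(rank) / len(targets)}"


def reducer(lines, num_nodes):
    records = (line.rstrip("\n").split("\t", 2) for line in lines if line.strip())
    # Records arrive sorted by node, so all records for one node are adjacent.
    for node, group in groupby(records, key=lambda r: r[0]):
        links, incoming = "", 0.0
        for _, kind, value in group:
            if kind == "LINKS":
                links = value
            else:
                incoming += float(value)
        new_rank = (1 - DAMPING) / num_nodes + DAMPING * incoming
        # Same line format as the input, so the output can feed the next iteration.
        yield f"{node}\t{new_rank}\t{links}"


if __name__ == "__main__" and sys.argv[1:2] == ["map"]:
    for out in mapper(sys.stdin):
        print(out)
elif __name__ == "__main__" and sys.argv[1:2] == ["reduce"]:
    for out in reducer(sys.stdin, int(sys.argv[2])):
        print(out)
```

The reducer's output has the same format as its input. Running the job repeatedly on its own output therefore performs successive iterations until the ranks stop changing.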

K-Means:

Mapper: Assigns data points to clusters based on centroid proximity.
Reducer: Updates centroid positions based on cluster assignments.
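The sketch below shows one K-Means iteration. It makes three assumptions:

- Each input line is a point written as comma-separated numbers.
- The current centroids come from a hypothetical side file `centroids.txt`, shipped to the mappers with `-file`.
- The script is named `kmeans.py`.

The mapper assigns each point to its nearest centroid by Euclidean distance. The reducer then replaces each centroid with the mean of its assigned points. A centroid that receives no points drops out of the output in this sketch.

```python
#!/usr/bin/env python3
"""One K-Means iteration for Hadoop Streaming (sketch).

Assumed input (hypothetical): one point per line as comma-separated numbers.
Local test: python3 kmeans.py map centroids.txt < points.txt | sort | python3 kmeans.py reduce
"""
import math
import sys
from itertools import groupby


def parse(text):
    return tuple(float(x) for x in text.split(","))


def load_centroids(path):
    # One centroid per line, in the same format as the points.
    with open(path) as f:
        return [parse(line) for line in f if line.strip()]


def mapper(lines, centroids):
    for line in lines:
        if not line.strip():
            continue
        point = parse(line)
        # Key = index of the closest centroid by Euclidean distance.
        cluster = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
        yield f"{cluster}\t{line.strip()}"


def reducer(lines):
    records = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for cluster, group in groupby(records, key=lambda r: r[0]):
        points = [parse(value) for _, value in group]
        # New centroid = coordinate-wise mean of the cluster's points.
        centroid = [sum(coords) / len(points) for coords in zip(*points)]
        yield f"{cluster}\t{','.join(str(c) for c in centroid)}"


if __name__ == "__main__" and sys.argv[1:2] == ["map"]:
    for out in mapper(sys.stdin, load_centroids(sys.argv[2])):
        print(out)
elif __name__ == "__main__" and sys.argv[1:2] == ["reduce"]:
    for out in reducer(sys.stdin):
        print(out)
```

K-Means needs several of these jobs in a row. A driver would write the reducer output back to the centroids file and re-run the job until the centroids stop moving.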

Weather Data Analysis:

Mapper: Extracts relevant weather data from input records.
Reducer: Aggregates weather data and computes statistics like average temperature or precipitation.
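As an example of the extract-then-aggregate pattern, the sketch below computes per-station temperature statistics. It makes two assumptions:

- The input is CSV with hypothetical rows of the form `station,date,temperature`.
- The script is named `weather.py`.

The mapper skips the header row and any rows with missing readings, and emits `station<TAB>temperature`. The reducer reports the minimum, maximum, and mean temperature for each station. Precipitation or other fields would follow the same shape.

```python
#!/usr/bin/env python3
"""Per-station temperature statistics for Hadoop Streaming (sketch).

Assumed input (hypothetical): CSV rows of station,date,temperature.
Local test: python3 weather.py map < data.csv | sort | python3 weather.py reduce
"""
import sys
from itertools import groupby


def mapper(lines):
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) != 3:
            continue  # skip blank or malformed rows
        station, _date, temp = fields
        try:
            value = float(temp)
        except ValueError:
            continue  # skip the header row and missing readings
        yield f"{station}\t{value}"


def reducer(lines):
    records = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for station, group in groupby(records, key=lambda r: r[0]):
        temps = [float(value) for _, value in group]
        mean = sum(temps) / len(temps)
        # Output: station, min, max, mean (rounded to 2 decimals).
        yield f"{station}\t{min(temps)}\t{max(temps)}\t{mean:.2f}"


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```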

Word Count:

Mapper: Splits text into words and emits key-value pairs for each word.
Reducer: Counts the occurrences of each word.
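A minimal word-count sketch follows. For brevity it combines both roles in a hypothetical file `wordcount.py`. In the two-file layout used by the streaming command above, `mapper()` would go in `mapper.py` and `reducer()` in `reducer.py`.

Hadoop Streaming treats the text before the first tab as the key and sorts by it before the reducer runs. Because of that sort, the reducer only has to sum runs of identical adjacent keys. Words are split on whitespace only, with no case folding or punctuation stripping; that choice is an assumption of this sketch.

```python
#!/usr/bin/env python3
"""Word count for Hadoop Streaming (sketch).

Local test: python3 wordcount.py map < input.txt | sort | python3 wordcount.py reduce
"""
import sys
from itertools import groupby


def mapper(lines):
    # Emit "word<TAB>1" for every whitespace-separated token.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines):
    # Input is sorted by key, so identical words arrive on adjacent lines.
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = mapper if sys.argv[1] == "map" else reducer
    for out in step(sys.stdin):
        print(out)
```

With this layout, the streaming command would use `-mapper "python3 wordcount.py map"` and `-reducer "python3 wordcount.py reduce"`, shipping the single file with `-file`.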

Sample Input and Output:

You can find sample input and output files in the repository to test the scripts.
