zlatanitor

Parsing YARN application logs

Motivation

In YARN, logs generated by applications can be moved to a central location in HDFS. This gives an easy way to access these logs for debugging, or for historical analysis aimed at discovering performance issues.

This all sounds good, except that application logs are stored in a binary format called TFile, which is very inconvenient to parse. Moreover, application logs are numerous and small, which makes them hard to process directly with MapReduce jobs.

Here is a conceptually simple MapReduce job that parses them. It counts how many times a given line occurs in the standard error output of map and reduce tasks, in the hope that this gives some insight into why jobs fail and which bugs are most common.
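The counting step itself can be sketched in plain Java, without Hadoop or TFile parsing. This is only an illustration of the idea, not the code of the `ApplicationLogLineCount` job: the class name `LineCountSketch`, the method `countLines`, and the sample stderr lines are all hypothetical, and the sketch assumes the log lines have already been extracted as text.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical sketch: counts identical lines, as a map + reduce pair would.
final class LineCountSketch {

    // Returns each distinct non-blank line (trimmed) with its number of occurrences.
    static Map<String, Long> countLines(List<String> stderrLines) {
        return stderrLines.stream()
                .map(String::trim)
                .filter(line -> !line.isEmpty())
                .collect(Collectors.groupingBy(
                        Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Made-up stderr lines, used only to exercise the method.
        List<String> sample = List.of(
                "java.lang.OutOfMemoryError: Java heap space",
                "java.io.IOException: Broken pipe",
                "java.lang.OutOfMemoryError: Java heap space");
        countLines(sample).forEach((line, n) -> System.out.println(n + "\t" + line));
    }
}
```

In the MapReduce setting the same logic would be split in two: the mapper emits each line with a count of one, and the reducer sums the counts per line.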

Application logs can also be used to discover performance issues, e.g. by counting how many times a map task spills its in-memory map output buffer to disk. If a task spills more than once, consider giving the buffer more memory to minimize disk I/O.
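A minimal sketch of the spill check, again in plain Java and independent of the job in this repository. It assumes that every completed spill leaves one log line containing the text `Finished spill`; that marker, the class `SpillCountSketch`, and the task identifiers used as map keys are assumptions made for illustration, so adjust the marker to whatever your Hadoop version actually logs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: finds map tasks whose logs show more than one spill.
final class SpillCountSketch {

    // Assumption: each completed spill produces one log line containing this text.
    static final String SPILL_MARKER = "Finished spill";

    // Counts the log lines that contain the spill marker.
    static long countSpills(List<String> logLines) {
        return logLines.stream()
                .filter(line -> line.contains(SPILL_MARKER))
                .count();
    }

    // Returns the sorted ids of tasks that spilled more than once.
    static List<String> tasksSpillingMoreThanOnce(Map<String, List<String>> logsByTask) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : logsByTask.entrySet()) {
            if (countSpills(entry.getValue()) > 1) {
                result.add(entry.getKey());
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

Tasks returned by `tasksSpillingMoreThanOnce` are the candidates for a larger map output buffer.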

$ mvn -P full package
$ hadoop jar target/zlatanitor-1.0-SNAPSHOT-jar-with-dependencies.jar com.hakunamapdata.zlatanitor.job.yarn.mapreduce.ApplicationLogLineCount /app-logs/kawaa/logs/*/* logs/kawaa/linecount
$ hadoop fs -cat logs/kawaa/linecount/*
