HdfsUtils

A project to help analyze HDFS metadata.

The first feature is a D3 sunburst visualization showing HDFS space usage and/or number of files.
A snapshot space consumption overhead analyzer (from this discussion) is coming next (stay tuned).

##Options to run

###1- Zeppelin notebook

Just import the URL below into your Zeppelin instance and run it step by step:
https://raw.githubusercontent.com/gbraccialli/HdfsUtils/master/zeppelin/hdfs-d3.json

###Live Preview here

###2- Build from source, run from the command line, and use the html file

###Building

    git clone https://github.com/gbraccialli/HdfsUtils.git
    cd HdfsUtils
    mvn clean package

###Basic usage

    java -jar target/gbraccialli-hdfs-utils-with-dependencies.jar \
      --path=/ \
      --maxLevelThreshold=-1 \
      --minSizeThreshold=-1 \
      --showFiles=false \
      --verbose=true > out.json

###Visualizing

Open html/hdfs_sunburst.html in your browser and point it to the .json file you created in the previous step, or copy and paste the JSON content into the load options on the right.

PS: Chrome has a security constraint that does not allow pages to load local files. Use one of the following options:

- Use the Zeppelin notebook (described above)
- Use Safari
- Enable Chrome local file access: instructions here
- Publish the JSON on a web server and use the full URL
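
The web server option can be tried locally without installing anything extra. A minimal sketch, assuming Python 3.7 or later is available: its built-in `http.server` module is not part of this project, and `serve_dir` is a hypothetical helper name. Serving the page and the JSON from the same directory keeps them on one origin.

```shell
#!/bin/sh
# Assumption: python3 (3.7+) is on the PATH. Override PYTHON to use another interpreter.
PYTHON=${PYTHON:-python3}

# serve_dir DIR [PORT]: serve DIR over HTTP on localhost only (hypothetical helper).
# The command runs in the foreground; stop it with Ctrl-C.
serve_dir() {
  "$PYTHON" -m http.server "${2:-8000}" --bind 127.0.0.1 --directory "${1:-.}"
}
```

For example, after copying out.json into html/, running `serve_dir html 8000` makes the page available at http://localhost:8000/hdfs_sunburst.html and the data at http://localhost:8000/out.json.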

###Command line options:

####--confDir=
Path to the directory containing the Hadoop config files. Defaults to /etc/hadoop/conf.

####--maxLevelThreshold=
-1 or a valid int. Maximum number of directory levels to drill down; -1 means no limit. For example, maxLevelThreshold=3 means the drill-down stops after 3 levels of subdirectories.

####--minSizeThreshold=
-1 or a valid long. Minimum number of bytes a directory must contain for the drill-down to continue; -1 means no limit. For example, minSizeThreshold=1000000 means only directories larger than 1000000 bytes are drilled down.
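
One way to read the two threshold descriptions is as a pair of checks applied to each directory. The sketch below illustrates that reading and is not the tool's own code: `should_drill_down` is a hypothetical name, and it assumes that the start path counts as level 0 and that both checks must pass.

```shell
#!/bin/sh
# should_drill_down LEVEL SIZE MAX_LEVEL MIN_SIZE   (hypothetical helper)
# Exit status 0 if a directory at depth LEVEL holding SIZE bytes would be expanded.
# Assumptions: the start path is level 0; a value of -1 disables that check.
should_drill_down() {
  level=$1
  size=$2
  max_level=$3
  min_size=$4
  # Stop once the configured number of levels has been reached.
  if [ "$max_level" -ne -1 ] && [ "$level" -ge "$max_level" ]; then
    return 1
  fi
  # Only directories strictly larger than the threshold are expanded.
  if [ "$min_size" -ne -1 ] && [ "$size" -le "$min_size" ]; then
    return 1
  fi
  return 0
}
```

Under these assumptions, `should_drill_down 2 5000000 3 1000000` succeeds, while the same call with level 3 fails because the depth limit has been reached.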

####--showFiles=
true or false. Whether to show information about individual files. With showFiles=false, only summary information about the files in each directory/subdirectory is shown.

####--exclude=
path1,path2,... Comma-separated list of directories to exclude from the drill-down. For example, /tmp/,/user/ omits information about those directories.

####--doAs=
username (hdfs, for example). On a non-kerberized cluster, sets the user that performs the HDFS operations; running as hdfs avoids permission issues. On a kerberized cluster, grant read access to the user performing the operation (you can use Ranger for this).

####--verbose=
true or false. When true, prints processing information to System.err (does not apply to Zeppelin).

####--path=
Path where the analysis starts.
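
The options above can be combined in a single run. The following is a sketch rather than a recommended configuration: `scan_hdfs` is a hypothetical wrapper name, the threshold and exclude values are simply the examples used in the descriptions above, and the jar path is the one from the Basic usage section. Because verbose output goes to System.err, it is redirected to a separate log so the JSON file stays clean.

```shell
#!/bin/sh
# Assumptions: run from the repository root after "mvn clean package".
JAR=${JAR:-target/gbraccialli-hdfs-utils-with-dependencies.jar}
JAVA=${JAVA:-java}   # overridable, e.g. JAVA=echo for a dry run

# scan_hdfs START_PATH OUTFILE   (hypothetical wrapper)
# Depth-limited scan that skips /tmp/ and /user/ and ignores small directories.
scan_hdfs() {
  "$JAVA" -jar "$JAR" \
    --path="$1" \
    --maxLevelThreshold=3 \
    --minSizeThreshold=1000000 \
    --showFiles=false \
    --exclude=/tmp/,/user/ \
    --verbose=true > "$2" 2> "$2.log"
}
```

For example, `scan_hdfs / out.json` writes the JSON to out.json and the processing messages to out.json.log.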

##Special thanks to:

- Dave Patton, who first created HDP-Viz, which inspired me and from which I copied lots of code
- Ali Bajwa, who created the Ambari stack for Dave's project (and helped me get it working)
- David Streever, who created (or forked) hdfs-cli, from which I also copied lots of code
