Sweeper, an HDFS browser

Posted by pierre on July 15, 2010 – 11:02 AM

When monitoring a Hadoop cluster, two numbers come to mind: disk usage and the number of files tracked by the namenode. Alongside traditional monitoring tools (Nagios, Zenoss, …), we also use a small in-house utility, Sweeper, to quickly find large directories to clean up or small files to combine.

Sweeper has just been open-sourced!

The screenshots below show the two operating modes: by default, it displays the size of each file or directory, taking the replication factor into account; in the other mode, it displays the number of files in each directory.
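To make the two numbers concrete, here is a rough sketch of how they can be computed from a file listing. This is not Sweeper's actual code: the `summarize` function and the sample listing are made up for illustration, and it assumes "taking into account the replication factor" means multiplying each file's size by its replication factor.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def summarize(files):
    """Aggregate replicated bytes and file counts for every ancestor directory.

    `files` is an iterable of (path, size_in_bytes, replication_factor) tuples.
    Hypothetical helper for illustration, not part of Sweeper.
    """
    disk_usage = defaultdict(int)
    file_count = defaultdict(int)
    for path, size, replication in files:
        # Assumption: raw disk usage is the logical size times the replication factor.
        raw_bytes = size * replication
        # Credit the file to each directory above it, up to the root.
        for parent in PurePosixPath(path).parents:
            disk_usage[str(parent)] += raw_bytes
            file_count[str(parent)] += 1
    return dict(disk_usage), dict(file_count)


# Made-up listing: two small log files replicated 3 times, one unreplicated scratch file.
listing = [
    ("/logs/2010/07/a.log", 100, 3),
    ("/logs/2010/07/b.log", 50, 3),
    ("/tmp/scratch.bin", 1000, 1),
]

usage, counts = summarize(listing)
for directory in sorted(usage):
    print(directory, usage[directory], counts[directory])
```

Sorting the directories by `usage` points at the large ones to clean up, while a high `counts` value next to a low `usage` value hints at many small files that could be combined.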

Sweeper in disk usage mode

Sweeper in file counts mode

You can get it on GitHub: http://github.com/pierre/sweeper (direct link for the 1.0.0 release). Enjoy 😉
