Operations Toolkit

A few months ago (has it been that long already?!) we started the process of pushing some of our internal Operations toolkit out in the open (and you can rightly argue that we barely dipped our toes in the water).

We are picking up where we left off, and are working towards releasing several other tools over the next few weeks, some of them trivial (let’s call them utilities), others far more significant (true tools).

We are starting today on the utility end of the spectrum with nddtune, an SMF manifest/method combo you can use to ensure ndd tweaks that are not configurable via /etc/system (in Solaris) stay tweaked when a system reboots. See the README file for details. This is based on Dr. Hung-Sheng Tsao’s 2008 blog post on SMF and TCP tuning, with the main addition being a configuration file.
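
To make the idea concrete, here is a minimal sketch of what a config-driven ndd method could do at boot: read a list of tunables and turn each into an `ndd -set` invocation. The three-column file format, the function names, and the sample values are assumptions made for illustration only; the actual nddtune configuration format is described in its README and may well differ.

```python
import shlex

def parse_tunables(text):
    """Parse lines of the hypothetical form '<driver> <parameter> <value>'.

    Blank lines and anything after a '#' are ignored.
    """
    tunables = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) != 3:
            raise ValueError(f"line {lineno}: expected 3 fields, got {len(fields)}")
        tunables.append(tuple(fields))
    return tunables

def ndd_commands(tunables):
    """Build one 'ndd -set <driver> <parameter> <value>' argv list per tunable."""
    return [["ndd", "-set", driver, param, value] for driver, param, value in tunables]

# Illustrative values only, not tuning recommendations.
SAMPLE = """\
# driver    parameter            value
/dev/tcp    tcp_conn_req_max_q   1024
/dev/tcp    tcp_xmit_hiwat       65536   # send buffer
"""

if __name__ == "__main__":
    # Print the commands instead of running them, so this is safe anywhere.
    for cmd in ndd_commands(parse_tunables(SAMPLE)):
        print(shlex.join(cmd))
```

A real SMF start method would execute these commands rather than print them, and would report failures back to SMF through its exit status.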

I chose a trivial utility to start with because the aim of this post is really to provide a preview of some of the tools we are working on publishing. Part of the work involves fixing some of the known bugs we have identified, as well as removing or improving code that is not generic and assumes the tool is running in Ning’s environment. Specific blog posts will follow as we make releases.

A sample of the upcoming tools, and in no particular order:

  • The Nagios Shell sits halfway between utility and tool. We have traditionally not used any of the Nagios graphical interfaces but still need a way to interact with Nagios in a sane fashion. After a couple of years of writing a hodge-podge of utilities, we decided to collapse the functionality into a single tool, which uses Mathias Kettner’s incredible MK-Livestatus module to provide far more functionality than we had before. The current incarnation of the Nagios Shell is actually a shell script coupled with a Python script, and we’re in the middle of a from-the-ground-up rewrite in Ruby to add further functionality while removing complexity.
  • Zettabee is definitely in tool territory. We use ZFS storage for a variety of purposes, and the bulk of the data we store on ZFS has to be replicated to other facilities. Zettabee encapsulates and manages zfs send and receive functionality to provide incremental, block-level, asynchronous replication of remote ZFS file systems (no support for synchronous operations is available), and it’s tightly (but optionally) integrated with Nagios as well.
  • Theia provides NetApp filer performance monitoring and alerting through its integration with Nagios and Zenoss. One of the oldest tools in our toolkit, it has been in production for well over four years, constantly prodding and poking our filers to extract performance data and alert as necessary. It used to be that monitoring NetApp filers was a) painful and/or b) expensive (DFM anyone?). Theia takes care of that for us.
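
For readers who have not used MK-Livestatus, the module mentioned in the Nagios Shell entry above, here is a rough sketch of what talking to it looks like: a short text query written to a Unix socket, with the answer read back until the server closes the connection. This is not the Nagios Shell's code. The function names are hypothetical, and the socket path in the usage note is an assumption that depends on how Livestatus is configured on your system.

```python
import json
import socket

def build_query(table, columns, filters=()):
    """Assemble a Livestatus query asking for JSON output.

    The trailing blank line marks the end of the query.
    """
    lines = [f"GET {table}", "Columns: " + " ".join(columns)]
    lines += [f"Filter: {f}" for f in filters]
    lines.append("OutputFormat: json")
    return "\n".join(lines) + "\n\n"

def run_query(socket_path, query):
    """Send a query to the Livestatus Unix socket and decode the JSON reply."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        sock.sendall(query.encode("utf-8"))
        sock.shutdown(socket.SHUT_WR)  # signal that no more requests follow
        chunks = []
        while True:
            chunk = sock.recv(4096)
            if not chunk:
                break
            chunks.append(chunk)
    return json.loads(b"".join(chunks).decode("utf-8"))

if __name__ == "__main__":
    # All services currently in CRITICAL state (state = 2).
    print(build_query("services", ["host_name", "description", "state"], ["state = 2"]))
```

On a host running Nagios with Livestatus loaded, something like `run_query("/var/lib/nagios/rw/live", query)` would return one list per matching service; substitute whatever socket path your installation uses.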
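
The incremental replication that Zettabee manages (see its entry above) boils down to sending only the blocks that changed between two snapshots and applying them on the other side. The sketch below builds such a pipeline as a string, pulling from a remote source over ssh. It is not Zettabee's implementation: the function name, host, dataset names, and the choice of ssh as transport are all assumptions for illustration.

```python
import shlex

def incremental_pull_pipeline(source_host, source_dataset, prev_snap, new_snap, dest_dataset):
    """Build a shell pipeline that pulls an incremental ZFS stream from a remote host.

    'zfs send -i A B' emits only the changes between snapshots A and B;
    'zfs receive' applies that stream to the destination file system.
    """
    send = ["zfs", "send", "-i",
            f"{source_dataset}@{prev_snap}",
            f"{source_dataset}@{new_snap}"]
    # The remote command is passed to ssh as a single quoted argument.
    remote = ["ssh", source_host, shlex.join(send)]
    receive = ["zfs", "receive", dest_dataset]
    return f"{shlex.join(remote)} | {shlex.join(receive)}"

if __name__ == "__main__":
    # Hypothetical host and dataset names.
    print(incremental_pull_pipeline("filer01", "tank/data", "snap1", "snap2", "backup/data"))
```

A tool built around this would also need to create and track snapshots, handle interrupted transfers, and report status, which is where most of the real work lies.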

There are other tools in the pipeline, but these will have to do for now. It is our sincere hope that they are useful outside of our environment, and that other bright coders out there can crank out additional fixes and functionality that we have not implemented. And don’t forget to check out the rest of Ning’s open source projects on GitHub!
