uniqid() is slow

So there I was, profiling one of our systems with XHProf and what jumped out as taking up lots of wall-clock execution time? uniqid(). This function, which provides a convenient way to generate unique identifiers, can hold up your PHP execution far more than it should if you’re not careful.

Two things about uniqid() make it a potential performance minefield:

  1. If not called with a second argument of TRUE (the “more entropy” argument), uniqid() sleeps for at least a microsecond.
  2. uniqid() relies on the system function gettimeofday() — and the Linux kernel essentially serializes access to gettimeofday() across all cores.
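To get a feel for why a one-microsecond sleep matters, here is a rough Python analogue (not PHP's implementation, and not a benchmark of uniqid() itself). It requests a 1 µs sleep many times and reports what each call actually cost in wall-clock time; the helper name `average_sleep_cost` is made up for this sketch. On most systems the measured cost is noticeably higher than the requested microsecond, because the call has to go through the kernel and the scheduler.

```python
import time


def average_sleep_cost(requested_seconds: float, calls: int = 1000) -> float:
    """Return the measured average wall-clock cost of one sleep call.

    Hypothetical helper for illustration only; it times Python's
    time.sleep(), not PHP's uniqid().
    """
    start = time.perf_counter()
    for _ in range(calls):
        # Ask for a very short sleep, similar in spirit to a 1 us usleep.
        time.sleep(requested_seconds)
    elapsed = time.perf_counter() - start
    return elapsed / calls


if __name__ == "__main__":
    requested = 1e-6  # one microsecond
    measured = average_sleep_cost(requested)
    print(
        f"requested {requested * 1e6:.0f} us per call, "
        f"measured about {measured * 1e6:.1f} us per call"
    )
```

The exact numbers depend on the kernel, timer resolution, and system load, so treat the output as an order-of-magnitude hint only.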


xnperfstat: wrapping NetApp’s PerfStat Tool

Perfstat is a diagnostic data collection tool for NetApp filers. If and when they experience performance issues, NetApp Support will likely ask for perfstat to be run against the ailing filers. This is (hopefully) not something that is done often, and therefore, the details of how to run it may get rusty, which is problematic in the middle of an availability storm.


The Nagios Shell (ngsh)

Another tool we’ve gotten a fair amount of mileage out of is what we internally refer to as the Nagios Shell (ngsh). We have used Nagios since our early days (circa 2005), and it has served us very well to keep an eye on our infrastructure. Over time, we started writing tools to poke and probe Nagios in one way or another. The end result of this process was a hodgepodge of tools that parsed status.dat and did other things they really shouldn’t. We lacked consistency across the toolset, some of the tools took forever to run (we have a decently large environment), and others failed in mysterious ways.


Zettabee and Theia

It’s hard to believe it has been almost a year since we started the process of open sourcing tools, but it has indeed been that long, and it picked up steam a few weeks ago, when we pushed out nddtune, which is admittedly a very simple tool. Today we’re continuing that effort with a couple of more significant tools: Zettabee and Theia.


Operations Toolkit

A few months ago (has it been that long already?!) we started the process of pushing some of our internal Operations toolkit out in the open (and you can rightly argue that we barely dipped our toes in the water).

We are picking up where we left off, and are working towards releasing several other tools over the next few weeks, some of them trivial (let’s call them utilities), others far more significant (true tools).


Tightening Down the Screws

In operations you have to measure many things, but the foundation has to be availability. Availability is your dial tone. If your site isn’t up, then fast and usable and engaging all won’t matter. Even though it’s the most basic metric, we don’t talk about it much because, well, it’s a little mundane. But keep reading, there’s some news in this post.

Quickly defined, availability is the percentage of requests served successfully during some period: successes divided by attempts. While 100% uptime is desirable, the Internet being what it is, most of us set more realistic goals. At Ning our goal is 99.99% uptime, also known as “four nines,” which allows for just under an hour (about 53 minutes) of downtime per year.
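The arithmetic behind that definition and the four-nines budget can be sketched in a few lines of Python. The function names below are made up for this illustration, and the sketch assumes a 365-day year and treats a fraction of failed requests as equivalent to the same fraction of downtime, which is a simplification.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, assuming a non-leap year


def availability(successes: int, attempts: int) -> float:
    """Availability as defined in the text: successes divided by attempts."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return successes / attempts


def downtime_budget_minutes(target: float,
                            period_minutes: float = MINUTES_PER_YEAR) -> float:
    """Minutes of downtime allowed in the period for a target availability."""
    if not 0.0 <= target <= 1.0:
        raise ValueError("target must be a fraction between 0 and 1")
    return (1.0 - target) * period_minutes


if __name__ == "__main__":
    # 9,999 successful requests out of 10,000 attempts is four nines.
    print(f"availability: {availability(9_999, 10_000):.4%}")
    # A 99.99% target leaves roughly 52.6 minutes of downtime per year.
    print(f"yearly budget: {downtime_budget_minutes(0.9999):.2f} minutes")
```

Each additional nine shrinks the budget by a factor of ten, which is why moving from three nines to four is a much bigger commitment than the numbers suggest at first glance.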
