uniqid() is slow

Posted by David Sklar on February 1, 2012 – 6:33 AM

So there I was, profiling one of our systems with XHProf and what jumped out as taking up lots of wall-clock execution time? uniqid(). This function, which provides a convenient way to generate unique identifiers, can hold up your PHP execution far more than it should if you’re not careful.

Two things about uniqid() make it a potential performance minefield:

  1. If not called with a second argument of TRUE (the “more entropy” argument), uniqid() sleeps for at least a microsecond.
  2. uniqid() relies on the system function gettimeofday() — and the Linux kernel essentially serializes access to gettimeofday() across all cores.

 

The Big Sleep

There are two funky things about sleeping from within uniqid(): why does PHP need to sleep at all, and how long does that sleep really take?

Without the “more entropy” flag set, uniqid() constructs the unique identifier entirely from the time returned from gettimeofday(): 8 hex digits from the seconds part of the time and 5 hex digits from the microseconds part of the time. So if two calls to uniqid() happen in the same microsecond, then each would return the same identifier. Not so unique!

To avoid this problem, uniqid() ensures that two calls from the same process can’t happen in the same microsecond by making the call itself take at least a microsecond: the implementation calls usleep(1), so every uniqid() execution takes at least that long.

In practice, though, that innocent-looking usleep(1) can cause a delay of a lot more than just 1 microsecond. The man page for usleep() says “The usleep() function suspends execution of the calling thread for (at least) usec microseconds.” That “at least” is no joke. On my test system, usleep(1) takes about 63 microseconds to execute. Depending on the resolution of the hardware clocks on your system, your results will vary.

Setting the “more entropy” flag changes this behavior. PHP skips the usleep(1) call and instead disambiguates potential same-microsecond collisions by appending data from php_combined_lcg() to the identifier. This function returns pseudorandom data; two successive calls to it from the same process will return different results.
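You can see the cost difference by timing the two variants yourself. This is just a quick sketch using stock PHP; the per-call numbers will depend on your kernel and hardware timer resolution.

<?php
// Rough timing sketch: uniqid() with and without the "more entropy" flag.
// The exact numbers depend on your system's timer resolution.
$iterations = 10000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    uniqid();           // sleeps at least 1 microsecond per call
}
$slow = (microtime(true) - $start) * 1e6 / $iterations;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    uniqid('', true);   // no usleep(); appends php_combined_lcg() data instead
}
$fast = (microtime(true) - $start) * 1e6 / $iterations;

printf("uniqid():         %.1f usec/call\n", $slow);
printf("uniqid('', true): %.1f usec/call\n", $fast);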

gettimeofday() serialization

On Linux, a call to gettimeofday() from userspace ultimately ends up running the kernel code at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=kernel/time/timekeeping.c#l217 :

do {
  seq = read_seqbegin(&xtime_lock);
  *ts = xtime;
  nsecs = timekeeping_get_ns();

  /* If arch requires, add in gettimeoffset() */
  nsecs += arch_gettimeoffset();
} while (read_seqretry(&xtime_lock, seq));

Those read_seqbegin() and read_seqretry() calls that wrap the call to timekeeping_get_ns() are read locks that force the caller (ultimately PHP calling gettimeofday()) to retry if a write is in progress. And since the time is updated by the kernel so frequently, writes are often in progress.

What this means in practice is that there is a relatively fixed upper bound on the number of gettimeofday() calls that can happen per second across the entire system. If you just have one processor/core, this is not such a big deal, since your single processor is spending its time doing plenty of other things besides calling gettimeofday(). But if you have multiple cores (the system I was profiling was running 8 cores) then the total number of calls across all cores is that same upper bound. So each core has to queue up to get its gettimeofday() calls done.

I wrote a simple program (gettimeofday-forker) which lets you measure how fast gettimeofday() calls are on your system. Download the source code and then compile it with gcc -o gettimeofday-forker gettimeofday-forker.c. Then, initially run it with just one child and an interval of 1000000 microseconds (1 second). On my Linux system, I get this output:

$ ./gettimeofday-forker 1 1000000
[-] All children running
[0] 638146 calls in 1000000 usec = 0.64 calls/usec

With 2, 4, and 8 children, I get:

$ ./gettimeofday-forker 2 1000000
[-] All children running
[1] 366003 calls in 1000000 usec = 0.37 calls/usec
[0] 365881 calls in 1000000 usec = 0.37 calls/usec
$ ./gettimeofday-forker 4 1000000
[-] All children running
[3] 183819 calls in 1000004 usec = 0.18 calls/usec
[2] 181205 calls in 1000001 usec = 0.18 calls/usec
[1] 183655 calls in 1000000 usec = 0.18 calls/usec
[0] 183866 calls in 1000001 usec = 0.18 calls/usec
$ ./gettimeofday-forker 8 1000000
[-] All children running
[7] 91727 calls in 1000002 usec = 0.09 calls/usec
[2] 91928 calls in 1000018 usec = 0.09 calls/usec
[1] 91727 calls in 1000001 usec = 0.09 calls/usec
[6] 90878 calls in 1000441 usec = 0.09 calls/usec
[0] 91600 calls in 1000029 usec = 0.09 calls/usec
[3] 91917 calls in 1000660 usec = 0.09 calls/usec
[4] 94734 calls in 1000001 usec = 0.09 calls/usec
[5] 95827 calls in 1000001 usec = 0.10 calls/usec

The total number of gettimeofday() calls remains about constant, even as the number of child processes increases.

If you experiment with gettimeofday-forker on your own, keep in mind that you should specify an interval large enough that the time to start up all the processes is very small compared to the total runtime. If some children finish before others start, your measurements will be skewed.
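If you don’t have a C compiler handy, you can get a rough single-process feel for this from PHP itself, since microtime() ultimately calls gettimeofday(). This sketch won’t show the cross-core contention that gettimeofday-forker demonstrates, and PHP’s interpreter overhead will drag the number down, but it gives you a ballpark:

<?php
// Count how many microtime() calls -- and therefore gettimeofday() calls --
// one PHP process can make in roughly one second.
$end = microtime(true) + 1.0;
$calls = 1;
while (microtime(true) < $end) {
    $calls++;
}
printf("%d calls in ~1 second = %.2f calls/usec\n", $calls, $calls / 1e6);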

Solution

Our dear friend uniqid() has some performance problems. What’s the solution? In our case, we just switched to using mt_rand(). The places we were using uniqid() didn’t really need global-for-all-time unique identifiers, just identifiers that were extremely unlikely to collide with other identifiers being generated around the same time.
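What that swap looks like depends on how unique you need the IDs to be. A minimal sketch (not our exact replacement code, and mt_rand() is not cryptographically secure) is something like this:

<?php
// Collision-unlikely identifier built from mt_rand(), with no usleep()
// and no gettimeofday() call. Only suitable where accidental collisions
// are the concern, not security.
function cheap_id() {
    return sprintf('%08x%08x', mt_rand(), mt_rand());
}

echo cheap_id(), "\n";   // e.g. "3f9c21aa07b4d1ee"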

If you need stronger guarantees of uniqueness, you have a few options. You could build your own IDs by combining things that are unlikely to collide and cheap to obtain, such as the server’s IP address, the process ID, the time at the start of the request, and a per-request sequence number. (If you’re running PHP in a multithreaded environment, also include the thread ID.) For example:

function my_uniqid() {
  static $counter = 0;
  static $pid = -1;
  static $addr = -1;

  if ($pid == -1) { $pid = getmypid(); }
  if ($addr == -1) { $addr = ip2long($_SERVER['SERVER_ADDR']); }

  return $addr . $pid . $_SERVER['REQUEST_TIME'] . ++$counter;
}

This gives you a nice string of digits that is guaranteed to be generated only once, unless you reset your system clock or give two servers the same IP address.

If that’s not enough, take a look at the PECL uuid extension, which wraps libuuid. It can generate UUIDs based on things such as the Ethernet MAC address. It can also be told to generate time-based UUIDs, which in turn use gettimeofday(), so steer clear of those if you don’t want to run into the kinds of problems outlined above.
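For example, a random (version 4) UUID avoids the time-based code path entirely. This sketch assumes the extension’s uuid_create() function and its UUID_TYPE_RANDOM constant are available in your build:

<?php
// Requires the PECL uuid extension. UUID_TYPE_RANDOM yields a version-4
// (random) UUID, which avoids the gettimeofday()-based path that
// time-based (version-1) UUIDs use.
if (!extension_loaded('uuid')) {
    die("PECL uuid extension not loaded\n");
}
echo uuid_create(UUID_TYPE_RANDOM), "\n";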

Experiment with Literate Programming Didn’t Work Out

Posted by Jonathan Aquino on January 30, 2012 – 2:41 PM

About a year and a half ago, I tried doing some “literate programming” for one of our internal scripts. Well, I have just converted that script back to the normal, un-literate (illiterate?) style. It just wasn’t working out.

What Is Literate Programming?

The basic idea behind literate programming is that you write a free-flowing narrative, and insert chunks of code in it, like we’re doing now:

<<The most common programming example.>>=
echo 'Hello, World!';
@

As you can see, the chunk of code begins with <<label>>= and ends with @. A cool thing is that chunks can refer to other chunks, like this:

<<A simple PHP loop>>=
<?php
for ($i = 0; $i < 5; $i++) {
    <<The most common programming example.>>
}
@

As a matter of fact, this blog post is a literate program. You can run it using noweb.py -R"A simple PHP loop" input.txt and it will generate the PHP, which you can then run. (It will print Hello, World! five times.)

So What Happened?

I had a beautiful literate program to process the files received from our translators. It was fun to write – I explained what I was doing as I went along, and was able to “think out loud”, which helped with the more complicated bits. You could see the reasoning behind my choices by reading through the nearby narrative.

But the problems started after I was done. I was the only one maintaining the script, probably because no one was interested in learning how to generate the PHP from it. Even for myself, it was always a bit of a chore to re-learn the proper incantation to regenerate the PHP whenever I needed to go back and fix something. And probably the worst thing about it was that when an error occurred, the line number it reported was for the generated file, not for the original literate file. So I would need to insert my debugging print statements in the generated file, then, once I figured out the problem, make the fix in the literate file, regenerate the script, run the script, and so on. A bit of a painful workflow.

How Do You “Undo” a Literate Program?

I got tired of the line-numbering mismatches and the workflow, so I made the literate program non-literate. I started with the generated script, and added back all of the narrative as normal (//) comments. It actually worked out pretty well – by rearranging the code a little bit, I was able to maintain the order of the original narrative parts. Here’s an example of a before/after:

Before:

indexOfLastMatch is simple but important. It tells us the index of the last line
matching the given regex pattern. Let's write it.

<<Finding the index of the last match>>=
private function indexOfLastMatch($lines, $regex) {
    $indexOfLastMatch = -1;
    for ($i = 0; $i < count($lines); $i++) {
        if (preg_match($regex, $lines[$i])) {
            $indexOfLastMatch = $i;
        }
    }
    return $indexOfLastMatch;
}
@

After:

// indexOfLastMatch is simple but important. It tells us the index of the last line
// matching the given regex pattern. Let's write it.

// [Finding the index of the last match]
private function indexOfLastMatch($lines, $regex) {
    $indexOfLastMatch = -1;
    for ($i = 0; $i < count($lines); $i++) {
        if (preg_match($regex, $lines[$i])) {
            $indexOfLastMatch = $i;
        }
    }
    return $indexOfLastMatch;
}

The result is definitely a different approach than JavaDoc – more of a story, a carpenter explaining his work to his hearers. I’m a fan of the JavaDoc approach (cold, systematic descriptions of each class, method, and parameter), and will continue to use that. But literate programming was a pleasant excursion into a different way of doing things. Like all vacations, though, one must eventually come back home.

Have you experimented with literate programming or ever been tempted to?

xnperfstat: wrapping NetApp’s PerfStat Tool

Posted by gerir on November 3, 2011 – 1:46 AM

Perfstat is a diagnostic data collection tool for NetApp filers. If and when they experience performance issues, NetApp Support will likely ask for perfstat to be run against the ailing filers. This is (hopefully) not something that is done often, and therefore, the details of how to run it may get rusty, which is problematic in the middle of an availability storm.

We wrote xnperfstat in late 2008 to provide a cleaner and more straightforward method to run perfstat: it performs perfstat housekeeping chores for us, with options geared directly towards situations where support cases are open. It can also be run on a “continuous” basis from cron (for cases where the data collection has to be done over a period of days), storing and rotating output files.

See the README for details.

The Nagios Shell (ngsh)

Posted by gerir on October 26, 2011 – 1:24 AM

Another tool we’ve gotten a fair amount of mileage out of is what we internally refer to as the Nagios Shell (ngsh). We have used Nagios since our early days (circa 2005), and it has served us very well to keep an eye on our infrastructure. Over time, we started writing tools to poke and probe Nagios in one way or another. The end result of this process was a hodgepodge of tools that parsed status.dat and did other things they really shouldn’t have. We lacked consistency across the toolset, some of the tools took forever to run (we have a decently large environment), and others failed in mysterious ways.

About a year and a half ago we decided to stop the madness, and were lucky enough to run across Mathias Kettner’s fantastic MK Livestatus module. We consolidated eight different tools into a single one, added richer functionality for querying Nagios, and put away the mysterious failures we had grown accustomed to living with, knowing that status.dat parsing was biting us. We christened the new tool the Nagios Shell, since it was intended to run on the CLI, and it opened up an entire new level of functionality and correctness in managing our environment.

The current incantation comprises two scripts, ngsh (a shell script) and ngsq (a Python script), and requires that you build MK Livestatus into your Nagios instance. A new generation is in the works, one which replaces this mixture with a toolkit written entirely in Ruby and provides far more flexibility than the current one, including a RESTish interface so that Nagios can be controlled over HTTP (more on that soon). The README has some brief examples of usage, and soon the wiki will contain a roadmap of improvements.

 

Zettabee and Theia

Posted by gerir on October 21, 2011 – 2:13 PM

It’s hard to believe it has been almost a year since we started the process of open sourcing tools, but it has indeed been that long. The effort picked up steam a few weeks ago, when we pushed out nddtune, which is admittedly a very simple tool. Today we’re continuing it with a couple of more significant tools: Zettabee and Theia.

A Little History

About four years ago, we had a very real need for fairly detailed performance metrics for NetApp filers. At the time, the available solutions relied on SNMP (NetApp’s SNMP support has historically been weak) or were NetApp’s own, which, aside from being expensive, were hard to integrate with the rest of our monitoring infrastructure (Nagios and Zenoss). So we set out to write a tool that would both perform detailed filer monitoring (for faults and performance) and interface with those systems. Theia was born.

In more recent times, as we were looking at beefing up our DR strategy, we found ourselves needing a good ZFS-based replication tool, and set out to write Zettabee, which gave us an opportunity to dive deeper into ZFS capabilities.

Let the Games Begin

Today we’re very excited to be releasing those two tools into the open. Theia has been in production for the last four years, dutifully keeping an eye on our filers, while Zettabee has been pushing bits long-distance for well over nine months. We are working on putting together a roadmap for future work, but are happy to have them out in the open for further collaboration. Tim has written a good post on some of the work he has done to make this happen, and I am grateful for his help on this endeavor.

JRuby & Sinatra web service as an executable jar

Posted by tomdz on September 20, 2011 – 12:15 PM

Recently I was working on a web service that uses Sinatra with JRuby. With JRuby running on the JVM and all, I was thinking it would be nice to neatly bundle everything up in one jar and give that to java to run. No JRuby would need to be installed, no need for rvm or anything, only java required. Some quick googling found this article from Yoko Harada which, while not exactly what I needed, gave a lot of good hints. The most relevant difference is that in my case the application already builds with Maven instead of rake, and I didn’t want to introduce a second build tool. If you are already using rake, then there are tools that make this bundling into a jar quite easy, e.g. Warbler or Rawr.

For Maven on the other hand a little bit more manual work is required, which I’m going to show you with a simple hello world example.

Let’s get started with this simple Sinatra Hello World app:

require 'rubygems'
require 'sinatra'

get "/" do
"Hello World"
end

Now one of the goals was to run this without requiring that JRuby be installed. Let’s apply the same constraint to the project setup: instead of a local JRuby installation, we’ll use jruby-complete.jar directly. First, to install it, we’ll use Maven:

<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0
                             http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>org.tomdz</groupId>
  <artifactId>sinatra-test</artifactId>
  <version>1.0.0-SNAPSHOT</version>
  <packaging>pom</packaging>
  <name>sinatra-test</name>
  <dependencies>
    <dependency>
      <groupId>org.jruby</groupId>
      <artifactId>jruby-complete</artifactId>
      <version>1.6.4</version>
    </dependency>
  </dependencies>
</project>

Running Maven with this pom.xml

$ mvn install

will give us the jruby-complete.jar in the local Maven repository, usually at ~/.m2/repository/org/jruby/jruby-complete/1.6.4/jruby-complete-1.6.4.jar.

Next, we need to initialize rubygems and also install Bundler to keep things simple. A normal Ruby (1.8) installation has libraries and gems in a directory structure like this:

lib
+- jruby
   +- 1.8
      +- bin
      +- cache
      +- doc
      +- gems
      +- specifications

We are going to keep this structure, but in the local project folder instead of wherever (J)Ruby is installed. To install Bundler (and OpenSSL), we can now use the local jruby-complete.jar file like so:

$ java -jar ~/.m2/repository/org/jruby/jruby-complete/1.6.4/jruby-complete-1.6.4.jar \
       -S gem install bundler jruby-openssl \
              --no-ri --no-rdoc \
              -i lib/jruby/1.8/

Fetching: bundler-1.0.18.gem (100%)
Successfully installed bundler-1.0.18
Fetching: bouncy-castle-java-1.5.0146.1.gem (100%)
Fetching: jruby-openssl-0.7.4.gem (100%)
Successfully installed bouncy-castle-java-1.5.0146.1
Successfully installed jruby-openssl-0.7.4
3 gems installed

This is actually cheating a little bit because JRuby will find the gem script in the current path, not in the jar. Unfortunately there doesn’t seem to be a way to make JRuby look for it in the jar at the moment, so we have to rely on the system’s Ruby installation for now. Since the gem script is a simple wrapper around the GemRunner class which will be loaded from the jar, this usually works out ok. This only becomes a problem if there is no Ruby installed on the machine where this is executed, or if the Ruby installation is quite old. In those cases, we could execute the GemRunner class directly:

$ java -jar ~/.m2/repository/org/jruby/jruby-complete/1.6.4/jruby-complete-1.6.4.jar \
       -rrubygems \
       -e "require 'rubygems/gem_runner'; Gem::GemRunner.new.run 'env'.split"

(replace the env string with whatever arguments you want to pass to gem).

For our webapp, we also need to install Sinatra itself, for which we’ll use Bundler. First, we need a Gemfile in the project root:

source 'http://rubygems.org'
gem 'sinatra'

Since we are lazy, we’ll add a script to invoke Bundler in the same way that we invoked gem above:

#!/bin/bash
GEM_PATH=`pwd`/lib/jruby/1.8 java \
-jar ~/.m2/repository/org/jruby/jruby-complete/1.6.4/jruby-complete-1.6.4.jar \
-S lib/jruby/1.8/bin/bundle install --path lib/

Note the GEM_PATH part. This sets the gem path for JRuby for that invocation, so that it will find the Bundler gem.

With this script, we can now install Sinatra:

$ chmod +x bundler.sh
$ ./bundler.sh

Fetching source index for http://rubygems.org/
Installing rack (1.3.2)
Installing tilt (1.3.3)
Installing sinatra (1.2.6)
Using bundler (1.0.18)

Since we opted to install Sinatra via Bundler, we should also require bundler/setup in our ruby script:

require 'rubygems'
require 'bundler/setup'
require 'sinatra'

get "/" do
"Hello World"
end

Save this script in src/main/ruby (to keep with Maven’s suggested directory layout). The project should look like this now:

Gemfile
Gemfile.lock
bundler.sh
pom.xml
src
+- main
   +- ruby
      +- server.rb
lib
+- jruby
   +- 1.8
      +- bin
      +- cache
      +- doc
      +- gems
         +- bouncy-castle-java-1.5.0146.1
         +- bundler-1.0.18
         +- jruby-openssl-0.7.4
         +- rack-1.3.2
         +- sinatra-1.2.6
         +- tilt-1.3.3
      +- specifications

Let’s test that this works:

$ GEM_PATH=`pwd`/lib/jruby/1.8 java \
    -jar ~/.m2/repository/org/jruby/jruby-complete/1.6.4/jruby-complete-1.6.4.jar \
    src/main/ruby/server.rb 

== Sinatra/1.2.6 has taken the stage on 4567 for development with backup from WEBrick
[2011-09-11 13:10:35] INFO  WEBrick 1.3.1
[2011-09-11 13:10:35] INFO  ruby 1.8.7 (2011-08-23) [java]
[2011-09-11 13:10:38] INFO  WEBrick::HTTPServer#start: pid=98483 port=4567

So far so good. Now for the jar part. JRuby has had the ability to load gems and scripts from a jar since at least 1.1.6. Nick Sieger blogged about the steps a while back. The jar basically needs to contain this directory structure:

bin
cache
doc
gems
specifications
META-INF
server.rb
**/*.class
**/*.rb

The bin, cache, doc, gems, and specifications folders come as-is from the lib/jruby/1.8 folder. The **/*.class stands for directories containing class files (e.g. compiled Java or JRuby).

In order to achieve this, we’re going to use the assembly plugin. There are other ways to generate a jar file (e.g. the jar or shade plugins), but the assembly plugin gives us the most control over the contents of the generated file.

First, we need to add a plugin section for the assembly plugin at the end of the pom.xml:

...
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.2-beta-3</version>
      <executions>
        <execution>
          <id>assemble</id>
          <goals>
            <goal>single</goal>
          </goals>
          <phase>package</phase>
          <configuration>
            <appendAssemblyId>false</appendAssemblyId>
            <descriptors>
              <descriptor>assembly.xml</descriptor>
            </descriptors>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
</project>

We are using version 2.2-beta-3 here because newer versions seem to have some problems running with pre-3.0 Maven versions.

Next, the assembly.xml file describes how the jar file should be assembled:

<assembly>
  <id>artifact</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <fileSet>
      <directory>${basedir}/src/main/ruby</directory>
      <outputDirectory>/</outputDirectory>
    </fileSet>
    <fileSet>
      <directory>${basedir}/lib/jruby/1.8</directory>
      <outputDirectory>/</outputDirectory>
    </fileSet>
  </fileSets>
  <dependencySets>
    <dependencySet>
      <outputDirectory>/</outputDirectory>
      <outputFileNameMapping></outputFileNameMapping>
      <unpack>true</unpack>
      <unpackOptions>
        <excludes>
          <exclude>META-INF/MANIFEST.MF</exclude>
        </excludes>
      </unpackOptions>
    </dependencySet>
  </dependencySets>
</assembly>

This tells the assembly plugin to put files/directories from src/main/ruby and lib/jruby/1.8 into the root of the jar, and also unpack and then include all dependencies (i.e. JRuby) at the root.

The META-INF/MANIFEST.MF file that we tell the assembly plugin to exclude tells Java various things about the jar, one of which is the main class to invoke when we run the jar via java -jar. In the case of JRuby, that would invoke the JRuby main class in the same way as jruby would for a normal JRuby installation. We are excluding it here since we will use a different main class below.

Building this via

$ mvn clean install

will give us a jar that looks exactly like what we need. However, since we excluded the manifest, we can’t run this jar just yet:

$ java -jar target/sinatra-test-1.0.0-SNAPSHOT.jar 

Failed to load Main-Class manifest attribute from
target/sinatra-test-1.0.0-SNAPSHOT.jar

Maven will have added a META-INF/MANIFEST.MF file to the jar by itself, but it is basically useless for our purpose:

Manifest-Version: 1.0
Archiver-Version: Plexus Archiver
Created-By: 20.1-b02-383 (Apple Inc.)

Instead, we are going to use JRuby’s jar bootstrap mechanism described here. For this, we need a file called jar-bootstrap.rb at the root of the jar. We could either add a new file that loads/requires our server.rb or, since our project is rather simple, simply rename the server.rb file:

$ mv src/main/ruby/server.rb src/main/ruby/jar-bootstrap.rb

We also need to tell the assembly plugin about our main class:

...
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-assembly-plugin</artifactId>
      <version>2.2-beta-3</version>
      <executions>
        <execution>
          <id>assemble</id>
          <goals>
            <goal>single</goal>
          </goals>
          <phase>package</phase>
          <configuration>
            <archive>
              <manifest>
                <mainClass>org.jruby.JarBootstrapMain</mainClass>
              </manifest>
            </archive>
            <appendAssemblyId>false</appendAssemblyId>
            <descriptors>
              <descriptor>assembly.xml</descriptor>
            </descriptors>
          </configuration>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
</project>

After we generated the new jar:

$ mvn clean install

We can now run it:

$ java -jar target/sinatra-test-1.0.0-SNAPSHOT.jar 

== Sinatra/1.2.6 has taken the stage on 4567 for development with backup from WEBrick
[2011-09-11 13:27:00] INFO  WEBrick 1.3.1
[2011-09-11 13:27:00] INFO  ruby 1.8.7 (2011-08-23) [java]
[2011-09-11 13:27:03] INFO  WEBrick::HTTPServer#start: pid=98783 port=4567

On a *nix system, we can even go one step further and create a self-executable jar using Brian’s trick.

Create a file src/main/sh/run.sh with the invocation commandline:

#!/bin/bash
java -jar "$0" "$@"

Make sure to have a couple of newlines at the end of the file. Then add this plugin section at the end after the assembly plugin declaration:

...
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>shell-maven-plugin</artifactId>
  <version>1.0-beta-1</version>
  <executions>
    <execution>
      <id>make-executable</id>
      <phase>package</phase>
      <goals><goal>shell</goal></goals>
      <configuration>
        <workDir>${baseDir}</workDir>
        <chmod>true</chmod>
        <keepScriptFile>false</keepScriptFile>
        <script><![CDATA[
cat "${basedir}/src/main/sh/run.sh" > "${project.build.directory}/server"
cat "${project.build.directory}/${project.build.finalName}.jar" >> "${project.build.directory}/server"
chmod +x "${project.build.directory}/server"
        ]]></script>
      </configuration>
    </execution>
  </executions>
</plugin>

That gives us an executable called server that we can run like this:

$ ./target/server 

== Sinatra/1.2.6 has taken the stage on 4567 for development with backup from WEBrick
[2011-09-11 13:34:07] INFO  WEBrick 1.3.1
[2011-09-11 13:34:07] INFO  ruby 1.8.7 (2011-08-23) [java]
[2011-09-11 13:34:07] INFO  WEBrick::HTTPServer#start: pid=99118 port=4567

And that’s it: an executable jar that doesn’t need a JRuby environment. Almost all of the stuff we’ve been talking about here is boilerplate, independent of whether we use Sinatra or something else. That makes it straightforward to add this to any JRuby application that is invoked on the command line.

Operations Toolkit

Posted by gerir on September 13, 2011 – 11:20 AM

A few months ago (has it been that long already?!) we started the process of pushing some of our internal Operations toolkit out in the open (and you can rightly argue that we barely dipped our toes in the water). We are picking up where we left off, and are working towards releasing several other tools over the next few weeks, some of them trivial (let’s call them utilities), others far more significant (true tools).

We are starting today on the utility end of the spectrum with nddtune, an SMF manifest/method combo you can use to ensure that ndd tweaks which are not configurable via /etc/system (in Solaris) stay tweaked when a system reboots. See the README file for details. This is based on Dr. Hung-Sheng Tsao’s 2008 blog post on SMF and tcp tuning, primarily adding a configuration file.

I chose a trivial utility to start with because the aim of this post is really to provide a preview of some of the tools we are working on publishing. Part of the work involves fixing some of the known bugs we have identified, as well as removing or improving code that is not generic and assumes the tool is running in Ning’s environment. Specific blog posts will follow as we make releases.

A sample of the upcoming tools, and in no particular order:

  • The Nagios Shell sits halfway between utility and tool. We have traditionally not used any of the Nagios graphical interfaces but still need a way to interact with Nagios in a sane fashion. After a couple of years of writing a hodgepodge of utilities, we decided to collapse the functionality into a single tool, which uses Mathias Kettner’s incredible MK Livestatus module to provide far more functionality than we had before. The current incantation of the Nagios Shell is actually a shell script coupled with a Python script, and we’re in the middle of a from-the-ground-up rewrite in Ruby to add further functionality while removing complexity.
  • Zettabee is definitely in tool territory. We use ZFS storage for a variety of purposes, and the bulk of the data we store on ZFS has to be replicated to other facilities. Zettabee encapsulates and manages zfs send and receive functionality to provide incremental, block-level, asynchronous replication of remote ZFS file systems (no support for synchronous operations is available), and it’s tightly (but optionally) integrated with Nagios as well.
  • Theia provides NetApp filer performance monitoring and alerting through its integration with Nagios and Zenoss. One of the oldest tools in our toolkit, it has been in production for well over four years, constantly prodding and poking our filers to extract performance data and alert as necessary. It used to be that monitoring NetApp filers was a) painful and/or b) expensive (DFM anyone?). Theia takes care of that for us.

There are other tools in the pipeline, but these will have to do for now. It is our sincere hope that they are useful outside of our environment, and hopefully, other bright coders out there can crank out additional fixes and functionality that we have not implemented. And don’t forget to check out the rest of Ning’s open source projects on GitHub!

3 Steps to Understanding Ungrokkable Legacy Code

Posted by Jonathan Aquino on August 29, 2011 – 9:13 AM

On Friday I was faced with making some changes to some old code that nobody really understood. Nobody liked to go into this code, so nobody was familiar with it. I too dreaded making changes to it, and dipped my toes in only as much as necessary to learn what I needed to know.

Well now I needed to understand it better in order to make a bug fix. It turned out to be a good opportunity to learn what this code was doing and make it easier to understand.

Here are some tips for understanding legacy code that is hard to grok:

Step 1. Print out the code. Sometimes the code you face is so gnarly that you just have to print out the sucker. The code needs to be on hardcopy so you can lay it out on a table and get a sense of what it is trying to do. So I printed it out. The code itself wasn’t very long – about 10 pages or so – but it was extremely confusing. When I laid the pages out on the table, though, I could start to get a handle on it. I could make connections across the 3-deep class hierarchy – I could see what was overriding what.

Step 2. Tidy up the code. Tidying up whitespace and fixing the style of the code is a great, low-investment way to get familiar with the code. This is a tip I picked up from the interview with Douglas Crockford in the book Coders At Work (fascinating book of interviews with famous coders, btw):

Seibel: “How do you read code you didn’t write?”

Crockford: “By cleaning it. I’ll throw it in a text editor and start fixing it. First thing I’ll do is make the punctuation conform; get the indentation right, do all that stuff.”

So fix up the superficial things, just so you can start getting your hands (a little) dirty working with the code.

Step 3. Make the code easier for yourself and others to understand. What I mean here is adding doc, and especially renaming variables, methods, and classes to be easier to grok. For example, renaming $compiled to $configValues made a quantum difference in the understandability of the code, and I made 13 other renames like that. There is power in names – I have a hunch that design is nothing more than the art of precise naming (if you know of a paper, perhaps in linguistics, that backs up that statement, I’d be interested – let me know).

Having printed out the code, tidied it up a little, and especially documented it and renamed things to be easier to understand, I now understood the dragon that we had long feared, and made it grokkable for my teammates as well. I knew how to fix it, what part of the code I needed to change – that took about 20 minutes, including writing a unit test.

It feels good to slay dragons.

What tips do you have for dealing with obtuse legacy code?

Sidekick – Using Node.js to run scheduled tasks for a service

Posted by David Sklar on August 8, 2011 – 10:04 AM

The Problem

We run PHP inside of Apache 2. This works great for servicing user requests, but the request/response nature of the PHP setup makes it difficult to do things such as:

  • run PHP code after Apache starts up to initialize server state (such as populating APC cache with data or compiled code) before the server indicates it’s ready to handle real requests
  • run periodic or scheduled tasks inside the server, such as announcing the server to our service discovery system or refreshing local caches of remote information

The Solution

We run a little companion process (the “sidekick”) that is started up at the same time as httpd (and stopped when httpd is stopped). It reads a configuration file which tells it what tasks to execute on what schedule.

The configuration file specifies each task as a URL to execute on the server and the frequency of execution. For example:

"announce": {
  "url": "/xn/tasks/frobnicate",
  "frequency": 60
}

This tells sidekick to issue a GET to /xn/tasks/frobnicate on the local httpd every 60 seconds.

Why This Was Nice To Build With Node.js

Using Node.js for this “sidekick” companion process was useful for a few reasons:

  • Most importantly, the event-based nature of Node.js took care of the timing and scheduling aspects of the sidekick. Scheduling each task execution is a simple setTimeout() call. Avoiding two instances of a task running at once requires only checking a single object property. Long-running tasks don’t affect the scheduling of other tasks. I can mostly just fire off the tasks when I want them and rely on Node’s internal event loop to make the HTTP requests and invoke my callbacks when appropriate.
  • Executing the requests back to the main HTTP server was dead simple with Node’s http module. Making requests, handling responses, and dealing with errors are all straightforward. The only code I needed to write was my application-specific logic. I didn’t need to spend any time on boilerplate connection handling mechanics.
  • Node’s signal handling and message passing on process exit made it straightforward to take special actions before shutdown, such as removing a PID file and executing a special request against the main HTTP server.
  • Using Javascript made the configuration file specification trivial (admittedly, this is not a property unique to Javascript) but will also allow for very easy extension into specifying task logic itself in the config file. (More details on this in the “What’s next” section below.)

Other Features

In addition to the functionality specified above, the configuration file supports the following options:

If a task’s frequency is set to initialize, then the task is treated specially as the “initialization” task. This means it gets run first at process startup. No timed tasks run until the first run of the initialization task completes. The initialization task does not run on any regular schedule but can be re-executed by sending SIGUSR2 to the sidekick. We use this feature to prime PHP caches which are cleared when the main HTTP server is gracefully restarted. (Apache uses SIGUSR1 for this purpose, but SIGUSR1 is claimed by Node to activate its debugger.)

If a task’s frequency is set to shutdown, then the task is treated specially as the “shutdown” task. This means it is only run on process shutdown. This is useful for cleaning up resources or making notifications. We use this to have the server remove itself from our service discovery system.

A task can have a start-delay key which indicates the number of seconds to wait before kicking off the first run of the task. This is useful for staggering the execution of multiple tasks. Instead of having ten tasks wake up every 60 seconds together and fire off their requests, you could stagger them each by one or two seconds to spread out the load.

A task can have a dedicated-child key with a boolean value (defaulting to false if the key is not present) indicating whether sidekick should spawn a separate process to execute the task in. This could be useful if the task response is large, might somehow crash Node, or if the task execution itself (once the “specify task logic as Javascript in the config file” expansion described below is complete) is CPU intensive.
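Putting these options together, a configuration combining a regular task with the special initialize and shutdown frequencies might look something like this (the task names and URLs other than /xn/tasks/frobnicate are made up for illustration):

"prime-caches": {
  "url": "/xn/tasks/prime-caches",
  "frequency": "initialize"
},
"frobnicate": {
  "url": "/xn/tasks/frobnicate",
  "frequency": 60,
  "start-delay": 5,
  "dedicated-child": true
},
"deannounce": {
  "url": "/xn/tasks/deannounce",
  "frequency": "shutdown"
}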

What’s next?

Task Enhancements

Tasks are just GET requests. There are some use cases for which it would be nice to extend this to other methods and perhaps allow specification of other attributes of the request (body, headers).

Having all tasks just be URL callbacks into the server works great for Apache and PHP because the PHP code behind the request can execute whatever we need it to. But this approach is not so useful when wrapping other servers without such easy programmability. (For example, we are exploring doing the same sidekick wrapping for Redis.) In that case, a useful extension to sidekick would be the ability to specify Javascript code, not just URLs, to define a task to run. The tasks can then do anything that Node.js can do, such as talk to remote network services or do local filesystem cleanups.

Process Monitoring

If the sidekick dies, it should kill the main service. (And vice versa.) We get around this right now by having an external liveness checker poking the server every few seconds — if it finds that just the sidekick (or just the main service) is running, it kills it and reports a failure to our monitoring system. Ideally the sidekick could take care of this itself. When it dies, it could kill the main server; it could periodically check the health of the main server and kill itself if it finds the main server is dead.

Download the Code

You can download the sidekick at https://github.com/ning/sidekick.

PHP Tip: The identity function I()

Posted by Jonathan Aquino on July 8, 2011 – 10:20 AM

Here’s a tip I picked up from another PHP programmer. It’s called the identity function, and it simply returns its argument:

function I($subject) {
    return $subject;
}

Now what’s so great about that, you ask? One cool thing is that it allows you to instantiate an object and call a method on it in a single statement:

I(new Foo())->bar();

I know – there’s a built-in way to do this in Java and other languages. Not so in PHP.

Another thing that you often want to do is to call a function that returns an array, then access the nth element of the returned array: $first = $foo->getItems()[0]. Again, in PHP, this is not something that you can do with a single statement. But if we expand our definition of the I() function, we can do it with:

$first = I($foo->getItems(), 0);

Here is the expanded definition of the identity function:

/**
 * The identity function - returns the argument. This is useful for instantiating
 * an object and using it on the same line: I(new Foo())->bar().
 *
 * You can also use it to access array values on the same line: I($user->getLikes(), 0).
 *
 * @param $subject mixed  the object, which will be returned
 * @param $key string  the key whose value to return
 * @return mixed  $subject if no key is specified; otherwise, $subject[$key]
 */
function I($subject, $key = null) {
    if ($key !== null) {
        if ($subject === null) {
            return null;
        }
        if (!is_array($subject)) {
            throw new Exception('Subject is not an array: ' . var_export($subject, true));
        }
        return $subject[$key];
    }
    return $subject;
}

Using the ub PHP benchmarking tool, you can see that there is a bit of overhead when using the I() function. So this isn’t something that you would want to use in a tight loop:

$ ub not-using-i.php
     not-using-i: mean=0.002162 median=0.002000 min=0.001000 max=0.041000 stdev=0.001435
$ ub using-i.php
         using-i: mean=0.002565 median=0.002000 min=0.002000 max=0.041000 stdev=0.001643
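The two benchmark scripts aren’t included in the post; a minimal pair along these lines (the file names and getItems() are hypothetical, and both assume the I() function above is defined) would exercise the same code paths:

<?php
// Hypothetical contents of the two benchmarked scripts.
// getItems() stands in for any function that returns an array.
function getItems() {
    return array('a', 'b', 'c');
}

// using-i.php: go through I() to grab the first element
for ($i = 0; $i < 10000; $i++) {
    $first = I(getItems(), 0);
}

// not-using-i.php: the plain two-statement equivalent
for ($i = 0; $i < 10000; $i++) {
    $items = getItems();
    $first = $items[0];
}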

Hopefully the identity function saves you a few keystrokes.

What other little utility functions have you found useful?
