Author Archive

uniqid() is slow

Posted by David Sklar on February 1, 2012 – 6:33 AM

So there I was, profiling one of our systems with XHProf and what jumped out as taking up lots of wall-clock execution time? uniqid(). This function, which provides a convenient way to generate unique identifiers, can hold up your PHP execution far more than it should if you’re not careful.

Two things about uniqid() make it a potential performance minefield:

  1. If not called with a second argument of TRUE (the “more entropy” argument), uniqid() sleeps for at least a microsecond.
  2. uniqid() relies on the system function gettimeofday() — and the Linux kernel essentially serializes access to gettimeofday() across all cores.

 

The Big Sleep

There are two funky things about the sleep inside uniqid(): why does PHP need to sleep at all, and how long does that sleep really take?

Without the “more entropy” flag set, uniqid() constructs the unique identifier entirely from the time returned from gettimeofday(): 8 hex digits from the seconds part of the time and 5 hex digits from the microseconds part of the time. So if two calls to uniqid() happen in the same microsecond, then each would return the same identifier. Not so unique!
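Here's a rough sketch of that layout (illustrative PHP, not PHP's actual C implementation):

// Roughly how the identifier is laid out when "more entropy" is off:
// 8 hex digits of seconds followed by 5 hex digits of microseconds.
list($usec, $sec) = explode(' ', microtime());
$id = sprintf('%08x%05x', $sec, (int) ($usec * 1000000));
echo $id; // e.g. "4f28f95d03a6b" -- 13 characters, just like uniqid()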

To avoid this problem, uniqid() guarantees that two calls from the same process can’t happen in the same microsecond by making each call take at least a microsecond to execute. The implementation of uniqid() calls usleep(1), so the entire uniqid() execution will take at least that long.

In practice, though, that innocent-looking usleep(1) can cause a delay of a lot more than just 1 microsecond. The man page for usleep() says “The usleep() function suspends execution of the calling thread for (at least) usec microseconds.” That “at least” is no joke. On my test system, usleep(1) takes about 63 microseconds to execute. Depending on the resolution of the hardware clocks on your system, your results will vary.

Setting the “more entropy” flag changes this behavior. PHP skips the usleep(1) call and instead disambiguates potential same-microsecond collisions by appending data from php_combined_lcg() to the identifier. This function returns pseudorandom data; two successive calls to it from the same process will return different results.
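A quick way to see both behaviors on your own system is a timing loop like this (a rough sketch using microtime(); absolute numbers will vary with your clock resolution):

// Time uniqid() with and without the "more entropy" flag.
$iterations = 10000;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    uniqid();               // sleeps at least 1 microsecond per call
}
$plain = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $iterations; $i++) {
    uniqid('', true);       // no usleep(); appends php_combined_lcg() data
}
$moreEntropy = microtime(true) - $start;

printf("plain: %.1f usec/call, more entropy: %.1f usec/call\n",
    1e6 * $plain / $iterations, 1e6 * $moreEntropy / $iterations);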

gettimeofday() serialization

On Linux, a call to gettimeofday() from userspace ultimately ends up running the kernel code at http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=kernel/time/timekeeping.c#l217 :

do {
  seq = read_seqbegin(&xtime_lock);
  *ts = xtime;
  nsecs = timekeeping_get_ns();

  /* If arch requires, add in gettimeoffset() */
  nsecs += arch_gettimeoffset();
} while (read_seqretry(&xtime_lock, seq));

Those read_seqbegin() and read_seqretry() calls that wrap the call to timekeeping_get_ns() are read locks that force the caller (ultimately PHP calling gettimeofday()) to retry if a write is in progress. And since the time is updated by the kernel so frequently, writes are often in progress.

What this means in practice is that there is a relatively fixed upper bound on the number of gettimeofday() calls that can happen per second across the entire system. If you just have one processor/core, this is not such a big deal, since your single processor is spending its time doing plenty of other things besides calling gettimeofday(). But if you have multiple cores (the system I was profiling was running 8 cores) then the total number of calls across all cores is that same upper bound. So each core has to queue up to get its gettimeofday() calls done.

I wrote a simple program (gettimeofday-forker) which lets you measure how fast gettimeofday() calls are on your system. Download the source code and then compile it with gcc -o gettimeofday-forker gettimeofday-forker.c. Then, initially run it with just one child and an interval of 1000000 microseconds (1 second). On my Linux system, I get this output:

$ ./gettimeofday-forker 1 1000000
[-] All children running
[0] 638146 calls in 1000000 usec = 0.64 calls/usec

With 2, 4, and 8 children, I get:

$ ./gettimeofday-forker 2 1000000
[-] All children running
[1] 366003 calls in 1000000 usec = 0.37 calls/usec
[0] 365881 calls in 1000000 usec = 0.37 calls/usec
$ ./gettimeofday-forker 4 1000000
[-] All children running
[3] 183819 calls in 1000004 usec = 0.18 calls/usec
[2] 181205 calls in 1000001 usec = 0.18 calls/usec
[1] 183655 calls in 1000000 usec = 0.18 calls/usec
[0] 183866 calls in 1000001 usec = 0.18 calls/usec
$ ./gettimeofday-forker 8 1000000
[-] All children running
[7] 91727 calls in 1000002 usec = 0.09 calls/usec
[2] 91928 calls in 1000018 usec = 0.09 calls/usec
[1] 91727 calls in 1000001 usec = 0.09 calls/usec
[6] 90878 calls in 1000441 usec = 0.09 calls/usec
[0] 91600 calls in 1000029 usec = 0.09 calls/usec
[3] 91917 calls in 1000660 usec = 0.09 calls/usec
[4] 94734 calls in 1000001 usec = 0.09 calls/usec
[5] 95827 calls in 1000001 usec = 0.10 calls/usec

The total number of gettimeofday() calls remains roughly constant as the number of child processes increases: about 0.64 calls/usec with one child, and about 0.09 calls/usec per child with eight children, or roughly 0.7 calls/usec in total.

If you experiment with gettimeofday-forker on your own, keep in mind that you should specify an interval large enough that the time to start up all the processes is very small compared to the total runtime. If some children finish before others start, your measurements will be skewed.
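If you'd rather not compile the C program, here's a rough PHP approximation of the same measurement (a sketch that assumes the pcntl extension is available; each microtime() call is a gettimeofday() call underneath):

// Fork N children; each counts how many microtime() calls it can make
// in roughly one second of wall-clock time.
$children = isset($argv[1]) ? (int) $argv[1] : 1;

for ($i = 0; $i < $children; $i++) {
    if (pcntl_fork() === 0) {
        $calls = 0;
        $start = microtime(true);
        while (microtime(true) - $start < 1.0) {
            $calls++;
        }
        printf("[%d] %d calls in ~1 sec\n", $i, $calls);
        exit(0);
    }
}

// Parent: wait for all children to exit.
while (pcntl_waitpid(-1, $status) > 0) {
    // nothing to do
}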

Solution

Our dear friend uniqid() has some performance problems. What’s the solution? In our case, we just switched to using mt_rand(). The places we were using uniqid() didn’t really need global-for-all-time unique identifiers, just identifiers that were extremely unlikely to collide with other identifiers being generated around the same time.
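Something as simple as this (a sketch, not our exact replacement) was good enough for those spots:

// Cheap "unique enough" identifier: two random values in hex.
// No usleep() and no per-call gettimeofday().
function quick_id() {
    return dechex(mt_rand()) . dechex(mt_rand());
}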

If you need stronger guarantees of uniqueness, you have a few options. You could build your own IDs by combining things that are unlikely to collide and cheap to obtain: the server’s IP address, the process ID, the time at the start of the request, and a per-request sequence number. (If you’re running PHP in a multithreaded environment, also include the thread ID.) For example:

function my_uniqid() {
  static $counter = 0;   // per-request sequence number
  static $pid = -1;      // process ID, looked up once per process
  static $addr = -1;     // server IP address as an integer, looked up once

  if ($pid == -1) { $pid = getmypid(); }
  if ($addr == -1) { $addr = ip2long($_SERVER['SERVER_ADDR']); }

  // server address + process ID + request start time + counter
  return $addr . $pid . $_SERVER['REQUEST_TIME'] . ++$counter;
}

This gives you a nice string of digits that is guaranteed to be generated only once, unless you reset your system clock or two servers share an IP address.

If that’s not enough, take a look at the PECL uuid extension, which wraps libuuid. It can generate UUIDs based on things such as the Ethernet MAC address. It can also be told to generate time-based UUIDs, which in turn use gettimeofday(), so avoid those if you don’t want to run into the kinds of problems outlined above.
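For example, a random (rather than time-based) UUID can be generated like this, assuming the PECL uuid extension is installed:

// Assumes the PECL uuid extension. UUID_TYPE_RANDOM picks the random
// variant, avoiding the time-based one (and its gettimeofday() calls).
$id = uuid_create(UUID_TYPE_RANDOM);
echo $id; // e.g. "9c5b94b1-35ad-49bb-b118-8e8fc24abf80"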

Sidekick – Using Node.js to run scheduled tasks for a service

Posted by David Sklar on August 8, 2011 – 10:04 AM

The Problem

We run PHP inside of Apache 2. This works great for servicing user requests, but the request/response nature of this setup makes it difficult to do things such as:

  • run PHP code after Apache starts up to initialize server state (such as populating the APC cache with data or compiled code) before the server indicates it’s ready to handle real requests
  • run periodic or scheduled tasks inside the server, such as announcing the server to our service discovery system or refreshing local caches of remote information

The Solution

We run a little companion process (the “sidekick”) that is started up at the same time as httpd (and stopped when httpd is stopped). It reads a configuration file which tells it what tasks to execute on what schedule.

The configuration file specifies each task as a URL to execute on the server and the frequency of execution. For example:

"announce": {
  "url": "/xn/tasks/frobnicate",
  "frequency": 60
}

This tells sidekick to issue a GET to /xn/tasks/frobnicate on the local httpd every 60 seconds.

Why This Was Nice To Build With Node.js

Using Node.js for this “sidekick” companion process was useful for a few reasons:

  • Most importantly, the event-based nature of Node.js took care of the timing and scheduling aspects of the sidekick. Scheduling each task execution is a simple setTimeout() call. Avoiding two instances of a task running at once requires only checking a single object property. Long-running tasks don’t affect the scheduling of other tasks. I can mostly just fire off the tasks when I want them and rely on Node’s internal event loop to make the HTTP requests and invoke my callbacks when appropriate.
  • Executing the requests back to the main HTTP server was dead simple with Node’s http module. Making requests, handling responses, and dealing with errors are all straightforward. The only code I needed to write was my application-specific logic. I didn’t need to spend any time on boilerplate connection handling mechanics.
  • Node’s signal handling and message passing on process exit made it straightforward to take special actions before shutdown, such as removing a PID file and executing a special request against the main HTTP server.
  • Using Javascript made the configuration file specification trivial (admittedly, this is not a property unique to Javascript) but also will allow for very easy extension into specifying task logic itself in the config file. (More details on this in the “What’s next” section below.)

Other Features

In addition to the functionality specified above, the configuration file supports the following options:

If a task’s frequency is set to initialize, then the task is treated specially as the “initialization” task. This means it gets run first at process startup; no timed tasks run until the first run of the initialization task completes. The initialization task does not run on any regular schedule, but it can be re-executed by sending SIGUSR2 to the sidekick. We use this feature to prime PHP caches that are cleared when the main HTTP server is gracefully restarted. (Apache uses SIGUSR1 for this function, but SIGUSR1 is claimed by Node to activate its debugger.)

If a task’s frequency is set to shutdown, then the task is treated specially as the “shutdown” task. This means it is only run on process shutdown. This is useful for cleaning up resources or making notifications. We use this to have the server remove itself from our service discovery system.

A task can have a start-delay key which indicates the number of seconds to wait before kicking off the first run of the task. This is useful for staggering the execution of multiple tasks. Instead of having ten tasks wake up every 60 seconds together and fire off their requests, you could stagger them each by one or two seconds to spread out the load.

A task can have a dedicated-child key with a boolean value (defaulting to false if the key is not present) indicating whether sidekick should spawn a separate process to execute the task in. This could be useful if the task response is large, might somehow crash Node, or if the task execution itself (once the “specify task logic as Javascript in the config file” expansion described below is complete) is CPU-intensive.
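Putting a few of these options together, a task entry might look something like this (same format as the earlier example; the task name, URL, and numbers here are made up for illustration):

"cache-refresh": {
  "url": "/xn/tasks/refresh-cache",
  "frequency": 300,
  "start-delay": 7,
  "dedicated-child": true
}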

What’s next?

Task Enhancements

Tasks are just GET requests. There are some use cases for which it would be nice to extend this to other methods and perhaps allow specification of other attributes of the request (body, headers).

Having all tasks just be URL callbacks into the server works great for Apache and PHP because the PHP code behind the request can execute whatever we need it to. But this approach is not so useful when wrapping other servers without such easy programmability. (For example, we are exploring doing the same sidekick wrapping for Redis.) In that case, a useful extension to sidekick would be the ability to specify Javascript code, not just URLs, to define a task to run. The tasks can then do anything that Node.js can do, such as talk to remote network services or do local filesystem cleanups.

Process Monitoring

If the sidekick dies, it should kill the main service. (And vice versa.) We get around this right now by having an external liveness checker poking the server every few seconds — if it finds that just the sidekick (or just the main service) is running, it kills it and reports a failure to our monitoring system. Ideally the sidekick could take care of this itself. When it dies, it could kill the main server; it could periodically check the health of the main server and kill itself if it finds the main server is dead.

Download the Code

You can download the sidekick at https://github.com/ning/sidekick.

Cross-domain communication with HTML5 postMessage

Posted by David Sklar on April 8, 2011 – 9:33 AM

Plenty has been written about HTML5's spiffy postMessage() method, which allows a window to send a text string to another window. When one of those windows is a web page on one domain and the other is an iframe from a different domain embedded in the first, you’ve got the basis for cross-domain communication that works around the restrictions of traditional Ajax requests.

To take advantage of this capability, I put together a proof-of-concept jQuery plugin that makes this pretty simple. Check it out at http://github.com/ning/pomp. The goals of this plugin are:

  • Make it really easy to use postMessage to send Ajax requests from one domain to another
  • Allow a developer to use familiar jQuery style to send these requests and wire up callbacks
  • Have multiple different message target domains on the same page if desired

So take a look and let me know what you think!

Building Non-Java Stuff With Maven and Friends

Posted by David Sklar on March 4, 2011 – 7:26 AM

Our build and deployment toolchain is:

  • git for version control
  • Maven for packaging and dependency management
  • Nexus for artifact management
  • Pulse for automated build management
  • galaxy for deployment

While git, Galaxy, and Pulse are language-neutral, Maven is Java-centric. Nexus is language-neutral, but its Maven-centricity gives it some Java guilt-by-association.

Most of our software is written in Java, but we use components written in C as well: externally written packages such as Apache HTTPD and PHP, as well as internal libraries and extensions.

We try to use the same build and deployment toolchain no matter what language a project is written in. This document discusses how we build C-based stuff with Maven and friends.

For a given component we need to be able to:

  • track its source code
  • create deployable artifacts with particular version numbers
  • allow other components to rely on particular versions of the component
  • deploy an artifact with a particular version number

And cores should, of course, be deployable with Galaxy and support all the interoperation that implies.

Project Structure

libmemcached is an example of an external library we depend on. It is used as a dependency by the PECL memcached PHP extension, which our PHP execution environments use to access our memcached servers. The canonical location of libmemcached source code is https://code.launchpad.net/libmemcached.

We maintain an internal git repository of the libmemcached source code. We don’t do anything fancy to import the Bazaar-managed repository or preserve its history; we just copy the files. This approach works fine for projects like this, where we’re not actively developing the code, just using it. (For other external projects whose canonical repository is managed with git, we git clone to create our internal repository.)

The structure of our internal libmemcached repository is as follows:

  • pom.xml – Maven POM file describing the project and how to build it
  • assembly.xml – Maven Assembly Descriptor describing what to put in the project’s artifact
  • src/main/c – The project source code

There is no automated procedure for updating the libmemcached.git repository with a new version of the libmemcached code. When we want to build a new version of libmemcached, the new code is downloaded from the external repository, the contents of src/main/c replaced, the version number in pom.xml updated, and a new artifact created.

The project’s POM file describes to Maven how to configure and compile libmemcached. There are a number of elements in the POM file which are not present (or necessary) in typical POM files used for Java projects.

The <profiles/> section sets some variables that are architecture specific. Because we’re compiling C into object code, we need to have different artifacts created depending on operating system and architecture. The envClassifier variable, used later in the POM and in the Assembly Descriptor as well, is set to an os- and arch-specific value. The other variables set in the profiles are generally used to provide configuration or compilation flags used later in the POM. In particular the cflags variable specifies compiler settings that, on Linux, enable some stack protection and security features. The <profiles/> section of the pom looks like this:

  <profiles>
    <profile>
      <id>linux-64</id>
      <activation>
        <os>
          <family>unix</family>
          <name>linux</name>
          <arch>x86_64</arch>
        </os>
      </activation>
      <properties>
        <envClassifier>linux-${os.arch}</envClassifier>
        <use64bit>--enable-64bit</use64bit>
        <cflags>-fstack-protector -D_FORTIFY_SOURCE=2 -O3 -Wall</cflags>
      </properties>
    </profile>
    <profile>
      <id>linux-64-amd</id>
      <activation>
        <os>
          <family>unix</family>
          <name>linux</name>
          <arch>amd64</arch>
        </os>
      </activation>
      <properties>
        <envClassifier>linux-${os.arch}</envClassifier>
        <use64bit>--enable-64bit</use64bit>
        <cflags>-fstack-protector -D_FORTIFY_SOURCE=2 -O3 -Wall</cflags>
      </properties>
    </profile>
    <profile>
      <id>mac</id>
      <activation>
        <os>
          <family>mac</family>
        </os>
      </activation>
      <properties>
        <envClassifier>mac-${os.arch}</envClassifier>
        <use64bit>--enable-64bit</use64bit>
        <cflags>-O3 -Wall</cflags>
      </properties>
    </profile>
    <profile>
      <id>sun</id>
      <activation>
        <os>
          <family>unix</family>
          <name>sunos</name>
        </os>
      </activation>
      <properties>
        <envClassifier>sun-${os.arch}</envClassifier>
        <use64bit>--enable-64bit</use64bit>
        <cflags>-O3 -Wall</cflags>
      </properties>
    </profile>
  </profiles>

This lets us build Linux, Mac, and Solaris versions easily. While we mostly use Linux on our production servers, having Mac builds is nice for developer machines.

We define some properties in the POM that are used by various plugins to determine directory names and file locations:

  <properties>
    <!-- Where the compilation happens. Code is copied from src/main into this directory -->
    <workDir>${project.build.directory}/src</workDir>
    <!-- The staged install directory. The artifact is assembled (mostly) from stuff under this directory -->
    <installDir>${project.build.directory}/inst</installDir>
    <!-- An arbitrary (but unlikely to occur in nature) prefix string that the compiled stuff thinks
    it's installed under. This string will end up in various header files and configuration scripts
    and so is easily found + replaced. The files in src/main/build are intended to be used by other
    artifacts that depend on this one to do such replacements -->
    <prefix>/tmp/${project.artifactId}-${project.version}</prefix>
    <!-- Where dependencies get unpacked into -->
    <dependencyDir>${project.build.directory}/dependencies</dependencyDir>
    <!-- The directory that the memcached dependency will end up in -->
    <memcachedDir>${dependencyDir}/memcached-${envClassifier}-tar.gz</memcachedDir>
  </properties>

Libmemcached uses memcached in its build process, so memcached is declared as a dependency in the POM’s <dependency/> section. Combined with the use of the maven-dependency-plugin, this ensures we get an unpacked copy of memcached underneath our build directory for the libmemcached build process to use. The presence of the <classifier/> element in the dependency description ensures we get an os- and arch-appropriate version of memcached to use. The memcached dependency declaration looks like this:

  <dependencies>
    <dependency>
      <groupId>ning.memcached</groupId>
      <artifactId>memcached</artifactId>
      <version>1.2.6</version>
      <type>tar.gz</type>
      <classifier>${envClassifier}</classifier>
    </dependency>
  </dependencies>

And then dependencies are unpacked with this invocation of the maven-dependency-plugin:

      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-dependency-plugin</artifactId>
        <configuration>
          <includeTypes>tar.gz</includeTypes>
          <excludeTransitive>true</excludeTransitive>
          <outputDirectory>${dependencyDir}</outputDirectory>
          <useSubDirectoryPerArtifact>true</useSubDirectoryPerArtifact>
          <stripVersion>true</stripVersion>
        </configuration>
        <executions>
          <execution>
            <id>unpack</id>
            <phase>initialize</phase>
            <goals>
              <goal>unpack-dependencies</goal>
            </goals>
          </execution>
        </executions>
      </plugin>

After dependencies are unpacked, a successful libmemcached build requires the following things to happen:

  • libmemcached source code put into a place where it can be built
  • The familiar configure, make, make install steps
  • Packaging up the results of “make install” as a deployable artifact

The first step (putting the source code into a place where it can be built) happens in the process-sources phase with the shell-maven-plugin. The shell script that this plugin runs copies the contents of src/main/c to a directory under target/, the standard Maven working directory for builds. Doing the configuration and compilation somewhere other than directly under src/main/c makes it easy to wipe out a partial build without affecting the code and keeps generated files out of the directories tracked by version control. This shell execution phase also needs to adjust permissions on some files in the unpacked memcached dependency because of a bug in the dependency plugin. (The plugin doesn’t preserve file permissions in the unpacked dependency.)

(Note that the actions taken by the shell plugin could also be done by the antrun plugin, but antrun has similar file-permission preservation problems. Running cp -a from the shell avoids that problem.)

The shell-maven-plugin section of the POM looks like this:

      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>shell-maven-plugin</artifactId>
        <version>1.0-beta-1</version>
        <executions>
          <!-- This execution cleans out the work dir and copies the source code in to it. -->
          <!-- Making ${memcachedDir}/bin/* executable is necessary until http://jira.codehaus.org/browse/MDEP-109 is fixed. -->
          <execution>
            <id>stage-sources</id>
            <phase>process-sources</phase>
            <goals><goal>shell</goal></goals>
            <configuration>
              <workDir>${workDir}</workDir>
              <chmod>true</chmod>
              <keepScriptFile>false</keepScriptFile>
              <script>
                rm -rf ${workDir}
                mkdir -p ${workDir}
                cp -a ${basedir}/src/main/c/* ${workDir}
                cd ${workDir}
                chmod 0755 ${memcachedDir}/bin/*
              </script>
            </configuration>
          </execution>
        </executions>
      </plugin>

Next, configure, make, and make install happen via standard use of the make-maven-plugin. Each phase gets some standard options set describing the directories to work in. For the configure goal, the <configureOptions> property enumerates arguments to be passed to the configure script. <configureEnvironment/> sets environment variables active during configure. Setting CFLAGS here means we don’t have to set it later when running make.

The make install phase uses the DESTDIR support for staged installs to install files not to the actual “prefix” directory, but to the prefix directory under our target install directory. This keeps the build contained and doesn’t pollute the build system.

The make-maven-plugin section of the POM looks like this:

      <plugin>
        <groupId>org.codehaus.mojo</groupId>
        <artifactId>make-maven-plugin</artifactId>
        <version>1.0-beta-1</version>
        <executions>

          <execution>
            <id>configure</id>
            <phase>process-resources</phase>
            <goals><goal>configure</goal></goals>
            <configuration>
              <workDir>${workDir}</workDir>
              <destDir>${installDir}</destDir>
              <prefix>${prefix}</prefix>
              <configureOptions>
                <configureOption>--with-memcached=${memcachedDir}/bin/memcached</configureOption>
                <configureOption>${use64bit}</configureOption>
              </configureOptions>
              <configureEnvironment>
                <property>
                  <name>CFLAGS</name>
                  <value>${cflags}</value>
                </property>
              </configureEnvironment>
            </configuration>
          </execution>

          <!-- Run "make" -->
          <execution>
            <id>compile</id>
            <phase>compile</phase>
            <goals><goal>compile</goal></goals>
            <configuration>
              <workDir>${workDir}</workDir>
              <prefix>${prefix}</prefix>
              <destDir>${installDir}</destDir>
            </configuration>
          </execution>

          <execution>
            <id>make-install</id>
            <phase>prepare-package</phase>
            <goals><goal>make-install</goal></goals>
            <configuration>
              <workDir>${workDir}</workDir>
              <destDir>${installDir}</destDir>
            </configuration>
          </execution>

        </executions>
      </plugin>

After everything is compiled and installed to target/inst (because that’s the value of the ${installDir} variable set in the <properties/> section and provided to the make-install goal as the <destDir/>), the maven-assembly-plugin takes over to assemble the deployable artifact. It uses the rules in the assembly.xml file to determine what to include. For libmemcached, this is pretty simple: we want everything that was installed. Setting <id/> to ${envClassifier} in assembly.xml ensures that the appropriate os- and arch-specific classifier is included in the artifact file name.

The maven-assembly-plugin is invoked from the POM like this:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-assembly-plugin</artifactId>
  <executions>
    <execution>
      <id>assemble</id>
      <goals>
        <goal>single</goal>
      </goals>
      <phase>package</phase>
      <configuration>
        <descriptors>
          <descriptor>assembly.xml</descriptor>
        </descriptors>
      </configuration>
    </execution>
  </executions>
</plugin>

And we use this simple assembly.xml:

<assembly>
  <id>${envClassifier}</id>
  <formats>
    <format>tar.gz</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <fileSets>
    <!-- Include everything that "make" installed -->
    <fileSet>
      <directory>target/inst${prefix}</directory>
      <outputDirectory>/</outputDirectory>
    </fileSet>
  </fileSets>
</assembly>

This instructs Maven to build an artifact with a name like libmemcached-0.45-mac-x86_64.tar.gz or libmemcached-0.45-linux-amd64.tar.gz, and that’s what gets uploaded to Nexus.

Path Adjustments

The DESTDIR mechanism makes it easy to stage an install and package up the results. But sometimes the build process generates files which embed build-time pathnames that almost certainly will be different when the artifact is unpacked and used as a dependency. Both Apache HTTPD and PHP have this issue. Their artifacts (and the things that depend on them) use a convention for path-adjustment scripts to solve this problem.

For example, our local Apache HTTP repository contains a src/main/build directory with two files in it: adjust.sh and adjust-files.txt. adjust.sh is a script that should be run by a component which depends on httpd:

#!/bin/bash

if (( $# < 3 )); then
    echo "Usage: $0 BUILD_PREFIX INSTALLED_DIR FILE1 [FILE2 ... FILEn]"
    exit 1
fi

BUILD_PREFIX=$1
INSTALLED_DIR=$2
INDEX=1

for f in $@; do
    if (( $INDEX > 2 )); then
        sed "s:${BUILD_PREFIX}:${INSTALLED_DIR}:g" < $f > $f.tmp && mv $f.tmp $f
    fi
    ((INDEX++))
done

And adjust-files.txt is a list of files, used by adjust.sh to determine which files to adjust:

bin/apxs
build/config_vars.mk
bin/envvars

These get used, for example, in our build of PHP, which relies on Apache HTTP. The pom.xml in our internal repository for PHP contains some variable definitions that describe its dependency on httpd:

<properties>
  <!-- Where dependencies get unpacked into -->
  <dependencyDir>${project.build.directory}/dependencies</dependencyDir>
  <!-- The directory that the httpd dependency will end up in -->
  <httpdDir>${dependencyDir}/httpd-${envClassifier}-tar.gz</httpdDir>
  <httpdVersion>2.2.14</httpdVersion>
</properties>

And the dependency itself:

<dependency>
  <groupId>org.apache</groupId>
  <artifactId>httpd</artifactId>
  <version>${httpdVersion}</version>
  <type>tar.gz</type>
  <classifier>${envClassifier}</classifier>
</dependency>

Inside the configuration of the shell-maven-plugin, however, there’s an additional execution that runs in the process-sources phase (in addition to the source-code-copying script, which is similar to the one that appears in the libmemcached POM).

This script execution invokes httpd’s adjust.sh script in order to replace pathnames in the listed files with the appropriate pathname for wherever the dependency has been unpacked to:

<!-- This execution runs the "adjust.sh" script bundled with the httpd dependency
     to adjust pathnames in its configuration files to point to the place it's been unpacked.
     It also needs to make bin/apxs (in the httpd dependency) executable, since the dependency
     plugin has a bug which doesn't preserve file attributes on unpacked files. -->
<execution>
  <id>adjust-dependencies-httpd</id>
  <phase>process-sources</phase>
  <goals><goal>shell</goal></goals>
  <configuration>
    <!-- CWD for the script should be where httpd was unpacked -->
    <workDir>${httpdDir}</workDir>
    <chmod>true</chmod>
    <keepScriptFile>false</keepScriptFile>
    <!-- making bin/apxs executable is necessary until http://jira.codehaus.org/browse/MDEP-109 is fixed -->
    <script>
      chmod 0755 ${httpdDir}/build-dependency/adjust.sh
      ${httpdDir}/build-dependency/adjust.sh /tmp/httpd-${httpdVersion} ${httpdDir} `cat ${httpdDir}/build-dependency/adjust-files.txt`
      chmod 0755 ${httpdDir}/bin/apxs
    </script>
  </configuration>
</execution>

The first line of the script makes the adjustment script executable (there’s that preserving-permissions-on-extracted-dependency-files bug again) and the second runs adjust.sh. The command line arguments tell it to replace /tmp/httpd-${httpdVersion} (what it thinks was the installed prefix, since that’s what was specified in httpd’s POM) with ${httpdDir} (the actual directory the httpd dependency has been unpacked into). The remaining command line arguments, slurped in from adjust-files.txt, are the files to do the replacement in.

The structure of this adjustment setup means that packages which need adjustment and packages that depend on them can operate in a standardized manner.

Packages which need adjustment just have to do three things:

  • put that adjust.sh (unmodified) into their src/main/build directory
  • list the files that need path adjustment in src/main/build/adjust-files.txt
  • make the adjustment files part of their artifact by including, in assembly.xml, a <fileSet> like this one:
    <fileSet>
      <directory>src/main/build</directory>
      <outputDirectory>/build-dependency</outputDirectory>
    </fileSet>
    

Packages which depend on an adjustment-needing package just have to run the adjust.sh script as described above.

Patches

External packages may not be updated with changes or fixes as fast as we want them to. So it’s handy to be able to patch the external source before we compile it.

The translit PHP extension is a good example of this. The last release, 0.6.0, was in April 2008. We’ve reported two bugs since then that haven’t been fixed. The POM for translit in our internal repository uses maven-patch-plugin to apply two patches that fix these bugs:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-patch-plugin</artifactId>
  <executions>
    <execution>
      <id>patch-sources</id>
      <phase>process-sources</phase>
      <goals><goal>apply</goal></goals>
      <configuration>
        <targetDirectory>${workDir}</targetDirectory>
        <patches>
          <!-- Fix http://pecl.php.net/bugs/bug.php?id=13687 -->
          <patch>bug-13687-ucs-2-le.patch</patch>
          <!-- Fix http://pecl.php.net/bugs/bug.php?id=15326 -->
          <patch>bug-15326-empty-strings.patch</patch>
        </patches>
      </configuration>
    </execution>
  </executions>
</plugin>

The patch files are in the src/main/patches directory of the repo, the standard place that maven-patch-plugin looks.

Pulse

Our Pulse projects for these cores are configured slightly differently than standard Pulse projects for Java-based cores. The specific changes required are:

  • not capturing a “test reports” artifact from the build
  • only capturing “.tar.gz” artifacts from the build (no .jars)
  • adding additional build stages that specify appropriate required resources so that builds execute on different agents representing all the OSes we want to build on (generally just Linux and Mac)

Otherwise Pulse behaves the same way for these projects as it does for others — we get the same automatic builds, logging and reporting.

Conclusion

Using Maven in this way lets us re-use the parts of the infrastructure that make sense for non-Java projects: version control integration, dependency management, automated builds, artifact management. Appropriate Maven plugins (and shell-scripty glue when necessary) keep the C-specific stuff familiar, so the normal autotools build lifecycle can be preserved.

Fast multiple string replacement in PHP

Posted by David Sklar on September 29, 2010 – 8:09 AM

We added a language filter to Ning Pro last month. It lets Network Creators have naughty words (for the Network Creator’s definition of “naughty”) replaced with * characters.

A straightforward way to do this in PHP is to pass an array of words to look for and their replacements to a function like str_replace() or str_ireplace(). Or, similarly, use a regular expression that gloms the search terms together (and potentially checks word boundaries.) There are assorted WordPress plugins that work like this.
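For instance, the straightforward array-based version looks something like this (a sketch, with a made-up word list; $text is the subject string to filter):

// Replace each naughty word with a same-length run of asterisks.
// The search cost grows with the size of the word list.
$words = array('darn', 'heck', 'fiddlesticks');
$stars = array_map(function ($word) {
    return str_repeat('*', strlen($word));
}, $words);

$clean = str_ireplace($words, $stars, $text);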

The problem with this approach is that it’s really slow. Especially if you have a lot of words you’re looking for. The amount of time it takes to do the search and replace grows in proportion to the number of words you’re looking for. This is particularly unfortunate because usually, none of the words are ever found!

For our language filter, we took a different approach. We’ve packaged it up into a PHP extension called Boxwood and are releasing it today as open source. (Find it on github: http://github.com/ning/boxwood.)

With Boxwood, you can have your list of search terms be as long as you like — the search-and-replace algorithm doesn’t get slower as the list of words to look for grows. It works by building a trie of all the search terms and then scanning your subject text just once, walking down elements of the trie and comparing them to characters in your text. It supports US-ASCII and UTF-8, case-sensitive or insensitive matching, and has some English-centric word-boundary checking logic.

Take it for a drive and let us know what you think!

PHP Microbenchmarking

Posted by David Sklar on May 4, 2010 – 9:06 AM

I’m pleased to announce the release of ub, a PHP microbenchmarking framework. You can download it from http://github.com/ning/ub.

The goal is to make it as easy as possible to compare the runtime of alternative approaches to the same problem, such as different regular expressions, or different methods for string or array manipulation.
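For instance, here's the kind of head-to-head comparison ub is meant to automate, done by hand with microtime() (this is not ub's API, just an illustration of the idea):

// Hand-rolled comparison of two ways to build the same string.
$n = 100000;
$a = 'foo';
$b = 'bar';

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $s = sprintf('%s-%s', $a, $b);
}
$sprintfTime = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $s = $a . '-' . $b;
}
$concatTime = microtime(true) - $start;

printf("sprintf: %.4fs  concatenation: %.4fs\n", $sprintfTime, $concatTime);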

The source distribution contains a README with some documentation and a bunch of sample benchmarks.

For normal use, it is rare that two similar but different approaches produce appreciable differences in runtime (inefficient regexes and bloated call stacks aside). The payoff from this kind of benchmarking really comes from operations that happen hundreds or thousands of times in a request, or that happen on hundreds or thousands of servers. At that point, shaving off small amounts of runtime can really make a difference.

I am looking forward to beefing up the set of included benchmarks — contributions are welcome!

Benchmarking Javascript

Posted by David Sklar on February 19, 2010 – 10:43 AM

There are several popular JS benchmarks out there, such as the SunSpider benchmark suite that’s part of the WebKit project and the V8 Benchmark suite.

These benchmarks cover a variety of tasks that are a mix of typical-things-a-browser-might-need-to-do and assorted CPU-heavy activities.

For our purposes, though, we wanted to measure something closer to typical server-side activity in generating a web page. Knowing that a Javascript engine is fast at raytracing doesn’t necessarily tell you how it’ll perform when slapping together a bunch of strings to build an HTTP response.

So we made a benchmark that does the following:

  • creates some text “comments” in an in-memory content cache
  • dispatches a request to a controller’s action method
  • the request loads some entries from the in-memory content cache
  • the request supplies those entries to a template to generate a page that lists the comments

You can download the code for the benchmark and instructions on how to run it.

In this benchmark, V8 is consistently much faster (4 – 6x) than the other two Javascript engines we tested. A comparable workload in PHP comes in slower than V8 but faster than either Rhino or Spidermonkey.

Warmup (running some test iterations before starting the clock) didn’t affect V8, PHP, or Spidermonkey performance very much but definitely helped Rhino out.

The picture below shows the results for a run with 100 iterations of the main benchmarking loop. Each iteration generated ten comment lists of 20 comments each. The y-axis shows iterations per second, the x-axis shows how many warmup iterations were run before the timing runs started. The top line (green) is V8, the next line (orange) is PHP, the red line is Spidermonkey, and the blue line is Rhino.

These data come from running the benchmarks on a 2×2.66 GHz Dual-Core Xeon Mac Pro running OS X 10.6.2 with V8 2.1.0.1 (SVN trunk, Revision 3851), Rhino 1.7R2, Spidermonkey 1.8.0 pre-release 1 2007-10-03 (hg tip=changeset 38048:0dc74fd43862), and PHP 5.3.1.

Of course, as with any benchmark, this one comes with caveats.

The primary one is that lots of the work required to supply a dynamic web page in response to a request has nothing to do with the things that this benchmark measures. In particular, access to external data sources such as caching layers, key-value stores, or databases often makes up a big component of request generation time. If you’re spending 50 ms talking to some external content source, a difference in the rendering layer of 5ms vs 10ms to put the page components together isn’t going to have a drastic effect on how fast users perceive your page loads to be.

However, a counterpart to this is that while the latency difference from that 5ms may not be so great, the throughput difference could be. If those external data source connections are nonblocking and scale efficiently, then your rendering layer can spend its time on other things while waiting for data replies. So that difference between 5ms and 10ms translates into a 2x throughput increase in the number of simultaneous requests a server can handle. While you won’t be sending pages back to users that much faster, you’ll be using only half as much hardware to do it.

Additionally, template parsing happens anew for each iteration inside the benchmark loop. The benchmark uses the TrimPath templating library, which in turn uses eval() as part of template parsing. This puts Rhino at a comparative disadvantage because Rhino is not able to turn evaled Javascript into a Java class. Depending on your environment this may or may not be indicative of “real world” code flow. That said, removing this restriction by having the benchmarking script compile the templates only once before running the timed iterations speeds up all engines, not just Rhino.

The PHP version of the benchmark loads all of the code required for the loops and then runs the iterations. This means that any time spent parsing and compiling the PHP scripts is not included in the timing results, similar to what you’d expect if you were using APC or a comparable code cache. And while the Javascript version of the benchmark makes use of callback functions as part of normal execution flow, the PHP version does not. This reflects common programming style differences between the two languages (and, not coincidentally, benefits PHP’s performance.)

In a real-world implementation, you’re also faced with different choices with regard to threading with the different engines.

On one end of the spectrum is PHP. While PHP’s core is thread-safe, ultimate thread-safety for your entire request processing flow requires thread safety from all extensions and shared libraries you’re using. For this reason, you’re best off running PHP in a single-threaded multi-process model such as FastCGI or Apache’s Prefork MPM.

Next along the line comes V8. While it is possible to use V8 in a multithreaded program (and terminate JS execution of individual threads), only one thread can be executing Javascript at a time. The V8 C++ API offers locking primitives you can use to ensure thread safety if you are adding Javascript bindings to V8 for an existing C/C++ library but there are no Javascript APIs exposed to manipulate threads.

Spidermonkey offers a more hospitable environment for multithreading in that multiple threads can execute Javascript at the same time. Similar to V8, though, when adding extensions to Spidermonkey in C/C++ you must use the provided locking primitives to ensure thread safety and correct garbage collection. And just like V8, there are no Javascript APIs exposed to manipulate threads.

Rhino makes it even easier. Conceptually, it behaves like Spidermonkey in terms of how separate threads can be devoted to Javascript execution. However, because Rhino also exposes Java to your Javascript code, you can take advantage of the Java classes such as Runnable and java.util.concurrent.*.

Multithreading is not definitively better than single-threading; the multithreaded model has strengths and weaknesses just like the multiprocess model. But because the Javascript engines differ widely in their threading capabilities, your choice of threading strategy may dictate your choice of engine (or vice versa).
