Our build and deployment toolchain is:
- git for version control
- Maven for packaging and dependency management
- Nexus for artifact management
- Pulse for automated build management
- Galaxy for deployment
While git, Galaxy, and Pulse are language-neutral, Maven is Java-centric. Nexus is language-neutral too, but its Maven-centricity gives it some Java guilt by association.
Most of our software is written in Java, but we use components written in C as well: both externally written packages such as Apache HTTPD and PHP, and internal libraries and extensions.
We try to use the same build and deployment toolchain no matter what language a project is written in. This document discusses how we build C-based stuff with Maven and friends.
For a given component we need to be able to:
- track its source code
- create deployable artifacts with particular version numbers
- allow other components to rely on particular versions of the component
- deploy an artifact with a particular version number
And cores should, of course, be deployable with Galaxy and support all the interoperation that implies.
Project Structure
libmemcached is an example of an external library we depend on. It is used as a dependency by the PECL memcached PHP extension, which our PHP execution environments use to access our memcached servers. The canonical location of libmemcached source code is https://code.launchpad.net/libmemcached.
We maintain an internal git repository of the libmemcached source code. We don’t do anything fancy to import the Bazaar-managed repository and preserve its history; we just copy the files. This approach works fine for projects like this one, where we’re not actively developing the code, just using it. (For other external projects whose canonical repository is managed with git, we git clone to create our internal repository.)
The structure of our internal libmemcached repository is as follows:
- pom.xml – Maven POM file describing the project and how to build it
- assembly.xml – Maven Assembly Descriptor describing what to put in the project’s artifact
- src/main/c – The project source code
There is no automated procedure for updating the libmemcached.git repository with a new version of the libmemcached code. When we want to build a new version of libmemcached, the new code is downloaded from the external repository, the contents of src/main/c replaced, the version number in pom.xml updated, and a new artifact created.
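The manual update described above could be scripted roughly as follows. This is only a sketch: refresh_sources is a hypothetical helper that does not exist in our repository, and it assumes that the new upstream release has already been downloaded and unpacked, and that the first <version> element in pom.xml is the project's own version.

```shell
#!/bin/bash
# Hypothetical helper (not part of the real repository).
# Usage: refresh_sources REPO_DIR NEW_SRC_DIR NEW_VERSION
refresh_sources() {
  local repo=$1 new_src=$2 new_version=$3

  # Replace the contents of src/main/c wholesale with the new upstream code.
  rm -rf "$repo/src/main/c"
  mkdir -p "$repo/src/main/c"
  cp -a "$new_src"/. "$repo/src/main/c/"

  # Rewrite only the first <version> element, assumed here to be the
  # project's own version rather than a parent or dependency version.
  awk -v v="$new_version" '
    !replaced && /<version>[^<]*<\/version>/ {
      sub(/<version>[^<]*<\/version>/, "<version>" v "</version>")
      replaced = 1
    }
    { print }
  ' "$repo/pom.xml" > "$repo/pom.xml.tmp" && mv "$repo/pom.xml.tmp" "$repo/pom.xml"
}
```

A call such as refresh_sources . /path/to/unpacked/libmemcached 0.45 would leave the working tree ready to review and commit; building the new artifact is still a separate step.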
The project’s POM file describes to Maven how to configure and compile libmemcached. There are a number of elements in the POM file which are not present (or necessary) in typical POM files used for Java projects.
The <profiles/> section sets some architecture-specific variables. Because we’re compiling C into object code, we need different artifacts for each operating system and architecture. The envClassifier variable, used later in the POM and in the Assembly Descriptor, is set to an os- and arch-specific value. The other variables set in the profiles provide configuration or compilation flags used later in the POM. In particular, the cflags variable specifies compiler settings that, on Linux, enable some stack protection and security features. The <profiles/> section of the POM looks like this:
<profiles>
<profile>
<id>linux-64</id>
<activation>
<os>
<family>unix</family>
<name>linux</name>
<arch>x86_64</arch>
</os>
</activation>
<properties>
<envClassifier>linux-${os.arch}</envClassifier>
<use64bit>--enable-64bit</use64bit>
<cflags>-fstack-protector -D_FORTIFY_SOURCE=2 -O3 -Wall</cflags>
</properties>
</profile>
<profile>
<id>linux-64-amd</id>
<activation>
<os>
<family>unix</family>
<name>linux</name>
<arch>amd64</arch>
</os>
</activation>
<properties>
<envClassifier>linux-${os.arch}</envClassifier>
<use64bit>--enable-64bit</use64bit>
<cflags>-fstack-protector -D_FORTIFY_SOURCE=2 -O3 -Wall</cflags>
</properties>
</profile>
<profile>
<id>mac</id>
<activation>
<os>
<family>mac</family>
</os>
</activation>
<properties>
<envClassifier>mac-${os.arch}</envClassifier>
<use64bit>--enable-64bit</use64bit>
<cflags>-O3 -Wall</cflags>
</properties>
</profile>
<profile>
<id>sun</id>
<activation>
<os>
<family>unix</family>
<name>sunos</name>
</os>
</activation>
<properties>
<envClassifier>sun-${os.arch}</envClassifier>
<use64bit>--enable-64bit</use64bit>
<cflags>-O3 -Wall</cflags>
</properties>
</profile>
</profiles>
This lets us build Linux, Mac, and Solaris versions easily. While we mostly use Linux on our production servers, having Mac builds is nice for developer machines.
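To make the mapping from platform to classifier concrete, here is a small sketch that mirrors the four profiles above. Maven evaluates profile activation itself, so this function is purely illustrative: env_classifier is a made-up name, and its arguments are assumed to be the lowercased values Java reports for os.name and os.arch.

```shell
#!/bin/bash
# Illustration only: mirrors the profile table, it is not how Maven decides.
# Usage: env_classifier OS_NAME OS_ARCH
env_classifier() {
  local os_name=$1 os_arch=$2
  case "$os_name" in
    linux)
      # The POM only defines profiles for the two 64-bit Linux arch names.
      case "$os_arch" in
        x86_64|amd64) echo "linux-$os_arch" ;;
        *) return 1 ;;
      esac
      ;;
    mac*)  echo "mac-$os_arch" ;;
    sunos) echo "sun-$os_arch" ;;
    *)     return 1 ;;   # no profile matches, so envClassifier stays unset
  esac
}

env_classifier linux amd64
```

On a platform with no matching profile the function prints nothing and returns non-zero, which corresponds to a build where ${envClassifier} is never defined.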
We define some properties in the POM that are used by various plugins to determine directory names and file locations:
<properties>
<!-- Where the compilation happens. Code is copied from src/main into this directory -->
<workDir>${project.build.directory}/src</workDir>
<!-- The staged install directory. The artifact is assembled (mostly) from stuff under this directory -->
<installDir>${project.build.directory}/inst</installDir>
<!-- An arbitrary (but unlikely to occur in nature) prefix string that the compiled stuff thinks
it's installed under. This string will end up in various header files and configuration scripts
and so is easily found + replaced. The files in src/main/build are intended to be used by other
artifacts that depend on this one to do such replacements -->
<prefix>/tmp/${project.artifactId}-${project.version}</prefix>
<!-- Where dependencies get unpacked into -->
<dependencyDir>${project.build.directory}/dependencies</dependencyDir>
<!-- The directory that the memcached dependency will end up in -->
<memcachedDir>${dependencyDir}/memcached-${envClassifier}-tar.gz</memcachedDir>
</properties>
Libmemcached uses memcached in its build process, so memcached is declared as a dependency in the POM’s <dependencies/> section. Combined with the use of the maven-dependency-plugin, this ensures we get an unpacked copy of memcached underneath our build directory for the libmemcached build process to use. The presence of the <classifier/> element in the dependency description ensures we get an os- and arch-appropriate version of memcached to use. The memcached dependency declaration looks like this:
<dependencies>
<dependency>
<groupId>ning.memcached</groupId>
<artifactId>memcached</artifactId>
<version>1.2.6</version>
<type>tar.gz</type>
<classifier>${envClassifier}</classifier>
</dependency>
</dependencies>
And then dependencies are unpacked with this invocation of the maven-dependency-plugin:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<configuration>
<includeTypes>tar.gz</includeTypes>
<excludeTransitive>true</excludeTransitive>
<outputDirectory>${dependencyDir}</outputDirectory>
<useSubDirectoryPerArtifact>true</useSubDirectoryPerArtifact>
<stripVersion>true</stripVersion>
</configuration>
<executions>
<execution>
<id>unpack</id>
<phase>initialize</phase>
<goals>
<goal>unpack-dependencies</goal>
</goals>
</execution>
</executions>
</plugin>
After dependencies are unpacked, a successful libmemcached build requires the following things to happen:
- libmemcached source code put into a place where it can be built
- The familiar configure, make, make install steps
- Packaging up the results of “make install” as a deployable artifact
The first step (putting the source code into a place where it can be built) happens in the process-sources phase with the shell-maven-plugin. The shell script that this plugin runs copies the contents of src/main/c to a directory under target/, the standard Maven working directory for builds. Doing the configuration and compilation somewhere other than directly under src/main/c makes it easy to wipe out a partial build without affecting the code and keeps generated files out of the directories tracked by version control. This shell execution phase also needs to adjust permissions on some files in the unpacked memcached dependency because of a bug in the dependency plugin. (The plugin doesn’t preserve file permissions in the unpacked dependency.)
(Note that the actions taken by the shell plugin could also be done by the antrun plugin, but antrun has similar file-permission preservation problems. Running cp -a from the shell avoids that problem.)
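The permission-preserving copy is easy to check outside Maven. The sketch below is a standalone version of the staging step; stage_sources is a hypothetical name, and unlike a cp of src/main/c/* it also copies dotfiles, which is a choice made for this example only.

```shell
#!/bin/bash
# Hypothetical standalone version of the source-staging step.
# Usage: stage_sources SRC_DIR WORK_DIR
stage_sources() {
  local src=$1 work=$2
  rm -rf "$work"            # wipe out any partial build
  mkdir -p "$work"
  cp -a "$src"/. "$work/"   # -a keeps modes, timestamps and symlinks
}

# Demo with a throwaway tree: an executable script stays executable.
demo=$(mktemp -d)
mkdir -p "$demo/src/main/c"
printf '#!/bin/sh\necho configured\n' > "$demo/src/main/c/configure"
chmod 0755 "$demo/src/main/c/configure"
stage_sources "$demo/src/main/c" "$demo/target/src"
ls -l "$demo/target/src/configure"
```

Because the work directory is recreated on every run, a failed configure or make never leaves stale generated files behind in the next build.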
The shell-maven-plugin section of the POM looks like this:
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>shell-maven-plugin</artifactId>
<version>1.0-beta-1</version>
<executions>
<!-- This execution cleans out the work dir and copies the source code in to it. -->
<!-- Making ${memcachedDir}/bin/* executable is necessary until http://jira.codehaus.org/browse/MDEP-109 is fixed. -->
<execution>
<id>stage-sources</id>
<phase>process-sources</phase>
<goals><goal>shell</goal></goals>
<configuration>
<workDir>${workDir}</workDir>
<chmod>true</chmod>
<keepScriptFile>false</keepScriptFile>
<script>
rm -rf ${workDir}
mkdir -p ${workDir}
cp -a ${basedir}/src/main/c/* ${workDir}
cd ${workDir}
chmod 0755 ${memcachedDir}/bin/*
</script>
</configuration>
</execution>
</executions>
</plugin>
Next, configure, make, and make install happen via standard use of the make-maven-plugin. Each phase gets some standard options set describing the directories to work in. For the configure goal, the <configureOptions/> element enumerates arguments to be passed to the configure script. <configureEnvironment/> sets environment variables active during configure. Setting CFLAGS here means we don’t have to set it later when running make.
The make install phase uses the DESTDIR support for staged installs to install files not to the actual “prefix” directory, but the prefix directory under our target install directory. This keeps the build contained and doesn’t pollute the build system.
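Outside Maven, the configure, make, and staged make install sequence corresponds roughly to the commands in this sketch. The function name build_staged and the RUN variable are conventions of this example only: setting RUN=echo prints the commands instead of executing them, which is handy for seeing how prefix and DESTDIR fit together.

```shell
#!/bin/bash
# Sketch of the equivalent manual build (not what the plugin literally runs).
# Usage: build_staged WORK_DIR INSTALL_DIR PREFIX CFLAGS [CONFIGURE_ARGS...]
# Runs in a subshell so the caller's working directory is left alone.
build_staged() (
  work=$1 dest=$2 prefix=$3 cflags=$4
  shift 4
  cd "$work" || exit 1
  # CFLAGS is set once at configure time, so make does not need it again.
  ${RUN:-} env CFLAGS="$cflags" ./configure --prefix="$prefix" "$@" &&
  ${RUN:-} make &&
  # DESTDIR is prepended to the prefix at install time only.
  ${RUN:-} make DESTDIR="$dest" install
)

# Dry run with made-up paths:
RUN=echo build_staged . /build/target/inst /tmp/libmemcached-0.45 "-O3 -Wall" --enable-64bit
```

With these example values the installed files would land under /build/target/inst/tmp/libmemcached-0.45, while the compiled code still believes its prefix is /tmp/libmemcached-0.45.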
The make-maven-plugin section of the POM looks like this:
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>make-maven-plugin</artifactId>
<version>1.0-beta-1</version>
<executions>
<execution>
<id>configure</id>
<phase>process-resources</phase>
<goals><goal>configure</goal></goals>
<configuration>
<workDir>${workDir}</workDir>
<destDir>${installDir}</destDir>
<prefix>${prefix}</prefix>
<configureOptions>
<configureOption>--with-memcached=${memcachedDir}/bin/memcached</configureOption>
<configureOption>${use64bit}</configureOption>
</configureOptions>
<configureEnvironment>
<property>
<name>CFLAGS</name>
<value>${cflags}</value>
</property>
</configureEnvironment>
</configuration>
</execution>
<!-- Run "make" -->
<execution>
<id>compile</id>
<phase>compile</phase>
<goals><goal>compile</goal></goals>
<configuration>
<workDir>${workDir}</workDir>
<prefix>${prefix}</prefix>
<destDir>${installDir}</destDir>
</configuration>
</execution>
<execution>
<id>make-install</id>
<phase>prepare-package</phase>
<goals><goal>make-install</goal></goals>
<configuration>
<workDir>${workDir}</workDir>
<destDir>${installDir}</destDir>
</configuration>
</execution>
</executions>
</plugin>
After everything is compiled and installed to target/inst (because that’s the value of the ${installDir} variable set in the <properties/> section and provided to the make-install goal as the <destDir/>), the maven-assembly-plugin takes over to assemble the deployable artifact. It uses the rules in the assembly.xml file to determine what to include. For libmemcached, this is pretty simple: we want everything that was installed. Setting <id/> to ${envClassifier} in assembly.xml ensures that the appropriate os- and arch-specific classifier is included in the artifact file name.
The maven-assembly-plugin is invoked from the POM like this:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<executions>
<execution>
<id>assemble</id>
<goals>
<goal>single</goal>
</goals>
<phase>package</phase>
<configuration>
<descriptors>
<descriptor>assembly.xml</descriptor>
</descriptors>
</configuration>
</execution>
</executions>
</plugin>
And we use this simple assembly.xml:
<assembly>
<id>${envClassifier}</id>
<formats>
<format>tar.gz</format>
</formats>
<includeBaseDirectory>false</includeBaseDirectory>
<fileSets>
<!-- Include everything that "make" installed -->
<fileSet>
<directory>target/inst${prefix}</directory>
<outputDirectory>/</outputDirectory>
</fileSet>
</fileSets>
</assembly>
This instructs Maven to build an artifact with a name like libmemcached-0.45-mac-x86_64.tar.gz or libmemcached-0.45-linux-amd64.tar.gz, and that’s what gets uploaded to Nexus.
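In shell terms, the assembly step amounts to something like the sketch below. It is an approximation rather than what the maven-assembly-plugin actually executes, and package_staged is a hypothetical name. The archive is written to an absolute path outside the staged directory so that it cannot end up inside itself.

```shell
#!/bin/bash
# Rough shell equivalent of the assembly step (a sketch only).
# Usage: package_staged INSTALL_DIR PREFIX OUTPUT_TARBALL
package_staged() {
  local inst=$1 prefix=$2 out=$3
  # -C switches into the staged prefix first, so paths in the archive are
  # relative to it and there is no base directory, matching
  # includeBaseDirectory=false in the descriptor.
  tar -czf "$out" -C "$inst$prefix" .
}

# Demo with a throwaway staged tree and made-up file names.
demo=$(mktemp -d)
mkdir -p "$demo/inst/tmp/libmemcached-0.45/lib"
echo placeholder > "$demo/inst/tmp/libmemcached-0.45/lib/libmemcached.so"
package_staged "$demo/inst" /tmp/libmemcached-0.45 "$demo/libmemcached-0.45-linux-amd64.tar.gz"
tar -tzf "$demo/libmemcached-0.45-linux-amd64.tar.gz"
```

Stripping the fake prefix from the archive paths is what lets a dependent project unpack the artifact anywhere it likes.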
Path Adjustments
The DESTDIR mechanism makes it easy to stage an install and package up the results. But sometimes the build process generates files which embed build-time pathnames that almost certainly will be different when the artifact is unpacked and used as a dependency. Both Apache HTTPD and PHP have this issue. Their artifacts (and the things that depend on them) use a convention for path-adjustment scripts to solve this problem.
For example, our local Apache HTTP repository contains a src/main/build directory with two files in it: adjust.sh and adjust-files.txt. adjust.sh is a script that should be run by a component which depends on httpd:
#!/bin/bash
if (( $# < 3 )); then
echo "Usage: $0 BUILD_PREFIX INSTALLED_DIR FILE1 [FILE2 ... FILEn]"
exit 1
fi
BUILD_PREFIX=$1
INSTALLED_DIR=$2
# Drop the two path arguments; everything left is a file to adjust
shift 2
for f in "$@"; do
# Quoting keeps file names with spaces intact
sed "s:${BUILD_PREFIX}:${INSTALLED_DIR}:g" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done
And adjust-files.txt lists the files that need adjusting; its contents are passed to adjust.sh as the FILE arguments:
bin/apxs
build/config_vars.mk
bin/envvars
These get used, for example, in our build of PHP, which relies on Apache HTTP. The pom.xml in our internal repository for PHP contains some variable definitions that describe its dependency on httpd:
<properties>
<!-- Where dependencies get unpacked into -->
<dependencyDir>${project.build.directory}/dependencies</dependencyDir>
<!-- The directory that the httpd dependency will end up in -->
<httpdDir>${dependencyDir}/httpd-${envClassifier}-tar.gz</httpdDir>
<httpdVersion>2.2.14</httpdVersion>
</properties>
And the dependency itself:
<dependency>
<groupId>org.apache</groupId>
<artifactId>httpd</artifactId>
<version>${httpdVersion}</version>
<type>tar.gz</type>
<classifier>${envClassifier}</classifier>
</dependency>
Inside the configuration of the shell-maven-plugin, however, there’s an additional execution that runs in the process-sources phase, alongside a source-code-copying execution similar to the one in the libmemcached POM.
This script execution invokes httpd’s adjust.sh script in order to replace pathnames in the listed files with the appropriate pathname for wherever the dependency has been unpacked to:
<!-- This execution runs the "adjust.sh" script bundled with the httpd dependency
to adjust pathnames in its configuration files to point to the place it's been unpacked.
It also needs to make bin/apxs (in the httpd dependency) executable, since the dependency
plugin has a bug which doesn't preserve file attributes on unpacked files. -->
<execution>
<id>adjust-dependencies-httpd</id>
<phase>process-sources</phase>
<goals><goal>shell</goal></goals>
<configuration>
<!-- CWD for the script should be where httpd was unpacked -->
<workDir>${httpdDir}</workDir>
<chmod>true</chmod>
<keepScriptFile>false</keepScriptFile>
<!-- making bin/apxs executable is necessary until http://jira.codehaus.org/browse/MDEP-109 is fixed -->
<script>
chmod 0755 ${httpdDir}/build-dependency/adjust.sh
${httpdDir}/build-dependency/adjust.sh /tmp/httpd-${httpdVersion} ${httpdDir} `cat ${httpdDir}/build-dependency/adjust-files.txt`
chmod 0755 ${httpdDir}/bin/apxs
</script>
</configuration>
</execution>
The first line of the script makes the adjustment script executable (there’s that preserving-permissions-on-extracted-dependency-files bug again) and the second runs adjust.sh. The command line arguments tell it to replace /tmp/httpd-${httpdVersion} (what it thinks was the installed prefix, since that’s what was specified in httpd’s POM) with ${httpdDir} (the actual directory the httpd dependency has been unpacked into). The remaining command line arguments, slurped in from adjust-files.txt, are the files to do the replacement in.
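The effect of the adjustment is easiest to see on a single file. The sketch below wraps the same sed substitution in a function and runs it against a throwaway file; the function name adjust_paths, the sample line, and the /srv/build path are all invented for the demonstration.

```shell
#!/bin/bash
# Same substitution that adjust.sh performs, wrapped as a function for a demo.
# Usage: adjust_paths BUILD_PREFIX INSTALLED_DIR FILE...
adjust_paths() {
  local build_prefix=$1 installed_dir=$2 f
  shift 2
  for f in "$@"; do
    # ':' is the sed delimiter, so neither path may contain a colon.
    # Dots in the prefix act as regex wildcards, which is harmless here.
    sed "s:${build_prefix}:${installed_dir}:g" < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}

# Demo with made-up paths and a made-up config line.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/build"
echo 'prefix = /tmp/httpd-2.2.14' > "$demo_dir/build/config_vars.mk"
adjust_paths /tmp/httpd-2.2.14 /srv/build/target/dependencies/httpd "$demo_dir/build/config_vars.mk"
cat "$demo_dir/build/config_vars.mk"   # prints: prefix = /srv/build/target/dependencies/httpd
```

This is also why the build-time prefix is deliberately an unusual string: the less likely it is to occur naturally, the safer a blanket search and replace becomes.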
The structure of this adjustment setup means that packages which need adjustment and packages that depend on them can operate in a standardized manner.
Packages which need adjustment just have to do three things:
Packages which depend on an adjustment-needing package just have to run the adjust.sh script as described above.
Patches
External packages may not be updated with changes or fixes as fast as we want them to. So it’s handy to be able to patch the external source before we compile it.
The translit PHP extension is a good example of this. The last release, 0.6.0, was in April 2008. We’ve reported two bugs since then that haven’t been fixed. The POM for translit in our internal repository uses maven-patch-plugin to apply two patches that fix these bugs:
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-patch-plugin</artifactId>
<executions>
<execution>
<id>patch-sources</id>
<phase>process-sources</phase>
<goals><goal>apply</goal></goals>
<configuration>
<targetDirectory>${workDir}</targetDirectory>
<patches>
<!-- Fix http://pecl.php.net/bugs/bug.php?id=13687 -->
<patch>bug-13687-ucs-2-le.patch</patch>
<!-- Fix http://pecl.php.net/bugs/bug.php?id=15326 -->
<patch>bug-15326-empty-strings.patch</patch>
</patches>
</configuration>
</execution>
</executions>
</plugin>
The patch files are in the src/main/patches directory of the repo, the standard place that maven-patch-plugin looks.
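Done by hand, the patching step would look something like this sketch. apply_patches is a hypothetical helper, and the -p0 strip level is an assumption about how the patch files were generated (paths relative to the work directory); adjust it if your patches carry a leading directory component.

```shell
#!/bin/bash
# Hypothetical manual equivalent of the patch-sources execution.
# Usage: apply_patches WORK_DIR PATCH_DIR PATCH...
apply_patches() {
  local work=$1 patch_dir=$2 p
  shift 2
  for p in "$@"; do
    # Apply in the order given and stop at the first patch that fails,
    # so a broken patch never leads to a half-patched build.
    patch -p0 -d "$work" < "$patch_dir/$p" || return 1
  done
}
```

For translit this would be called with the work directory, src/main/patches, and the two patch file names from the POM, in that order.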
Pulse
Our Pulse projects for these cores are configured slightly differently from standard Pulse projects for Java-based cores. The specific changes required are:
- not capturing a “test reports” artifact from the build
- only capturing “.tar.gz” artifacts from the build (No .jars)
- adding additional build stages that specify appropriate required resources so that builds execute on different agents representing all the OSes we want to build on (generally just Linux and Mac)
Otherwise Pulse behaves the same way for these projects as it does for others — we get the same automatic builds, logging and reporting.
Conclusion
Using Maven in this way lets us re-use the parts of the infrastructure that make sense for non-Java projects: version control integration, dependency management, automated builds, artifact management. Appropriate Maven plugins (and shell-scripty glue when necessary) keep the C-specific stuff familiar, so the normal autotools build lifecycle can be preserved.