Introducing Galaxy

Posted by Brian McCallister on February 21, 2011 – 10:43 AM

Galaxy is Ning’s deployment management system. We’ve been using it for a bit over four years now, and we rather like it. We use it to deploy, and keep track of, all of the various services engineering produces, as opposed to the operational packages that form our base system (which are better managed by tools such as Chef or Puppet).

Galaxy’s primary components are the agents, the console, and the repository. Additionally, it specifies a contract for our deployment packaging and provides a Ruby library and command line client for interacting with the console and agents.

The Galaxy Agent

The galaxy agent is a small daemon that runs on each host in an environment. We virtualize hosts heavily, so we run one agent per virtual host, but there is no inherent requirement in galaxy to work this way. The agent’s primary responsibility is to deploy, start, stop, and report on a single service. When instructed to deploy a service, the agent downloads the binary package (a tarball) from the repository, expands the package into a deployment root, and invokes the deploy script inside the expanded package, passing in deployment-time information such as a configuration path, the IP address the service should activate on, and so on.
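The deploy step described above can be sketched in Python as follows. This is not Galaxy’s code (Galaxy’s own client library is Ruby); it is a minimal illustration that assumes the package is a gzipped tarball reachable by URL and that the caller knows the deploy script’s path inside the package. Passing the configuration path, deployment root, and IP address as positional arguments, in that order, is a hypothetical calling convention chosen for the example.

```python
import io
import subprocess
import tarfile
import urllib.request
from pathlib import Path


def deploy(artifact_url, deploy_root, deploy_script, config_path, ip_address):
    """Fetch a package, expand it under deploy_root, then run its deploy script.

    deploy_script is the script's path inside the package; the caller supplies
    it because this sketch does not assume a name. The argument order passed to
    the script (config path, deploy root, IP) is hypothetical.
    """
    root = Path(deploy_root)
    root.mkdir(parents=True, exist_ok=True)

    # 1. Download the binary package (a gzipped tarball) from the repository.
    with urllib.request.urlopen(artifact_url) as resp:
        payload = resp.read()

    # 2. Expand the package into the deployment root. Where available, the
    #    "data" filter rejects absolute paths and ".." entries in the archive.
    with tarfile.open(fileobj=io.BytesIO(payload), mode="r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            tar.extractall(root, filter="data")
        else:
            tar.extractall(root)

    # 3. Invoke the deploy script, passing deployment-time information.
    argv = [str(root / deploy_script), config_path, str(root), ip_address]
    return subprocess.run(argv, check=True)
```

Taking the artifact URL as an input keeps the sketch independent of how the repository names and locates packages.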

When the agent is instructed to start, stop, or check on the status of a service, it invokes a control script from the expanded package (a traditional rc script, which receives start, stop, or status as its argument).
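Because the control script follows rc conventions, an agent only needs to run it with a single action argument and read the exit code. The Python sketch below is illustrative, not Galaxy’s implementation; the mapping of `status` exit codes follows the LSB init-script conventions, and whether Galaxy relies on exactly those codes is an assumption.

```python
import subprocess

# `status` exit codes from the LSB init-script conventions. Treating these as
# Galaxy's contract is an assumption made for this sketch.
LSB_STATUS = {
    0: "running",
    1: "dead (pid file exists)",
    2: "dead (lock file exists)",
    3: "stopped",
}

VALID_ACTIONS = ("start", "stop", "status")


def control(script_path, action):
    """Invoke an rc-style control script with one action; return its exit code."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return subprocess.run([script_path, action]).returncode


def describe_status(exit_code):
    """Translate a `status` exit code into a readable state."""
    return LSB_STATUS.get(exit_code, "unknown")
```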

Finally, the agent sends heartbeat messages specifying its present state to the galaxy console on a regular basis.

The Galaxy Console

The console is a convenience service for examining the state of the system. It keeps an in-memory representation of all the hosts in the system, what is deployed to them, what configuration paths were given to them, and so on. It also provides convenient APIs over DRb for querying that in-memory state.

The galaxy agent is configured with a console to report into when it is started, and its heartbeats back to the console keep the console up to date. The console keeps track of the last known state of every agent reporting into it, and serves as a failure detector for the agents as well.
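The console’s job of keeping the last known state of every agent and acting as a failure detector can be sketched as a heartbeat table with a timeout. This Python sketch does not model the DRb interface; the 30-second timeout, the fields kept per agent, and the `deployed_to` query are all hypothetical choices for illustration.

```python
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class AgentState:
    host: str
    config_path: Optional[str]  # None when nothing is deployed on the host
    status: str                 # e.g. "running" or "stopped"
    last_seen: float            # console clock reading at the last heartbeat


class Console:
    """Last known state of every agent, plus a timeout-based failure detector."""

    def __init__(self, timeout_seconds=30.0, clock=time.monotonic):
        self.timeout = timeout_seconds  # hypothetical value, not from the post
        self.clock = clock              # injectable so the logic is testable
        self.agents = {}                # host -> AgentState, held only in memory

    def heartbeat(self, host, config_path, status):
        # Each heartbeat replaces whatever the console knew about that host.
        self.agents[host] = AgentState(host, config_path, status, self.clock())

    def is_alive(self, host):
        # An agent is suspected dead once no heartbeat arrives within `timeout`.
        state = self.agents.get(host)
        return state is not None and self.clock() - state.last_seen <= self.timeout

    def deployed_to(self, config_prefix):
        # Example lookup over the in-memory state: hosts under a path prefix.
        return sorted(h for h, s in self.agents.items()
                      if s.config_path and s.config_path.startswith(config_prefix))

    def dead_agents(self):
        return sorted(h for h in self.agents if not self.is_alive(h))
```

Since the console only holds state derived from heartbeats, losing it costs nothing permanent: a restarted console repopulates itself as agents report in.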

The Galaxy Command Line Client

The command line client, galaxy, communicates with the console and agents in order to report on, and effect changes to, the system. Deployments, configuration updates, clearing, and so on generally all occur through the command line client. It is designed to operate on logical sets of hosts, chosen with selectors on the command line that range from logical groups (all agents with something deployed to them, all agents running a given type of server, etc.) to regular expressions on the configuration paths of the services on the agents.
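Host selection of this kind amounts to filtering the agents by a few optional predicates. The Python sketch below is a hypothetical illustration, not the client’s actual selector syntax; it derives the service type from the second path segment, which assumes the {environment}/{service-type}/{pool} layout described in the repository section.

```python
import re
from typing import NamedTuple, Optional


class Agent(NamedTuple):
    host: str
    config_path: Optional[str]  # None when nothing is deployed


def select(agents, deployed=None, service_type=None, path_regex=None):
    """Return agents matching every selector that is given (None = no filter)."""
    pattern = re.compile(path_regex) if path_regex else None
    result = []
    for agent in agents:
        path = agent.config_path
        # Logical group: with or without something deployed.
        if deployed is not None and (path is not None) != deployed:
            continue
        # Logical group: a given type of server (assumes env/type/pool paths).
        if service_type is not None:
            parts = path.split("/") if path else []
            if len(parts) < 2 or parts[1] != service_type:
                continue
        # Regular expression on the configuration path.
        if pattern is not None and (path is None or not pattern.search(path)):
            continue
        result.append(agent)
    return result
```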

In general, the client communicates with the console for lookup operations, and with the agents directly for making changes to the system. It can, however, query agents directly if needed.

The Repository

There are two sides to the repository. The first is the binary repo, which hosts all of the binary packages that can be deployed to the system; the second is the configuration repository, which hosts configuration for hosts. From galaxy’s perspective, the repository is just a static web server. Binary artifacts are found via a naming convention, {type}-{version}.tar.gz, and configuration via a directory hierarchy.
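The {type}-{version}.tar.gz convention is simple enough to build and parse mechanically. In this Python sketch the base URL of the binary repository is a hypothetical parameter (the post does not describe its layout), and parsing assumes the version itself contains no hyphen, so the last hyphen separates type from version.

```python
SUFFIX = ".tar.gz"


def artifact_name(artifact_type, version):
    """Build a package file name following the {type}-{version}.tar.gz convention."""
    return f"{artifact_type}-{version}{SUFFIX}"


def artifact_url(repo_base, artifact_type, version):
    """Join a (hypothetical) binary repository base URL and the package name."""
    return repo_base.rstrip("/") + "/" + artifact_name(artifact_type, version)


def parse_artifact_name(filename):
    """Split a package file name back into (type, version).

    Assumes the version has no hyphen, so the last '-' separates the two
    parts; types such as "front-door" may still contain hyphens.
    """
    if not filename.endswith(SUFFIX):
        raise ValueError(f"not a {SUFFIX} package: {filename!r}")
    artifact_type, sep, version = filename[: -len(SUFFIX)].rpartition("-")
    if not sep or not artifact_type or not version:
        raise ValueError(f"expected {{type}}-{{version}}{SUFFIX}: {filename!r}")
    return artifact_type, version
```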

When a service is being deployed, it is given a configuration path of the form xne2/rslv/front. The meaning of each level is arbitrary and can be designed to suit whatever you need; in our case, we usually use {environment}/{service-type}/{pool}. At deployment time, the galaxy agent uses this path to look up a file that specifies the artifact type and version. The agent starts at the bottom of the path and walks up the directory hierarchy, looking for a file at each level. Values further down the hierarchy override values higher up.

For example, given a repository root of http://repo/production/ and the configuration path dc3/front-door/api, the agent will make requests for each of:

  1. http://repo/production/dc3/front-door/api/
  2. http://repo/production/dc3/front-door/
  3. http://repo/production/dc3/
  4. http://repo/production/

Any 404 responses are ignored, and values at http://repo/production/dc3/front-door/api/ take higher priority than values at http://repo/production/dc3/.
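The lookup just described (probe each level from the deepest up, skip 404s, let deeper values win) can be sketched in Python. The file name probed at each level and its `key=value` format are hypothetical, since the post names neither; `fetch` is injected so the merge logic can be exercised without a web server, and `http_fetch` shows one way to back it with real HTTP.

```python
import urllib.error
import urllib.request


def candidate_urls(repo_root, config_path):
    """URLs to probe, deepest level first, ending at the repository root."""
    root = repo_root.rstrip("/") + "/"
    parts = [p for p in config_path.strip("/").split("/") if p]
    return [root + "".join(p + "/" for p in parts[:i])
            for i in range(len(parts), -1, -1)]


def parse_properties(text):
    """Parse hypothetical `key=value` lines; blank lines and '#' comments skipped."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(repo_root, config_path, filename, fetch):
    """Merge settings along the path; values further down override those above.

    fetch(url) returns the body as text, or None for a 404.
    """
    merged = {}
    for base in candidate_urls(repo_root, config_path):  # deepest level first
        body = fetch(base + filename)
        if body is None:  # a 404 at this level is simply ignored
            continue
        for key, value in parse_properties(body).items():
            merged.setdefault(key, value)  # a deeper level already set it: keep that
    return merged


def http_fetch(url):
    """Fetch a URL over HTTP, mapping 404 to None and re-raising other errors."""
    try:
        with urllib.request.urlopen(url) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise
```

Because the file name is a parameter, the same walk also fits deploy scripts that look up their own application configuration files along the same path.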

The deploy scripts inside our service packages use this same mechanism for application configuration as well, but they look for a different file name (in some cases, several file names).

Service Packaging

Galaxy defines a contract for our deployment artifacts: they must be named {type}-{version}.tar.gz, and within the package there must exist two scripts:

  1. A deploy script, which the galaxy agent invokes at deploy time. It is passed, as command line arguments, such things as the configuration path for the service, the deployment root on the host, the IP address the service should bind to, and so on.
  2. A control script that follows rc script conventions. It is used to start, stop, and check on the status of the service.

Aside from that, the package is free to do whatever it wants. By convention, we bundle everything the service needs: a container server (Jetty, Apache), libraries where we need specific versions or which we don’t want to place under operational automation for some reason, business configuration files, and so on.
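The packaging contract is narrow enough to check before a package is published. This Python sketch is a hypothetical pre-publish check, not part of Galaxy: the names of the two required scripts are not given in this post, so the caller passes their package-relative paths, and requiring the owner-executable bit is an assumption about how the agent runs them.

```python
import os
import tarfile


def check_package(tarball_path, required_scripts):
    """List contract violations: required scripts that are missing or not executable."""
    problems = []
    with tarfile.open(tarball_path, "r:gz") as tar:
        # Normalize names so "./bin/x" and "bin/x" compare equal.
        files = {os.path.normpath(m.name): m for m in tar.getmembers() if m.isfile()}
    for script in required_scripts:
        member = files.get(os.path.normpath(script))
        if member is None:
            problems.append(f"missing: {script}")
        elif not member.mode & 0o100:  # owner-executable bit
            problems.append(f"not executable: {script}")
    return problems
```

An empty list means the package satisfies the two-script contract; everything else inside the tarball is left unchecked, matching the freedom the contract allows.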

Open Source

Finally, we are very excited to announce that we recently released Galaxy as open source on github under the Apache License, 2.0.

Brian McCallister is a programmery kind of guy.


5 Responses to “Introducing Galaxy”

  1. Smells a bit like Microsoft’s Autopilot, now Red-Dog?

    Regardless, I like the separation of base system packages versus services etc and the service packaging. My own deployment experience aligns well with those choices.

    By Dan Creswell on Feb 22, 2011

  2. Is much simpler than autopilot, if you take the autopilot paper as gospel. That said, yes 🙂

    By Brian McCallister on Feb 22, 2011

  3. Does this duplicate any of the functionality in Apache ZooKeeper?

    By Gavin Bong on Feb 26, 2011

  4. Gavin: not really. Zookeeper is aimed at a very different problem, and generally used for locking and service discovery type things (though very much not limited to those!). Galaxy *could* be used for service discovery, but I wouldn’t recommend it.

    By Brian McCallister on Mar 9, 2011

1 Trackback

  1. Feb 22, 2011: links for 2011-02-22 « Dan Creswell’s Linkblog
