Just enough bootstrap for your configuration management agent with cloud-init

Infrastructure automation tools like Rudder, SaltStack, Chef, Puppet or CFEngine can manage your system from top to bottom – once they are installed on it. Deploying such a tool raises the typical chicken-and-egg problem: how do you bootstrap it? Each managed node needs to have an agent installed and configured on it. cloud-init is a simple, multi-OS and de facto standard way to do this when a new system boots for the first time.

What we are trying to achieve

For a configuration management agent to do its job, it usually needs very little pre-configured on a managed system (usually called a node), but what it does need, it really can’t do without.

Typical pre-requisites are:

  • Install the agent itself (usually a package such as rudder-agent, cfengine-community, puppet-agent, chef-client, salt-minion, …).
  • Tell the agent where to contact its server. This would usually be a hostname such as rudder.my-company.com, but could be an IP if you can’t or don’t want to rely on DNS.
  • Configure DNS so that the above name can be resolved (unless you’re using an IP, of course).
  • Provide a certificate or API key to enable the new node to register with its server.
  • Configure NTP to ensure that all nodes have a synchronized clock (a clock skew of over a few minutes usually prevents registration and/or policy updates).
  • Set up the new node’s hostname.
  • Run the agent once or set it to run regularly.

Our goal in this article is therefore to automate these steps as much as possible, so that every new server you add to your infrastructure automatically gets a running configuration agent.

Why not just build images with all that already in them?

Since the generalization of virtualization, and even more so with the widespread adoption of IaaS cloud services, it has become common to build a server image including all the shared configuration you want in it – your configuration management agent, basic security settings, common users, everyday system tools, etc.

There are several problems with this approach:

  1. You need a new image every time you change any configuration and this leads to image sprawl.
  2. Since the configurations included in the image are generally not also under configuration management, it makes it very hard to change them after a server is installed, which negates many of the benefits of using a centralized configuration tool.
  3. The further you stray from official OS images, the harder it may be to port your infrastructure to a different OS, a different cloud or virtualization provider, or just to update to the latest major or security release of that OS.

I could write pages and pages about why using images for configuration is a bad idea, but that is not our focus here. Suffice it to say that our goal here is to explain how to use standards (images, APIs, tools, …) as far as possible before running your favorite infrastructure automation tool to reach a fully-configured state – without spreading configurations across too many tools.

Enter cloud-init

cloud-init is, according to its documentation, “the defacto multi-distribution package that handles early initialization of a cloud instance”. Simply put, it is a Python tool that comes pre-installed in many cloud images, and can read in a script from your cloud provider’s metadata to perform certain tasks on the first boot of a new server.

cloud-init was written by Canonical and first made available on Ubuntu, but is now widely available and supported on a variety of operating systems including Debian, Red Hat, CentOS, Fedora, Arch Linux and even Windows, CoreOS and FreeBSD!

Several operating systems provide official cloud images that already bundle cloud-init. This is the easiest, and most standard, way to use it – but if your preferred distribution isn’t listed below, you can always build a minimal image that just adds cloud-init on top of a standard base image. As of writing, the following operating systems are known to ship images with cloud-init pre-installed (there may be some that I’ve missed – please let me know if so!):

Most IaaS cloud providers support cloud-init (since they have a simple implementation of the “user-data” field in metadata), including Amazon EC2, OpenStack (via Heat), Exoscale, Rackspace, CloudSigma, DigitalOcean, …

cloud-init can manage a bunch of actions on each instance, including users, groups, packages, SSH keys, CA certificates, DNS, filesystem mounts, timezone, locale, reboot, shutdown, … There are even specific modules available to install and configure Puppet, Chef and SaltStack.

A sample cloud-config script looks like this:

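#cloud-config
# (the line above must come first so cloud-init treats this file as cloud-config)

# Set up DNS (resolv_conf is only applied when manage_resolv_conf is true)
manage_resolv_conf: true
resolv_conf:
  nameservers: ['8.8.4.4', '8.8.8.8']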

# Use our country-local package mirror
apt_mirror: http://us.archive.ubuntu.com/ubuntu/

# Install some useful packages
packages:
  - vim
  - htop

# Set up our SSH keys so we can log in remotely
ssh_authorized_keys:
  - ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAGEA3FSyQwBI6Z+nCSjUUk8EEAnnkhXlukKoUPND/RRClWz2s5TCzIkd3Ou5+Cyz71X0XmazM3l5WgeErvtIwQMyT1KjNoMhoJMrJnWqQPOt5Q8zWd9qG7PBl9+eiH5qV7NZ mykey@host

# Anything that doesn't have a dedicated cloud-init module can be run via plain commands
runcmd:
  - mysql -e "CREATE DATABASE wordpress"

As you can see, this is very powerful, and you could use cloud-init to set up an instance from scratch to a pretty advanced state. But of course, you definitely don’t want to do that – because the whole point here is to use one single source of truth, our centralized configuration management system, right? So let’s see how we can leverage just enough cloud-init to get our agent up and running.

Workflow from instance creation to fully-configured server

Let’s look at the big picture. Simply put, here are the steps that need to happen to achieve our goal described above using a cloud provider and cloud-init:

  1. Write a cloud-config script to set up a node, install your configuration agent and configure it.
  2. Pass that script to your cloud provider’s API when you request a new instance to start, via the user-data field.
  3. When the new instance boots, one of the first services launched will be cloud-init. It will read the user-data field from the cloud provider’s metadata, and notice a cloud-config script.
  4. cloud-init will run the script you wrote earlier as early as possible after the instance’s first boot, which saves you time (and therefore money!), and will set up our minimal requirements and the configuration agent.
  5. At the end of the script, your configuration agent will spin up, register with its server and apply its initial configuration policies.
  6. Finally, your new node will be fully configured – or at least as fully as the configuration managed by your automation tool.
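
As a rough illustration of step 1, here is a minimal sketch of the agent-independent part of such a cloud-config script, covering only the hostname, DNS and NTP prerequisites listed earlier. The hostnames, domain and NTP servers are placeholders I have made up for the example, and the resolv_conf module is only honoured on some distributions, so check the cloud-init module documentation for your image before relying on it.

```yaml
#cloud-config
# Sketch only: every hostname, domain and server below is a placeholder.

# Set up the new node's hostname
hostname: node01
fqdn: node01.my-company.com

# Configure DNS so the configuration server's name can be resolved
manage_resolv_conf: true
resolv_conf:
  nameservers: ['8.8.4.4', '8.8.8.8']
  searchdomains:
    - my-company.com

# Keep the clock synchronized so that registration and policy updates
# are not refused because of clock skew
ntp:
  enabled: true
  servers:
    - 0.pool.ntp.org
    - 1.pool.ntp.org
```

Everything beyond this point (users, packages, services, …) is deliberately left to the configuration management tool itself.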

An example to install rudder-agent

Rudder doesn’t (yet) have a dedicated module in cloud-init, unlike Puppet, Chef and SaltStack. Not to worry though, installing a rudder-agent is very easy and can be done with a few standard cloud-init modules.

Assuming you’re running an RPM-based Linux distribution like Red Hat or CentOS, just put the following cloud-config script in a file, named something like install-rudder-agent-rpm-latest.yml (for apt-based Linux distributions like Debian or Ubuntu, check out this equivalent script):
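
A minimal sketch of what such a script could look like is shown here, using only the standard yum_repos, packages and runcmd modules. The repository URL, GPG key URL and server name are assumptions on my part rather than values taken from this article, so check them against the Rudder installation documentation for your version. The same goes for the policy_server.dat path and the service name.

```yaml
#cloud-config
# Sketch only: URLs, paths and the server name are assumptions to verify.

# Add the Rudder package repository (assumed URL layout for RHEL/CentOS 7)
yum_repos:
  rudder:
    name: Rudder
    baseurl: https://repository.rudder.io/rpm/latest/RHEL_7/
    enabled: true
    gpgcheck: true
    gpgkey: https://repository.rudder.io/rpm/rudder_rpm_key.pub

# Install the agent itself
packages:
  - rudder-agent

# runcmd runs after packages are installed, so the agent's directories exist
runcmd:
  # Tell the agent where to contact its server (placeholder hostname)
  - echo "rudder.my-company.com" > /var/rudder/cfengine-community/policy_server.dat
  # Start the agent so that it runs regularly from now on
  - service rudder-agent start
```

The policy server is written from runcmd rather than write_files because write_files runs before package installation, and it seems safer to let the package create its own directory tree first.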

Now all you have to do is spin up an instance using your favourite cloud provider, and pass in the cloud-config script we just created:

For Amazon EC2:

ec2-run-instances --key your-ssh-key --user-data-file install-rudder-agent-rpm-latest.yml ami-12345678

For OpenStack:

nova boot --flavor <flavor> --image <image id> --user-data install-rudder-agent-rpm-latest.yml yournewhost

Some seconds or minutes later, you will have a new instance running that has already done all the steps in your cloud-config script. For the Rudder example, it should appear immediately in your “Pending nodes” screen, and similarly for other systems.

One more step away from duplicating configurations

At this stage, hopefully you’re thinking something like “wow, that was easy!”. Or maybe you’re thinking “right, but how do I manage that pesky .yml file I need to have lying around?”. Indeed, we’re not doing a great job of centralizing configuration if we just have a text file sitting somewhere on a server with the key configuration steps you need to deploy a new server.

There are several options to avoid this. Obviously, you could check this cloud-config script into version control along with the other files your configuration management tool deploys (under the /var/rudder/configuration-repository/shared-files directory for Rudder) and deploy that script where needed. Alternatively, we can leverage cloud-init’s #include mechanism.

cloud-init has many different recognized input mechanisms, one of which is #include. So you could simply host the cloud-config file somewhere in your infrastructure and use it. Or, you could directly source the Rudder example above that is hosted on GitHub. All you need then is to pass a file containing something like this when you create a new instance:

#include
https://gist.github.com/jooooooon/630caa2bfbf38d5a2bb1/raw/

And hey presto, your new instance will source that file (or a better and updated version of it in case something like the repo URL or public key changes in the future), and you’re done.

What’s next?

The above example is specific to installing the Rudder agent, but bear in mind you can easily adapt it to install whatever agent you want.
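
For agents that already have a dedicated cloud-init module, the adaptation can be even shorter. As a minimal sketch, assuming a Salt master reachable at the placeholder name salt.my-company.com, the salt_minion module installs the minion package and writes its configuration:

```yaml
#cloud-config
# Sketch only: the master hostname is a placeholder.

salt_minion:
  conf:
    # Written to the minion's configuration file
    master: salt.my-company.com
```

The puppet and chef modules follow the same pattern, with their own keys documented in the cloud-init module reference.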

We plan to implement and contribute a Rudder module for cloud-init that will directly allow you to specify what you want to configure with a simpler and shorter syntax. I haven’t fleshed this out yet, but I imagine the above example could be replaced with something like this:

rudder:
  agent:
    version: latest
    server: rudder.my-company.com
    run_immediately: true

I also think it would be nice if rudder-project.org hosted these cloud-config scripts directly, giving every user a simple, fixed URL for the best-practice method of installing any version of Rudder. To get the server right, this would mean relying on your DNS setup to be able to resolve the “rudder” hostname in the default domain. That way you could just #include http://www.rudder-project.org/latest/cloud-config-install-rpm.yml or something similar.

When either of these is the case, you can expect an update here on this blog.