Scale with Chef

This will be a long post, but it deserves some time. Go get a glass of water, breathe deeply and let’s dive in!

The why?

Even though we use “smart” techniques to get content from the feeds in real-time (or close), we still need to do some polling (remember, we’re doing something stupid, so that you don’t have to). To achieve that, we have a distributed architecture where workers can ask dispatchers for feeds to parse.

From there it is fairly easy to scale: more feeds to fetch? Add more workers. We’re building a system that knows, at any given time, how many workers are (and will be) needed. Once we know that, we just have to fire off new workers (or kill unneeded ones), and we need this to happen within about 5 minutes.
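To make the "how many workers do we need" idea concrete, here is a minimal sketch in plain Ruby. The function names, the per-worker capacity and the feed counts are all assumptions made up for illustration; they are not Superfeedr's actual numbers or code.

```ruby
# Hypothetical capacity planning: names and numbers are illustrative only.

# Number of workers needed so that all feeds can be polled,
# assuming each worker can handle `feeds_per_worker` feeds.
def workers_needed(feed_count, feeds_per_worker)
  raise ArgumentError, "feeds_per_worker must be positive" unless feeds_per_worker > 0
  (feed_count.to_f / feeds_per_worker).ceil
end

# Positive result: start that many workers.
# Negative result: that many workers can be killed.
def scaling_delta(feed_count, feeds_per_worker, running_workers)
  workers_needed(feed_count, feeds_per_worker) - running_workers
end

delta = scaling_delta(10_500, 1_000, 8)
puts "worker delta: #{delta}"
```

A positive delta is the number of fresh nodes that have to be bootstrapped and configured, which is where the rest of this post comes in.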

Theory

You should really check out this ~~long~~ presentation from Ezra and Adam

They say that scaling usually has 3 steps:

bootstrapping: the act of ‘acquiring’ resources. Basically, starting servers at a given IP with a base (naked) OS and, hopefully, a way to connect (ssh).
configuration: the goal here is to turn a “vanilla” server into something operational.
command and control: once the server is configured and running, you still need to send it specific instructions and get feedback on how it performs.

There are several options for the configuration. The most basic one is to do everything by hand (and try to repeat it for as many servers as you need to deploy). Another one is to work from an “image” (a ghost, if you will), but then it’s quite hard to improve and evolve it. At Superfeedr, the option we chose is to use Chef.

Chef

Chef allows you to define a set of specifications (packages to be installed, files to be deployed, options to be configured…) of what your server should look like when it’s ready. There are a few advantages to that: it’s stateless, it is idempotent, it is repeatable… etc.
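Idempotence is easier to see in code. The sketch below is plain Ruby, not Chef's DSL: a Hash stands in for a real server, and the package names and file path are invented. It only illustrates the principle of describing a desired state and changing only what differs from it.

```ruby
# Desired state, in the spirit of a spec. Keys and values are made up.
DESIRED = {
  packages: ["git", "apache2"],
  files: { "/etc/motd" => "Configured by a spec\n" }
}.freeze

# Bring `server` (a plain Hash standing in for a real machine) to the
# desired state and return the list of changes that were actually made.
def converge(server, desired)
  changes = []
  desired[:packages].each do |pkg|
    next if server[:packages].include?(pkg) # already installed: nothing to do
    server[:packages] << pkg
    changes << "install #{pkg}"
  end
  desired[:files].each do |path, content|
    next if server[:files][path] == content # already up to date
    server[:files][path] = content
    changes << "write #{path}"
  end
  changes
end

server = { packages: ["git"], files: {} }
puts converge(server, DESIRED).inspect # first run makes changes
puts converge(server, DESIRED).inspect # second run finds nothing to do
```

Running the same spec twice is harmless: the second pass reports no changes, which is what makes it safe to re-run a configuration at any time.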

There are a lot of ways to use Chef. Here is ours. It’s probably not the best for everyone (and maybe not even for us), but playing with it gave us some experience and we’re sharing it with you.

Chef is a client-server architecture, where the client is the node to be configured. This implies several things. The first (and most significant) is that everything happens on the client (called a node). If you used to do your configuration by hand (or using scripts), you were not doing it from the client. Another consequence is that your clients need to be ‘chef-ready’. This might (and will) sound weird, but you need some basic configuration on your servers before you can start the actual configuration. Luckily, this setup is rather basic and can be automated with a script like this one.

Cookbooks

On the Chef server, you define a set of cookbooks. Cookbooks are collections of related recipes. For example, a cookbook can deal with installing the “build-essentials” packages, the Apache server or git… etc. Each cookbook has a set of recipes. For example, the git cookbook may have recipes such as “install client” and “install server”.

Technically, the cookbooks can be used by different users and/or for different machines in your architecture. In fact, you want them to be as generic as possible.

Roles

Once you have your cookbooks, you should define roles. A role is a “type” of node you’d like to configure. Technically, roles are just sets of recipes taken from your cookbooks. A role for a blog application would include recipes for Apache, MySQL, and Wordpress, for example. A Rails application may have a “MySQL server” role, a “Memcached server” role, and a “Web application” role.
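As a rough mental model, a role can be pictured as a named, ordered list of recipes. The sketch below is plain Ruby rather than an actual Chef role file, and the role and recipe names are hypothetical; it only shows how the roles assigned to a node expand into the list of recipes to run.

```ruby
# Hypothetical roles, each mapped to an ordered list of
# "cookbook::recipe" names.
ROLES = {
  "blog" => ["apache2::default", "mysql::server", "wordpress::default"],
  "memcached_server" => ["memcached::default"]
}.freeze

# Recipes a node should run given the roles assigned to it,
# in order and without duplicates. Unknown roles raise KeyError.
def run_list_for(role_names, roles = ROLES)
  role_names.flat_map { |name| roles.fetch(name) }.uniq
end

puts run_list_for(["blog", "memcached_server"]).inspect
```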

Attributes

To make a cookbook behave differently on two nodes, Chef has attributes. An attribute would be, for example, the name of your server, the port used by Apache, the names of the gems you want to install, the user who runs your Apache server… etc. Of course, you can define attributes at several levels: inside a cookbook, where they act as default values, at the role level, or even at the node level. For example, a node’s IP is clearly at the node level, but the login/password to an external MySQL database can be set at the role level, while the Apache cookbook should probably set the default HTTP port to 80.
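The layering of attributes can be sketched as a deep merge in which the more specific level wins. This is a simplified model in plain Ruby, not Chef's implementation (Chef's real precedence rules are finer-grained), and every attribute name and value below is invented for the example.

```ruby
# Recursively merge two Hashes; values from `override` win on conflict.
def deep_merge(base, override)
  base.merge(override) do |_key, old, new|
    old.is_a?(Hash) && new.is_a?(Hash) ? deep_merge(old, new) : new
  end
end

# Merge attribute levels from least specific to most specific.
def resolve_attributes(*levels)
  levels.reduce({}) { |acc, level| deep_merge(acc, level) }
end

# Made-up values for the three levels described above.
COOKBOOK_DEFAULTS = { apache: { port: 80, user: "www-data" } }.freeze
ROLE_ATTRIBUTES   = { apache: { port: 8080 }, mysql: { host: "db.example.com" } }.freeze
NODE_ATTRIBUTES   = { ipaddress: "10.0.0.12" }.freeze

attrs = resolve_attributes(COOKBOOK_DEFAULTS, ROLE_ATTRIBUTES, NODE_ATTRIBUTES)
puts attrs.inspect
```

Here the role overrides the cookbook's default port while the default Apache user survives untouched, and the node contributes its own IP.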

Got it? Of course, this is clearly a “crash course”; you can find a lot more information in the Opscode wiki, as well as in the #chef room at irc.freenode.net. Ezra also wrote a great blog post that you should read as well.

Where to start?

I’d start by setting up a Chef server. The Chef server will provide you with a few things, including a web application that can be used to see all your cookbooks, roles, and nodes at a glance.
Once that’s done, set up a Chef repository, which basically contains the code of your cookbooks. Then add a few cookbooks, define a first role and assign it some recipes, as well as attributes. Writing cookbooks shouldn’t be too hard, but you should first check whether someone has already written the one you need. Check the Opscode repository and all the associated branches. With a little luck, you won’t have to write anything and will only need to define attributes.
When you have a few cookbooks, clone that Chef repo on your Chef server and run rake deploy any time you want to update the configuration.
Then, create a bootstrap script, like this one, which installs the building blocks for a Chef client on your node.
Finally, start a new node on your favorite cloud provider (Slicehost (http://slicehost.com), Linode, EC2), copy the bootstrap script over (scp should do it) and run it.

The great thing with this “spec” approach is that you can (and should) kill instances all the time and re-configure clean ones in minutes.

Hosting in the cloud really means that you don’t know (or at least shouldn’t care) about the nodes on which your app is running.

I’ve heard that people like 37signals, EngineYard or even Twitter are using Chef to scale and deploy their architectures. That can only be a good sign ;)

Again, this was a very light intro, but, when we started to play with Chef, we found that resources (and feedback) were missing.