One challenge we face when working in the cloud is knowing when to scale up. For that we need clear figures that tell us when more resources are required.
This requires performance monitoring. Of the many tools available, we chose collectd. The daemon is built on a very sleek plugin infrastructure, which lets it run with only the functionality you actually need. If you want to know what clean C looks like, the collectd source code is worth a look.
Unlike a lot of other monitoring software, the watched hosts don’t open listening network sockets; they only send UDP packets to a central server. Because data gathering never has to be controlled from the outside, this avoids potential remote security holes.
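To illustrate the send-only model, here is a minimal Ruby sketch using only the standard library. The helper name send_stat and the plain-text payload are assumptions made for illustration; collectd's real network plugin uses its own binary packet format, which this sketch does not implement.

```ruby
require 'socket'

# Fire-and-forget: send a single UDP datagram and return.
# The sender never binds or listens, so nothing on the watched host
# can be reached from the outside through this code path.
# NOTE: send_stat and its text payload are hypothetical, not collectd's wire format.
def send_stat(host, port, payload)
  sock = UDPSocket.new
  sock.send(payload, 0, host, port) # returns the number of bytes sent
ensure
  sock&.close
end
```

A watched host would call something like send_stat('192.0.2.10', 25826, 'load=0.42') on every interval; the address is a placeholder, and 25826 is collectd's default port.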
The primary use of collectd is monitoring operating system resources, and its native plugins do this very efficiently. There’s no expensive fork() for calling other programs. If you still need that, there’s an exec plugin.
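As a rough sketch of what a program run by the exec plugin could look like: the plugin reads PUTVAL lines in collectd's plain-text protocol from the program's standard output. The method name putval_line, the plugin instance exec-myapp and the queue_length metric are made up for this example.

```ruby
# Build one PUTVAL line in collectd's plain-text protocol.
# Identifier layout: host/plugin-instance/type-instance; "N" means "now".
# NOTE: putval_line and the metric names below are hypothetical.
def putval_line(host, plugin, type, value, interval: 10)
  %(PUTVAL "#{host}/#{plugin}/#{type}" interval=#{interval} N:#{value})
end

# The exec plugin exports these variables to the programs it starts;
# the fallbacks only make the script runnable by hand.
host     = ENV.fetch('COLLECTD_HOSTNAME', 'localhost')
interval = ENV.fetch('COLLECTD_INTERVAL', '10')

$stdout.sync = true # do not let buffering delay the value
puts putval_line(host, 'exec-myapp', 'gauge-queue_length', 7, interval: interval)
```

A real script would normally loop, measure something, print a line and sleep for the interval. It would be registered in the exec plugin's configuration with a line of the form Exec "user" "/path/to/script", where user and path are placeholders.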
Once you have figured out which host is responsible for gathering all the data, you can configure collectd’s network plugin on that host to open a listening socket for receiving other hosts’ statistics.
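A minimal configuration sketch for both sides might look like the following. The address 192.0.2.10 is a placeholder for the central server, and 25826 is collectd's default port; adapt both to your setup.

```
# collectd.conf on the central server: open the listening socket
LoadPlugin network
<Plugin network>
  Listen "0.0.0.0" "25826"
</Plugin>

# collectd.conf on every watched host: only send, never listen
LoadPlugin network
<Plugin network>
  Server "192.0.2.10" "25826"
</Plugin>
```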
To make use of the collected data there is the rrdtool plugin, which writes locally generated or remotely received values into the round-robin database files of the popular and proven rrdtool. Pay attention to the configuration of that plugin: once you start monitoring dozens of hosts with hundreds of plugins, the updates are scattered across many RRD files, and that random write pattern can become an I/O bottleneck. The plugin has some tunable caching parameters for this, and there is the rrdcached daemon as an alternative.
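As a starting point, the caching options of the rrdtool plugin can be set roughly like this. The values and paths are illustrative assumptions rather than recommendations, and which options are available depends on your collectd version.

```
LoadPlugin rrdtool
<Plugin rrdtool>
  DataDir "/var/lib/collectd/rrd"
  # Keep values in memory for a while and write them in batches
  CacheTimeout 120
  # Periodically flush values that have been cached for too long
  CacheFlush 900
  # Throttle the number of RRD updates per second
  WritesPerSecond 50
</Plugin>

# Alternative: hand the writes over to a running rrdcached daemon
# LoadPlugin rrdcached
# <Plugin rrdcached>
#   DaemonAddress "unix:/var/run/rrdcached.sock"
#   DataDir "/var/lib/rrdcached/db/collectd"
# </Plugin>
```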
You can generate your pretty graphs with the rrdgraph utility, which is what the included collection3 CGI scripts do. Unfortunately, this web interface is rather Web 1.0 and allows only basic browsing of your graphs. We hope for the advent of some alternative projects that allow more convenient viewing of statistics.
Because we want to watch not only host resources but also the quality of our processes, we have developed the ruby-collectd library. Basically, you just define a collectd server on application startup and sprinkle your code with statements like Stats.counter(:job_done).count! 1
ruby-collectd does not require a local collectd, so you don’t need to install the daemon on the application host just to watch your own application.
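To show what "sprinkling" looks like in practice, here is a small self-contained sketch. StandInStats and StandInCounter are stand-ins written for this example so that it runs without the gem; they mimic only the call shape quoted above and are not ruby-collectd's implementation. With the real library, Stats would be set up once at application startup.

```ruby
# Stand-in for a counter object (hypothetical, for illustration only).
class StandInCounter
  attr_reader :value

  def initialize
    @value = 0
  end

  # Increase the counter by the given amount.
  def count!(by = 1)
    @value += by
  end
end

# Stand-in for the Stats object; hands out one counter per name.
class StandInStats
  def initialize
    @counters = Hash.new { |hash, name| hash[name] = StandInCounter.new }
  end

  def counter(name)
    @counters[name]
  end
end

Stats = StandInStats.new

# Application code: one extra line per event we want to measure.
def process_jobs(jobs)
  jobs.each do |job|
    job.call
    Stats.counter(:job_done).count! 1
  end
end

process_jobs([-> { 1 + 1 }, -> { :ok }])
```

The point of the pattern is that the instrumentation stays a one-liner next to the work it measures, while setup and delivery live elsewhere.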