(Article originally posted at InfoWorld Magazine)
When I build a network or a system, I try my best to make sure that everything is as redundant as possible: redundant power supplies, RAID for the drives in case of a hard drive failure, backup routes in OSPF in case someone trips over the network cable… you get the idea. But what happens if the CPU overheats in one of the web servers and causes it to crash? Or what if someone yanks the network cable from your LDAP server? Or if someone flips the switch and accidentally turns off the accounting database server? If you incorporate High Availability (HA) into your system design, the answer is “nothing”. Your web site will still be running, your network users can still log in, and the accounting department won’t notice any glitch. You don’t even have to leave your desk.
Let’s say your web server serves mostly static content and everything fits on a 2GB CompactFlash card. Then you can build two solid-state machines using the Debian Router Project. Simple solid-state hardware means fewer moving parts and less likelihood of a hardware failure. You can then use heartbeat to create your HA web server cluster. If you have content that changes more frequently, like the leases file for a DHCP server, a database, or a file server, then you should look into DRBD, which mirrors a block device over the network, to keep the data on the two nodes synchronized.
Heartbeat requires you to set up a private link over which the two machines (nodes) communicate, so that each knows the other is still alive. While you can just use a crossover cable to connect the two nodes, I would strongly recommend that you install two network cards in each node and set up a private VLAN or network just for the heartbeat communication. This will give you a little more flexibility later. You will need five IP addresses in total: two for the private heartbeat link, two public IPs (one for each node, if you wish to manage them remotely), and one more public IP for the “virtual” IP address that is held up by the heartbeat software. This virtual IP address is the one your users visit. (By the way, heartbeat supports IPv6.)
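To make the address count concrete, here is a minimal Python sketch that lays out the five addresses for a two-node cluster. The subnets are hypothetical: 10.255.255.0/30 stands in for the private heartbeat link and 192.0.2.0/24 (a range reserved for documentation) for the public network; substitute your own.

```python
import ipaddress

# Hypothetical subnets: a /30 for the private heartbeat link and a
# documentation range for the public side. Replace with your own.
HEARTBEAT_NET = ipaddress.ip_network("10.255.255.0/30")
PUBLIC_NET = ipaddress.ip_network("192.0.2.0/24")


def plan_addresses(heartbeat_net, public_net):
    """Return the five addresses a two-node heartbeat cluster needs."""
    hb_hosts = list(heartbeat_net.hosts())  # a /30 yields exactly two usable hosts
    pub_hosts = public_net.hosts()          # generator over usable public hosts
    return {
        "node1-heartbeat": hb_hosts[0],
        "node2-heartbeat": hb_hosts[1],
        "node1-public": next(pub_hosts),    # for remote management of node 1
        "node2-public": next(pub_hosts),    # for remote management of node 2
        "virtual-ip": next(pub_hosts),      # the address users actually visit
    }


if __name__ == "__main__":
    plan = plan_addresses(HEARTBEAT_NET, PUBLIC_NET)
    for role, addr in plan.items():
        print(f"{role:16} {addr}")
    print("total:", len(plan))
```

The virtual IP sits in the same public subnet as the two nodes' own addresses, so whichever node currently holds it can answer for it on that network.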
Once you have heartbeat configured on both nodes and have designated one of the nodes as the master, the two will start “pinging” each other over the private link. Now, to see it in action: start a ping to the “virtual” IP address, then unplug the network cable for the master node or just shut it down to simulate a disaster. You should lose a few pings, but within a few seconds the backup node will realize that the master is no longer responding, take over the virtual IP address, and reply to your pings. This means that if one of the nodes fails in a production environment, users will experience only seconds of outage instead of minutes or, dare I say, hours.
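The failover behaviour described above boils down to a timer on the backup node. The Python sketch below is a simplified model, not heartbeat's actual code: it borrows the names keepalive (interval between heartbeats) and deadtime (silence tolerated before declaring the peer dead) from heartbeat's ha.cf settings, and it uses a simulated clock so the result is deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BackupNode:
    """Simplified model of the backup side of a two-node heartbeat pair."""
    deadtime: float           # seconds of silence before declaring the master dead
    last_heard: float = 0.0   # simulated time of the last heartbeat received
    holds_vip: bool = False   # whether this node currently owns the virtual IP

    def receive_heartbeat(self, now: float) -> None:
        self.last_heard = now

    def check(self, now: float) -> bool:
        """Take over the virtual IP if the master has been silent too long."""
        if not self.holds_vip and now - self.last_heard > self.deadtime:
            # A real node would bring up the virtual IP here.
            self.holds_vip = True
        return self.holds_vip


def simulate(keepalive: float, deadtime: float, master_dies_at: float,
             end: float, step: float = 0.5) -> Optional[float]:
    """Return the simulated time at which the backup takes over, or None."""
    backup = BackupNode(deadtime=deadtime)
    t = 0.0
    next_beat = 0.0
    while t <= end:
        # The master sends a heartbeat every `keepalive` seconds while alive.
        if t < master_dies_at and t >= next_beat:
            backup.receive_heartbeat(t)
            next_beat += keepalive
        if backup.check(t):
            return t
        t += step
    return None


if __name__ == "__main__":
    takeover = simulate(keepalive=2.0, deadtime=10.0, master_dies_at=30.0, end=60.0)
    print(f"master died at 30.0s, backup took over at {takeover}s")
```

With a 2-second keepalive and a 10-second deadtime, the master's last heartbeat arrives at 28 s and the backup takes over at 38.5 s, so the outage is roughly the deadtime plus the time since the last heartbeat. Lowering deadtime shortens the outage, but set it too low and a briefly busy master may be declared dead, so it is worth leaving some margin.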
Now, if you have followed my advice about putting the heartbeat link on its own VLAN instead of just hooking it up with a crossover cable, you have the flexibility to move the backup server to a different location in the building (or however far your VLAN reaches). Why? This protects you from a larger-scale disaster, say, a power outage for the entire room, a fire, or a flood (hey, I’ve seen it happen). If the two heartbeat hosts are physically separated, you stand a better chance of surviving the disaster. Plugging both machines into the same network switch creates another single point of failure, so it is highly recommended that your backup machine be connected to a different network switch, and preferably to a different power grid.
keepalived uses VRRP (Virtual Router Redundancy Protocol), a protocol widely supported among routers, so it integrates nicely into your existing network infrastructure. keepalived was originally designed to work with multiple routers, and it works much the same way heartbeat does, except that keepalived does not need a dedicated private link, and it is easier to set up with more than two nodes. (It is unclear whether keepalived currently supports IPv6.)
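VRRP decides who holds the virtual address with two simple rules, which the Python sketch below illustrates using the names from RFC 3768 (VRRPv2): the router with the highest priority becomes master, with ties going to the highest primary IP address, and a backup takes over after hearing no advertisements for Master_Down_Interval. The router names and addresses are hypothetical; in keepalived the corresponding knobs are the priority and advert_int settings of a vrrp_instance.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Router:
    name: str
    priority: int      # 1-254 for backups, 255 for the address owner
    primary_ip: str


def elect_master(routers):
    """Highest priority wins; ties go to the highest primary IP address."""
    return max(routers, key=lambda r: (r.priority, ipaddress.ip_address(r.primary_ip)))


def master_down_interval(priority: int, advert_interval: float = 1.0) -> float:
    """Seconds a backup waits without advertisements before taking over.

    RFC 3768: Master_Down_Interval = 3 * Advertisement_Interval + Skew_Time,
    where Skew_Time = (256 - Priority) / 256 seconds.
    """
    skew_time = (256 - priority) / 256
    return 3 * advert_interval + skew_time


if __name__ == "__main__":
    routers = [
        Router("lb1", 150, "192.0.2.11"),
        Router("lb2", 100, "192.0.2.12"),
        Router("lb3", 100, "192.0.2.13"),
    ]
    print("master:", elect_master(routers).name)
    # If lb1 fails, lb2 and lb3 tie on priority; the higher IP wins.
    print("after lb1 fails:", elect_master(routers[1:]).name)
    for r in routers:
        print(r.name, "waits", master_down_interval(r.priority), "s")
```

The skew term means a higher-priority backup times out slightly sooner (about 3.41 s at priority 150 versus 3.61 s at priority 100 with a 1-second interval), so when the master dies the most preferred backup normally claims the address first instead of every backup racing for it.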
So far you’ve achieved automatic fail-over. But don’t you feel that all these backup nodes sitting around are a bit of a waste? Can you leverage all that idle computing power? You mean you want load balancing on top of your HA functionality? Open source answers with CARP (Common Address Redundancy Protocol). The OpenBSD team released CARP in 2003 as a replacement for, and enhancement of, VRRP; its features include authenticated advertisements, which protect the cluster against spoofed takeovers, and load balancing across the nodes.
You can also combine CARP with pfsync, which synchronizes the state table of pf (OpenBSD’s packet filter) between machines. The result is a cluster of firewalls/routers that is always online and balances load across the nodes, and in case of a failure, users do not lose their sessions, because the surviving nodes already hold copies of the connection state.
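The following Python sketch is a deliberately simplified model of how CARP load balancing and pfsync fit together, not a description of their actual implementation. It assumes a hypothetical three-firewall cluster with three virtual host IDs (vhids): each vhid is mastered by the live node with the lowest advskew (CARP's preference setting, where lower wins), clients are spread across vhids by an illustrative hash of their address (CARP's real hash differs), and a pfsync-like step copies every new connection state to all live nodes.

```python
import zlib

# Hypothetical cluster: for each vhid, the (node, advskew) pairs.
# The live node with the lowest advskew is master for that vhid.
VHIDS = {
    1: [("fw1", 0), ("fw2", 100), ("fw3", 200)],
    2: [("fw2", 0), ("fw3", 100), ("fw1", 200)],
    3: [("fw3", 0), ("fw1", 100), ("fw2", 200)],
}


def master_for(vhid, alive):
    """Return the live node with the lowest advskew for this vhid, or None."""
    candidates = [(skew, node) for node, skew in VHIDS[vhid] if node in alive]
    return min(candidates)[1] if candidates else None


def pick_vhid(client_ip):
    """Illustrative hash of the client address onto a vhid (not CARP's real hash)."""
    ids = sorted(VHIDS)
    return ids[zlib.crc32(client_ip.encode()) % len(ids)]


class Cluster:
    def __init__(self, nodes):
        self.alive = set(nodes)
        # One state table per node, standing in for pf's state table.
        self.states = {n: set() for n in nodes}

    def handle(self, client_ip, flow):
        """Serve a packet; return (serving node, whether it already knew the session)."""
        node = master_for(pick_vhid(client_ip), self.alive)
        session = (client_ip, flow)
        seen_before = session in self.states[node]
        # pfsync-like step: replicate the state to every live node.
        for peer in self.alive:
            self.states[peer].add(session)
        return node, seen_before

    def fail(self, node):
        self.alive.discard(node)


if __name__ == "__main__":
    cluster = Cluster(["fw1", "fw2", "fw3"])
    node, seen_before = cluster.handle("198.51.100.7", "tcp:443")
    print(f"session opened on {node}, seen_before={seen_before}")
    cluster.fail(node)
    node, seen_before = cluster.handle("198.51.100.7", "tcp:443")
    print(f"after failover served by {node}, seen_before={seen_before}")
```

With all three firewalls up, each one is master for one vhid, so every node carries traffic. Because every node already holds the replicated state, the node that inherits a failed vhid recognizes the existing session (seen_before is True) instead of dropping it.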
In conclusion, heartbeat (along with DRBD) is the easiest to set up for a two-node cluster, keepalived integrates well into your VRRP environment, and CARP brings security and load balancing to the table. In case you are wondering how mature this technology is, heartbeat has been around for years and has a list of success stories.