Originally published at ScummBlog. You can comment here or there.

Recently a couple of Linode VMs were acquired by my employer. It was decided that these would host OpenVPN, OpenLDAP, PostgreSQL and a couple more services in a HA cluster. This seemed like a great idea, and much congratulating occurred.

Little did we know what awaited us.

First off, a serious protip: Ubuntu 10.10 (Maverick Meerkat) isn’t an LTS (Long Term Support) release. Unbeknownst to us, this meant that the Ubuntu HA cluster Launchpad didn’t support 10.10. Not knowing this, we jovially upgraded the 10.04 Linodes to Maverick Meerkat.

LDAP and OpenVPN were installed on what was decided to be the primary node in the cluster. We got the services configured and got the other servers elsewhere pointing at it. In addition to the VPN change, it was decided that the internal IP addresses would be renumbered off the 10.0.0.0 segment. This resulted in a rather catastrophic failure in DNS, but eventually everything got straightened out.

After the celebration ended, the cluster experimentation began.

Savvy readers might be thinking: “Hey! You’re doing this in the wrong order! You should be working on clustering and THEN getting services up and running!” and to those people I say: Shut up. Where were you two weeks ago? In actual fact the clustering idea came after we got LDAP and OpenVPN configured, so there really wasn’t a chance to do it that way.

So the second node was spun up, upgraded to 10.10 (groan) and the grand experiment began. The first hurdle was getting PV-Grub working on the Linodes. I sadly can’t remember how we actually got that working, otherwise I’d share that necessary tidbit with you.

The clustering options we selected were DRBD for disk sharing and Heartbeat and Pacemaker for the actual HA crap. This worked fairly well right up to the point where we discovered that DRBD with ext4 absolutely will not run in a master/master configuration: ext4 isn’t a cluster-aware filesystem, so it can’t safely be mounted on both nodes at once. This is bad. An explanation: Pacemaker was only happy when one node was Online and one node was Standby. This meant only one node could access the DRBD mount. The way DRBD works is to sync data between disks. With only one node connected this sync wasn’t occurring, making the failover action break in hideous ways.

“Back to the drawing board!” we said. “How about DRBD with some sort of cluster-aware filesystem on it? Brilliant!” and off we ran to implement this new epiphany. LVM was the first try, and that didn’t work out well. LVM would require OCFS2 or GFS to be used, and if we were going to use either of those, what’s the point of having LVM in the middle? So LVM was torn out and OCFS2 was arbitrarily selected as the FS of champions.

Having a multi-node mountable (in laboratory tests!) file system, we clicked our heels and ran off to plug that into Heartbeat and Pacemaker. Tragedy struck once again. We couldn’t get Pacemaker to start DRBD and OCFS2 and then mount the resulting filesystem. No way, no how. Turns out a very important library for Pacemaker in relation to OCFS2 does not exist in 10.10. Some thought was put to starting DRBD and OCFS2 during boot, but that caused more headaches, as they absolutely had to start up after the network service, but before the dependent services.

This brings us to now. Having created 10.04 installs in VirtualBox on my workstation, I have found that the following Ubuntu Wiki article will allow you to get a working HA cluster going. Mostly. It does leave out some required commands and packages. That wiki page is https://wiki.ubuntu.com/ClusterStack/LucidTesting

I’d be a real dick to tell you that there is information missing and not tell you what it is, so here you go: under the Pacemaker, DRBD and OCFS2/GFS section they neglect to mention that you need to install openais manually, as it isn’t pulled in as a requirement with Pacemaker. Another oversight is that you need to run “drbdadm create-md (diskname)” (without the quotes or brackets) in order to actually initialize the DRBD disk prior to configuring Pacemaker.
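
For the copy-and-paste crowd, here is a rough sketch of those two missing steps as a script. Some assumptions on my part: the resource name r0 is only a placeholder (drbdadm expects the resource name from your own DRBD config), and the run wrapper and the APPLY switch are my own invention, not anything from the wiki page. Since create-md writes to the disk, the script only prints the commands unless you ask it to run them.

```shell
#!/bin/sh
# Sketch of the two steps the wiki page leaves out (Ubuntu 10.04).
# RESOURCE is a placeholder: use the resource name from your own DRBD config.
RESOURCE="${RESOURCE:-r0}"

# Print each command by default; set APPLY=1 (and run as root) to execute them.
run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# openais is not pulled in as a dependency of pacemaker, so install it explicitly.
run apt-get install -y openais

# Initialize the DRBD metadata for the resource before configuring Pacemaker.
run drbdadm create-md "$RESOURCE"
```

Run it as is to see what it would do, then again with APPLY=1 as root once the resource name matches your setup.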

So if you hold off upgrading your Ubuntu installs to 10.10 and follow that wiki page you *should* end up with an HA cluster that works.


From: northbard.livejournal.com


That seems a bit unfair to LVM. Using an existing layer of functionality isn't that outre really. *g* Plus :: snapshots!!

Fuck rsync backups. Fuck 'em in the bad place.

From: scumm-boy.livejournal.com


I won't deny that LVM is quite useful, but it wasn't something we needed to mess with. Straight DRBD and OCFS2 work perfectly for our purposes. Backups of data will go to Amazon's S3 via Amanda Backup, more than likely.