For the last two years or so I’ve been playing with and testing gluster on and off, and hanging out in the awesome #gluster channel on Freenode. In case you haven’t heard, gluster was acquired by Red Hat back in October 2011. This post describes my current setup. I urge you to send your comments and suggestions for improvement, and I’ll update this as needed.
Hardware:
Ideology: I wanted to build individual, self-contained storage hosts. I didn’t want servers with separate storage shelves attached over serial attached SCSI (SAS), which is what Dell often pushes. Supermicro fit the design goal, and sold me when I realized I could have the OS drives hot-swappable out the back.
- 4 x Supermicro 6047R-E1R24N
- 4 x 24 x 2TiB, 3.5″ HDD (front; hot-swappable main storage)
- 4 x 2 x 600GiB, 2.5″ HDD (rear; hot-swappable OS drives, awesome feature!)
- 2 x quality stacked switches (with one leg of each bond device out to each switch)
- IPMI: absolutely required. It does seem a bit buggy, though: I’ve had problems where the SOL console stops responding when dealing with a big stream of data, and the only way I can rescue it is with a cold reset of the BMC (see the ipmitool example below). Overall it’s been sufficient to get me up and running.
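When the SOL console wedges, a cold reset of the BMC over the network is roughly as follows (the hostname and credentials here are placeholders, not my real ones):

    # cold reset the BMC when the SOL console stops responding
    ipmitool -I lanplus -H storage1-ipmi.example.com -U admin -P secret mc reset cold
    # then reconnect to the serial-over-LAN console
    ipmitool -I lanplus -H storage1-ipmi.example.com -U admin -P secret sol activate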
OS:
- CentOS 6.3+. I would consider using RHEL if their sales department could get organized, and once RHEL integrates into my cobbler+puppet build system.
- Bonded ethernet (eth0 + eth1 -> bond0) on each machine, with a possible upgrade to bonded 10GbE if ever needed. Interface eth0 on each machine plugs into switch0, and eth1 plugs into switch1 (a bonding config sketch follows this list).
- The 24 storage HDDs in each machine are split into two separate RAID 6 sets.
- The OS HDDs are in software RAID 1. Unfortunately anaconda/kickstart doesn’t support RAID 1 for the EFI boot partitions; maybe someone could fix this! (HINT, HINT) A kickstart sketch follows this list.
- The machines PXE boot, kickstart, and configure themselves automatically with cobbler+puppet (see the cobbler example below).
- The LSI MSM tool (for monitoring the RAID) gives me a lot of trouble with false-positive warnings about temperature thresholds. Apart from being stuck with proprietary crapware, it does actually email me when drives fail. Alternatives welcome! I deploy it with a puppet module that I wrote; if it weren’t for that, this step would drive me insane.
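For the bonding mentioned above, here’s a minimal RHEL/CentOS 6 network-scripts sketch. The addresses are made up, and I’m showing mode=active-backup as a switch-agnostic illustration; pick whichever mode actually matches your switch setup:

    # /etc/sysconfig/network-scripts/ifcfg-bond0 (illustrative values)
    DEVICE=bond0
    IPADDR=10.0.0.11
    NETMASK=255.255.255.0
    BOOTPROTO=none
    ONBOOT=yes
    BONDING_OPTS="mode=active-backup miimon=100"

    # /etc/sysconfig/network-scripts/ifcfg-eth0 (ifcfg-eth1 is the same apart from DEVICE/HWADDR)
    DEVICE=eth0
    MASTER=bond0
    SLAVE=yes
    BOOTPROTO=none
    ONBOOT=yes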
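The software RAID 1 for the OS drives happens in the kickstart partitioning section. A rough sketch (disk names and sizes are placeholders) looks like this, with the EFI partition left un-mirrored because anaconda won’t do it:

    # kickstart partitioning sketch for the mirrored OS drives (illustrative)
    zerombr
    clearpart --all --initlabel --drives=sda,sdb
    part /boot/efi --fstype=efi --size=200 --ondisk=sda   # can't be RAID 1, sadly
    part raid.01 --size=512 --ondisk=sda
    part raid.02 --size=512 --ondisk=sdb
    part raid.11 --size=1 --grow --ondisk=sda
    part raid.12 --size=1 --grow --ondisk=sdb
    raid /boot --level=1 --device=md0 --fstype=ext4 raid.01 raid.02
    raid /     --level=1 --device=md1 --fstype=ext4 raid.11 raid.12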
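Registering a node with cobbler so it can PXE boot and kickstart itself looks something like this (the name, MAC, IP and profile are all made up); puppet takes over after the first boot:

    # add a storage node to cobbler (illustrative values)
    cobbler system add --name=storage1 --profile=centos6-x86_64 \
        --interface=eth0 --mac=00:25:90:aa:bb:cc \
        --ip-address=10.0.0.11 --hostname=storage1.example.com
    cobbler sync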
Gluster:
- Each host has its drives split into two bricks. A gluster engineer recommended this for the type of setup I’m running.
- Each RAID 6 set is formatted with xfs (the formatting command is sketched after this list).
- Keepalived maintains a VIP (which I’ll replace with cman/corosync one day) that serves as the client hostname to connect to. This makes my setup a bit more highly available if one or more nodes go down. A keepalived.conf sketch follows this list.
- I have a puppet module which I use to describe/build my gluster setup. It’s not perfect, but it works for me ™. I’m cleaning it up, and will post it shortly.
- I’m using a distributed-replicate setup with eight bricks (two per node); the volume create command is sketched below.
- I originally used the official packages to get my gluster RPMs, but recently I switched to using kkeithle’s. Thanks for your hard work!
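Formatting a brick is basically one mkfs.xfs call. The device and mount point below are placeholders, and -i size=512 is the inode size commonly recommended for gluster bricks:

    # format one RAID 6 set as an xfs brick (device/path are placeholders)
    mkfs.xfs -i size=512 /dev/sdb
    mkdir -p /gluster/brick1
    echo '/dev/sdb  /gluster/brick1  xfs  defaults  0 0' >> /etc/fstab
    mount /gluster/brick1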
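The keepalived piece is a single vrrp_instance. Here’s a sketch with made-up addresses; one node runs as MASTER and the others as BACKUP with a lower priority:

    # /etc/keepalived/keepalived.conf sketch (illustrative values)
    vrrp_instance GLUSTER_VIP {
        state MASTER              # BACKUP on the other nodes
        interface bond0
        virtual_router_id 42
        priority 100              # lower on the BACKUP nodes
        advert_int 1
        virtual_ipaddress {
            10.0.0.100/24
        }
    }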
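Creating the distributed-replicate volume from the eight bricks looks roughly like this (hostnames, volume name and brick paths are placeholders). With replica 2, consecutive bricks on the command line form the replica pairs, so the ordering below keeps each pair on two different hosts:

    # create and start a distributed-replicate volume (illustrative names)
    gluster volume create examplevol replica 2 \
        storage1:/gluster/brick1 storage2:/gluster/brick1 \
        storage3:/gluster/brick1 storage4:/gluster/brick1 \
        storage1:/gluster/brick2 storage2:/gluster/brick2 \
        storage3:/gluster/brick2 storage4:/gluster/brick2
    gluster volume start examplevol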
Conclusion:
Let me know which other nitpicky details you’d like to know about and I’ll post them. A lot can also be inferred by reading my puppet module.
Happy Hacking,
James