a puppet module for gluster

I am an avid cobbler+puppet user. This allows me to rely on my cobbler server and puppet manifests to describe how servers/workstations are set up. I only back up my configs and data, and I regenerate failed machines PRN.

I’ll be publishing my larger cobbler+puppet infrastructure in the future once it’s been cleaned up a bit, but for now I figured I’d post my work-in-progress “puppet-gluster” module, since it seems there’s a real interest.

Warning: there are some serious issues with this module! I’ve used this as an easy way to build out many servers with cobbler+puppet automatically. It’s not necessarily the best long-term solution, and it certainly can’t handle certain scenarios yet, but it is a stepping stone if someone would like to think about writing such a module this way.

For lack of better hosting, it’s now available here: https://dl.dropbox.com/u/48553683/puppet-gluster.tar.bz2. Once I finish cleaning up a bit of cruft, I’ll post my git tree somewhere sane. All of this code is AGPL3+, so share and enjoy!

What’s next? My goal is to find the interested parties and provoke a bit of discussion as to whether this is useful and where to go next. It makes sense to me that the gluster experts chime in and add gluster-specific optimizations to this module, so that it can serve as a sort of de-facto documentation on how to set up gluster properly.
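To make the “de-facto documentation” idea concrete, here is a rough sketch of the manual steps a module like this automates. All hostnames, the volume name and the brick paths are invented for illustration; the gluster commands themselves are the standard CLI:

```shell
# From any one node, add the other peers to the trusted storage pool:
gluster peer probe annex2.example.com
gluster peer probe annex3.example.com

# Create a replicated volume across bricks on two of the peers:
gluster volume create examplevol replica 2 \
    annex1.example.com:/export/brick1 \
    annex2.example.com:/export/brick1

# Start the volume and check its state:
gluster volume start examplevol
gluster volume info examplevol

# On a client, mount it via the native FUSE protocol:
mount -t glusterfs annex1.example.com:/examplevol /mnt/gluster
```

A puppet module earns its keep by ordering these steps correctly across hosts and making them idempotent, which is exactly where the expert input would help.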

I believe that Dan Bode and other gluster developers are already interested in the “puppet module” project, and that work is underway. I spoke to him briefly about collaborating. He is most likely a more experienced puppet user than I, and so I look forward to the community getting a useful puppet-gluster module from somewhere. Maybe even native gluster types?

Happy hacking,
James



5 thoughts on “a puppet module for gluster”

    • Absolutely! As stated in my post, this will be coming in a day or so.
      I was hoping to get rid of a bit of initial cruft before it became permanent in git.

  1. Interesting module, but beware the allure of what glusterfs offers.

    I have long wanted to be a fan of glusterfs, first using it in 2007 and abandoning it due to stability issues. At the end of last year I took that tentative step again.

    After 7 months of testing and breaking glusterfs with Distributed Replicated Volumes across AWS, Rackspace and physical data centre servers, it became clear that while glusterfs is good when it works, when it does not work it is not really operationally feasible to run in a production environment (IMHO).

    When it breaks, it breaks badly and is a pig to debug. After investing so much time and effort in developing and building with glusterfs, it was a difficult decision to abandon it, but a necessary one. I was hoping that RIPienaar’s experiences from 2010 (http://www.devco.net/archives/2010/09/22/experience_with_glusterfs.php) might have been laid to rest in glusterfs 3.2 and 3.3. However, thorough testing with real-world scenarios shows that the same type of operational issues still affects glusterfs, which is a real shame.

    Anyone considering glusterfs should ensure that they are VERY familiar with its recovery and scaling operations (adding new bricks, volumes, servers, etc.), get them dialled in and tested thoroughly, and even then expect anomalies from time to time.
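    For readers unfamiliar with those operations, a rough sketch of what “adding bricks, volumes, servers” and recovery look like on the CLI follows; the volume and brick names here are placeholders, not from the commenter’s setup:

    ```shell
    # Add a new server and one of its bricks to an existing volume:
    gluster peer probe annex4.example.com
    gluster volume add-brick examplevol annex4.example.com:/export/brick1

    # Redistribute existing data onto the new brick:
    gluster volume rebalance examplevol start
    gluster volume rebalance examplevol status

    # After a node outage, trigger and inspect self-heal (3.3+):
    gluster volume heal examplevol
    gluster volume heal examplevol info
    ```

    These are exactly the operations worth rehearsing on a throwaway cluster before trusting them in production.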

    Deploying it is one thing, working with it in production long-term is another.

    • Sorry to hear you’re having problems. Post some specifics on gluster-users and maybe someone can help you out :)

      What I can tell you is that if there is a puppet module that makes it easy to deploy gluster, it will be far easier to “quickly setup and prototype” various setups, and developers and sysadmins will be able to get rid of any bugs you experience sooner.

  2. Absolutely true, puppet does make it possible to deploy glusterfs easily. And the more people that use glusterfs, the better it will become.

    First off, my implementation of glusterfs was outside the recommended operating requirements, granted. I was testing the feasibility of using glusterfs as an HA, distributed filesystem running over public IP, in multiple geographic regions, including multiple clouds and self-owned physical infrastructure.

    There was a hint in the glusterfs docs that it would work, running pure TCP and not RDMA.

    However, DR is the key issue here, and therein lies the rub.

    glusterfs is about HA and recoverability.

    It tends to be quite hard to pin down some complex bugs; it can take days or months of testing to run into them.

    But at the moment I need something that works and recovers. I have found glusterfs running in HA (distributed replicas), distributed geographically with public IP, not to be production stable, from an operational point of view.

    puppet is fantastic for knowns. However, when you introduce “variables” such as bandwidth limitations on the glusterfs cluster, or bandwidth-full-then-throttled scenarios, and start testing DR and self-heal, things get a bit interesting: glusterfs starts to assign new UUIDs to existing servers that are already “peer connected” via their resolvable hostnames. It is hard to debug why these things sometimes occur in that environment, and it is harder still to puppetize DR and edge cases. I found that it is not that hard to get glusterfs to get its knickers in a twist on the clouds.

    I will say that even running glusterfs server nodes at 256 MB RAM on 32-bit, it works remarkably well!

    However, even after increasing the RAM and removing the bandwidth limitations, problems still occurred during some DR testing, iteration after iteration and test after test, with self-heal and performance benchmarking with fio. Tests DID work well on public IP, until a problem occurred. And the few problems that did occur were beyond debugging, and attempts to rectify the situation were in vain.

    Whether it was glusterfs or user error is irrelevant in the context of recovery. DR and self-heal did work on numerous occasions (read: most), so I am fairly confident that the glusterfs DR processes were followed and puppetized correctly.

    My unresolvable problems were always related to the introduction of UUID problems, or to self-heal “not working” problems, e.g. it exits with 0 after a number of seconds but does nothing, and the glusterfs logs are not helpful in that situation.
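    For anyone chasing the same symptoms, these are the usual places to compare when peers show up under unexpected UUIDs. The paths are the stock glusterd defaults and may differ per distro or version:

    ```shell
    # Each server's own UUID:
    cat /var/lib/glusterd/glusterd.info

    # What this server believes about its peers (one file per peer UUID):
    gluster peer status
    ls /var/lib/glusterd/peers/

    # Logs for the management daemon and the self-heal daemon:
    less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log
    less /var/log/glusterfs/glustershd.log
    ```

    If the UUID in a peer’s glusterd.info does not match what the other nodes have recorded for that hostname, you have found the mismatch described above.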

    I went with it for 7 months, as after the first full rebuild (erasing the storage directories) and starting from scratch, I assumed user error. But after running into a number of problems, well.

    “* To rebuild gluster (erasing all data), rm -rf the storage dirs” is not necessarily an option in production, especially when you cannot add a new node to the cluster and have to rebuild the entire cluster to get it all working again.

    One could chalk this up to “cloud” or public-transport limitations, but seeing as recovery in the cloud is a must, as is debug data, it is not fit for purpose in that environment (IMHO), unfortunately. 7 months of unfortunately. However, NFS with lsyncd/csync2 seems to be a good match for distributing files geographically across public IP. It is also quite a complex puppet recipe, like glusterfs, but with a little less “under the hood” from an operational standpoint. That Red Hat dropped the GnuTLS SSL libraries is a pain, but openvpn can be used instead of SSL to encrypt data between nodes; more hoops and complexity, but it has fared well thus far, though it is still undergoing more extensive testing.
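    As a hedged sketch of that alternative, the lsyncd/csync2 pair can be driven as below. The hosts and paths are invented for illustration, and the real deployments use config files rather than one-liners:

    ```shell
    # lsyncd: watch /srv/shared locally and push changes to the same
    # path on a remote replica over rsync+ssh:
    lsyncd -rsyncssh /srv/shared replica.example.com /srv/shared

    # csync2: synchronise and verbosely report what changed, using the
    # host groups defined in /etc/csync2.cfg:
    csync2 -xv
    ```

    lsyncd gives near-real-time one-way push; csync2 handles multi-master reconciliation, which is why the two are often combined.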

    I could have tried submitting to gluster-users, but I need something now, and in my past experience user groups for things such as this are a long road; I honestly could not debug and pinpoint the few problems when they occurred.

    hadoop? I just dislike the java overhead, and it is somewhat overkill as an NFS/distributed replacement. Add in fuse, etc… I wish drbd would fit the bill, but it is block-device specific, so it is limited by the availability of mounts, which not all cloud nodes can provide.

    I would love to see glusterfs stable on public IP and better documented.

    Good luck with the module, I hope to be able to have a play with it this week to see if I was just being a numpty.
