Automatic grouping in mgmt

In this post, I’ll tell you about the recently released “automatic grouping” or “AutoGroup” feature in mgmt, a next generation configuration management prototype. If you aren’t already familiar with mgmt, I’d recommend you start by reading the introductory post, and the second post. There’s also an introductory video.

Resources in a graph

Most configuration management systems use something called a directed acyclic graph, or DAG. This is a fancy way of saying that it is a bunch of circles (vertices) which are connected with arrows (edges). The arrows must be connected to exactly two vertices, and you’re only allowed to move along each arrow in one direction (directed). Lastly, if you start at any vertex in the graph, you must never be able to return to where you started by following the arrows (acyclic). If you can, the graph is not fit for our purpose.

A DAG from Wikipedia

An example DAG from Wikipedia

The graphs in configuration management systems usually represent the dependency relationships (edges) between the resources (vertices) which is important because you might want to declare that you want a certain package installed before you start a service. To represent the kind of work that you want to do, different kinds of resources exist which you can use to specify that work.

Each of the vertices in a graph represents a unique resource, and each is backed by an individual software routine or “program” which can check the state of the resource, and apply the correct state if needed. This makes each resource idempotent. If we have many individual programs, this might turn out to be a lot of work to do to get our graph into the desired state!

Resource grouping

It turns out that some resources have a fixed overhead to starting up and running. If we can group resources together so that they share this fixed overhead, then our graph might converge faster. This is exactly what we do in mgmt!

Take for example, a simple graph such as the following:

Simple DAG showing three pkg, two file, and one svc resource

Simple DAG showing one svc, two file, and three pkg resources…

We can logically group the three pkg resources together and redraw the graph so that it now looks like this:

DAG with the three pkg resources now grouped into one.

DAG with the three pkg resources now grouped into one! Overlapping vertices mean that they act as if they’re one vertex instead of three!

This all happens automatically of course! It is very important that the new graph is a faithful, logical representation of the original graph, so that the specified dependency relationships are preserved. What this represents, is that when multiple resources are grouped (shown by overlapping vertices in the graph) they run together as a single unit. This is the practical difference between running:

$ dnf install -y powertop
$ dnf install -y sl
$ dnf install -y cowsay

if not grouped, and:

$ dnf install -y powertop sl cowsay

when grouped. If you try this out you’ll see that the second scenario is much faster, and on my laptop about three times faster! This is because of fixed overhead such as cache updates, and the dnf dep solver that each process runs.

This grouping means mgmt uses this faster second scenario instead of the slower first scenario that all the current generation tools do. It’s also important to note that different resources can implement the grouping feature to optimize for different things besides performance. More on that later…

The algorithm

I’m not an algorithmist by training, so it took me some fiddling to come up with an appropriate solution. I’ve implemented it along with an extensive testing framework and a series of test cases, which it passes of course! If we ever find a graph that does not get grouped correctly, then we can iterate on the algorithm and add it as a new test case.

The algorithm turns out to be relatively simple. I first noticed that vertices which had a relationship between them must not get grouped, because that would undermine the precedence ordering of the vertices! This property is called reachability. I then attempt to group every vertex to every other vertex that has no reachability or reverse reachability to it!

The hard part turned out to be getting all the plumbing surrounding the algorithm correct, and in particular the actual vertex merging algorithm, so that “discarded edges” are reattached in the correct places. I also took a bit of extra time to implement the algorithm as a struct which satisfies an “AutoGrouper” interface. This way, if you’d like to implement a different algorithm, it’s easy to drop in your replacement. I’m fairly certain that a more optimal version of my algorithm is possible for anyone wishing to do the analysis.

A quick note on nomenclature: I’ve actually decided to call this grouping and not merging, because we actually preserve the unique data of each resource so that they can be taken apart and mixed differently when (and if) there is a change in the compiled graph. This makes graph changeovers very cheap in mgmt, because we don’t have to re-evaluate anything which remains constant between graphs. Merging would imply a permanent reduction and loss of unique identity.

Parallelism and user choice

It’s worth noting two important points:

  1. Auto grouping of resources usually decreases the parallelism of a graph.
  2. The user might not want a particular resource to get grouped!

You might remember that one of the novel properties of mgmt, is that it executes the graph in parallel whenever possible. Although the grouping of resources actually removes some of this parallelism, certain resources such as the pkg resource already have an innate constraint on sequential behaviour, namely: the package manager lock. Since these tools can’t operate in parallel, and since each execution has a fixed overhead, it’s almost always beneficial to group pkg resources together.

Grouping is also not mandatory, so while it is a sensible default, you can disable grouping per resource with a simple meta parameter.

Lastly, it’s also worth mentioning that grouping doesn’t “magically” happen without some effort. The underlying resource needs to know how to optimize, watch, check and apply multiple resources simultaneously for it to support the feature. At the moment, only the pkg resource can do any grouping, and even then, there could always be some room for improvement. It’s also not optimal (or even logical) to group certain types of resources, so those will never be able to do any grouping. We also don’t group together resources of different kinds, although mgmt could support this if a valid use case is ever found.

File grouping

As I mentioned, only the pkg resource supports grouping at this time. The file resource demonstrates a different use case for resource grouping. Suppose you want to monitor 10000 files in a particular directory, but they are specified individually. This would require far too many inotify watches than a normal system usually has, so the grouping algorithm could group them into a single resource, which then uses a recursive watcher such as fanotify to reduce the watcher count by a factor of 10000. Unfortunately neither the file resource grouping, nor the fanotify support for this exist at the moment. If you’d like to implement either of these, please let me know!

If you can think of another resource kind that you’d like to write, or in particular, if you know of one which would work well with resource grouping, please contact me!

Science!

I wouldn’t be a very good scientist (I’m actually a Physiologist by training) if I didn’t include some data and a demonstration that this all actually works, and improves performance! What follows will be a good deal of information, so skim through the parts you don’t care about.

Science <3 data

Science <3 data

I decided to test the following four scenarios:

  1. single package, package check, package already installed
  2. single package, package install, package not installed
  3. three packages, package check, packages already installed
  4. three packages, package install, packages not installed

These are the situations you’d encounter when running your tool of choice to install one or more packages, and finding them either already present, or in need of installation. I timed each test, which ends when the tool tells us that our system has converged.

Each test is performed multiple times, and the average is taken, but only after we’ve run the tool at least twice so that the caches are warm.

We chose small packages so that the fixed overhead delays due to bandwidth and latencies are minimal, and so that our data is more representative of the underlying tool.

The single package tests use the powertop package, and the three package tests use powertop, sl, and cowsay. All tests were performed on an up-to-date Fedora 23 laptop, with an SSD. If you haven’t tried sl and cowsay, do give them a go!

The four tools tested were:

  1. puppet
  2. mgmt
  3. pkcon
  4. dnf

The last two are package manager front ends so that it’s more obvious how expensive something is expected to cost, and so that you can discern what amount of overhead is expected, and what puppet or mgmt is causing you. Here are a few representative runs:

mgmt installation of powertop:

$ time sudo ./mgmt run --file examples/pkg1.yaml --converged-timeout=0
21:04:18 main.go:63: This is: mgmt, version: 0.0.3-1-g6f3ac4b
21:04:18 main.go:64: Main: Start: 1459299858287120473
21:04:18 main.go:190: Main: Running...
21:04:18 main.go:113: Etcd: Starting...
21:04:18 main.go:117: Main: Waiting...
21:04:18 etcd.go:113: Etcd: Watching...
21:04:18 etcd.go:113: Etcd: Watching...
21:04:18 configwatch.go:54: Watching: examples/pkg1.yaml
21:04:20 config.go:272: Compile: Adding AutoEdges...
21:04:20 config.go:533: Compile: Grouping: Algorithm: nonReachabilityGrouper...
21:04:20 main.go:171: Graph: Vertices(1), Edges(0)
21:04:20 main.go:174: Graphviz: No filename given!
21:04:20 pgraph.go:764: State: graphStateNil -> graphStateStarting
21:04:20 pgraph.go:825: State: graphStateStarting -> graphStateStarted
21:04:20 main.go:117: Main: Waiting...
21:04:20 pkg.go:245: Pkg[powertop]: CheckApply(true)
21:04:20 pkg.go:303: Pkg[powertop]: Apply
21:04:20 pkg.go:317: Pkg[powertop]: Set: installed...
21:04:25 packagekit.go:399: PackageKit: Woops: Signal.Path: /8442_beabdaea
21:04:25 packagekit.go:399: PackageKit: Woops: Signal.Path: /8443_acbadcbd
21:04:31 pkg.go:335: Pkg[powertop]: Set: installed success!
21:04:31 main.go:79: Converged for 0 seconds, exiting!
21:04:31 main.go:55: Interrupted by exit signal
21:04:31 pgraph.go:796: Pkg[powertop]: Exited
21:04:31 main.go:203: Goodbye!

real    0m13.320s
user    0m0.023s
sys    0m0.021s

puppet installation of powertop:

$ time sudo puppet apply pkg.pp 
Notice: Compiled catalog for computer.example.com in environment production in 0.69 seconds
Notice: /Stage[main]/Main/Package[powertop]/ensure: created
Notice: Applied catalog in 10.13 seconds

real    0m18.254s
user    0m9.211s
sys    0m2.074s

dnf installation of powertop:

$ time sudo dnf install -y powertop
Last metadata expiration check: 1:22:03 ago on Tue Mar 29 20:04:29 2016.
Dependencies resolved.
==========================================================================
 Package          Arch           Version            Repository       Size
==========================================================================
Installing:
 powertop         x86_64         2.8-1.fc23         updates         228 k

Transaction Summary
==========================================================================
Install  1 Package

Total download size: 228 k
Installed size: 576 k
Downloading Packages:
powertop-2.8-1.fc23.x86_64.rpm            212 kB/s | 228 kB     00:01    
--------------------------------------------------------------------------
Total                                     125 kB/s | 228 kB     00:01     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Installing  : powertop-2.8-1.fc23.x86_64                            1/1 
  Verifying   : powertop-2.8-1.fc23.x86_64                            1/1 

Installed:
  powertop.x86_64 2.8-1.fc23                                              

Complete!

real    0m10.406s
user    0m4.954s
sys    0m0.836s

puppet installation of powertop, sl and cowsay:

$ time sudo puppet apply pkg3.pp 
Notice: Compiled catalog for computer.example.com in environment production in 0.68 seconds
Notice: /Stage[main]/Main/Package[powertop]/ensure: created
Notice: /Stage[main]/Main/Package[sl]/ensure: created
Notice: /Stage[main]/Main/Package[cowsay]/ensure: created
Notice: Applied catalog in 33.02 seconds

real    0m41.229s
user    0m19.085s
sys    0m4.046s

pkcon installation of powertop, sl and cowsay:

$ time sudo pkcon install powertop sl cowsay
Resolving                     [=========================]         
Starting                      [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing                    [=========================]         
Querying                      [=========================]         
Downloading packages          [=========================]         
Testing changes               [=========================]         
Installing packages           [=========================]         
Finished                      [=========================]         

real    0m14.755s
user    0m0.060s
sys    0m0.025s

and finally, mgmt installation of powertop, sl and cowsay with autogrouping:

$ time sudo ./mgmt run --file examples/autogroup2.yaml --converged-timeout=0
21:16:00 main.go:63: This is: mgmt, version: 0.0.3-1-g6f3ac4b
21:16:00 main.go:64: Main: Start: 1459300560994114252
21:16:00 main.go:190: Main: Running...
21:16:00 main.go:113: Etcd: Starting...
21:16:00 main.go:117: Main: Waiting...
21:16:00 etcd.go:113: Etcd: Watching...
21:16:00 etcd.go:113: Etcd: Watching...
21:16:00 configwatch.go:54: Watching: examples/autogroup2.yaml
21:16:03 config.go:272: Compile: Adding AutoEdges...
21:16:03 config.go:533: Compile: Grouping: Algorithm: nonReachabilityGrouper...
21:16:03 config.go:533: Compile: Grouping: Success for: Pkg[powertop] into Pkg[cowsay]
21:16:03 config.go:533: Compile: Grouping: Success for: Pkg[sl] into Pkg[cowsay]
21:16:03 main.go:171: Graph: Vertices(1), Edges(0)
21:16:03 main.go:174: Graphviz: No filename given!
21:16:03 pgraph.go:764: State: graphStateNil -> graphStateStarting
21:16:03 pgraph.go:825: State: graphStateStarting -> graphStateStarted
21:16:03 main.go:117: Main: Waiting...
21:16:03 pkg.go:245: Pkg[autogroup:(cowsay,powertop,sl)]: CheckApply(true)
21:16:03 pkg.go:303: Pkg[autogroup:(cowsay,powertop,sl)]: Apply
21:16:03 pkg.go:317: Pkg[autogroup:(cowsay,powertop,sl)]: Set: installed...
21:16:08 packagekit.go:399: PackageKit: Woops: Signal.Path: /8547_cbeaddda
21:16:08 packagekit.go:399: PackageKit: Woops: Signal.Path: /8548_bcaadbce
21:16:16 pkg.go:335: Pkg[autogroup:(cowsay,powertop,sl)]: Set: installed success!
21:16:16 main.go:79: Converged for 0 seconds, exiting!
21:16:16 main.go:55: Interrupted by exit signal
21:16:16 pgraph.go:796: Pkg[cowsay]: Exited
21:16:16 main.go:203: Goodbye!

real    0m15.621s
user    0m0.040s
sys    0m0.038s

Results and analysis

My hard work seems to have paid off, because we do indeed see a noticeable improvement from grouping package resources. The data shows that even in the single package comparison cases, mgmt has very little overhead, which is demonstrated by seeing that the mgmt run times are very similar to the times it takes to run the package managers manually.

In the three package scenario, performance is approximately 2.39 times faster than puppet for installation. Checking was about 12 times faster! These ratios are expected to grow with a larger number of resources.

Sweet graph...

Bigger bars is worse… Puppet is in Red, mgmt is in blue.

The four groups at the bottom along the x axis correspond to the four scenarios I tested, 1, 2 and 3 corresponding to each run of that scenario, with the average of the three listed there too.

Versions

The test wouldn’t be complete if we didn’t tell you which specific version of each tool that we used. Let’s time those as well! ;)

puppet:

$ time puppet --version 
4.2.1

real    0m0.659s
user    0m0.525s
sys    0m0.064s

mgmt:

$ time ./mgmt --version
mgmt version 0.0.3-1-g6f3ac4b

real    0m0.007s
user    0m0.006s
sys    0m0.002s

pkcon:

$ time pkcon --version
1.0.11

real    0m0.013s
user    0m0.006s
sys    0m0.005s

dnf:

$ time dnf --version
1.1.7
  Installed: dnf-0:1.1.7-2.fc23.noarch at 2016-03-17 13:37
  Built    : Fedora Project at 2016-03-09 16:45

  Installed: rpm-0:4.13.0-0.rc1.12.fc23.x86_64 at 2016-03-03 09:39
  Built    : Fedora Project at 2016-02-29 09:53

real    0m0.438s
user    0m0.379s
sys    0m0.036s

Yep, puppet even takes the longest to tell us what version it is. Now I’m just teasing…

Methodology

It might have been more useful to time the removal of packages instead so that we further reduce the variability of internet bandwidth and latency, although since most configuration management is used to install packages (rather than remove), we figured this would be more appropriate and easy to understand. You’re welcome to conduct your own study and share the results!

Additionally, for fun, I also looked at puppet runs where three individual resources were used instead of a single resource with the title being an array of all three packages, and found no significant difference in the results. Indeed puppet runs dnf three separate times in either scenario:

$ ps auxww | grep dnf
root     12118 27.0  1.4 417060 110864 ?       Ds   21:57   0:03 /usr/bin/python3 /usr/bin/dnf -d 0 -e 0 -y install powertop
$ ps auxww | grep dnf
root     12713 32.7  2.0 475204 159840 ?       Rs   21:57   0:02 /usr/bin/python3 /usr/bin/dnf -d 0 -e 0 -y install sl
$ ps auxww | grep dnf
root     13126  0.0  0.7 275324 55608 ?        Rs   21:57   0:00 /usr/bin/python3 /usr/bin/dnf -d 0 -e 0 -y install cowsay

Data

If you’d like to download the raw data as a text formatted table, and the terminal output from each type of run, I’ve posted it here.

Conclusion

I hope that you enjoyed this feature and analysis, and that you’ll help contribute to making it better. Come join our IRC channel and say hello! Thanks to those who reviewed my article and pointed out some good places for improvements!

Happy Hacking,

James

 

Advertisements

Testing GlusterFS during “Glusterfest”

The GlusterFS community is having a “test day”. Puppet-Gluster+Vagrant is a great tool to help with this, and it has now been patched to support alpha, beta, qa, and rc releases! Because it was built so well (*cough*, shameless plug), it only took one patch.

Okay, first make sure that your Puppet-Gluster+Vagrant setup is working properly. I have only tested this on Fedora 20. Please read:

Automatically deploying GlusterFS with Puppet-Gluster+Vagrant!

to make sure you’re comfortable with the tools and infrastructure.

This weekend we’re testing 3.5.0 beta1. It turns out that the full rpm version for this is:

3.5.0-0.1.beta1.el6

You can figure out these strings yourself by browsing the folders in:

https://download.gluster.org/pub/gluster/glusterfs/qa-releases/

To test a specific version, use the --gluster-version argument that I added to the vagrant command. For this deployment, here is the list of commands that I used:

$ mkdir /tmp/vagrant/
$ cd /tmp/vagrant/
$ git clone --recursive https://github.com/purpleidea/puppet-gluster.git
$ cd vagrant/gluster/
$ vagrant up puppet
$ sudo -v && vagrant up --gluster-version='3.5.0-0.1.beta1.el6' --gluster-count=2 --no-parallel

As you can see, this is a standard vagrant deploy. I’ve decided to build two gluster hosts (--gluster-count=2) and I’m specifying the version string shown above. I’ve also decided to build in series (--no-parallel) because I think there might be some hidden race conditions, possibly in the vagrant-libvirt stack.

After about five minutes, the two hosts were built, and about six minutes after that, Puppet-Gluster had finished doing its magic. I had logged in to watch the progress, but if you were out getting a coffee, when you came back you could run:

$ gluster volume info

to see your newly created volume!

If you want to try a different version or host count, you don’t need to destroy the entire infrastructure. You can destroy the gluster annex hosts:

$ vagrant destroy annex{1..2}

and then run a new vagrant up command.

In addition, I’ve added a --gluster-firewall option. Currently it defaults to false because there’s a strange firewall bug blocking my VRRP (keepalived) setup. If you’d like to enable it and help me fix this bug, you can use:

--gluster-firewall=true

To make sure the firewall is off, you can use:

--gluster-firewall=false

In the future, I will change the default value to true, so specify it explicitly if you need a certain behaviour.

Happy hacking,

James

Automatically deploying GlusterFS with Puppet-Gluster + Vagrant!

Puppet-Gluster was always about automating the deployment of GlusterFS. Getting your own Puppet server and the associated infrastructure running was never included “out of the box“. Today, it is! (This is big news!)

I’ve used Vagrant to automatically build these GlusterFS clusters. I’ve tested this with Fedora 20, and vagrant-libvirt. This won’t work with Fedora 19 because of bz#876541. I recommend first reading my earlier articles for Vagrant and Fedora:

Once you’re comfortable with the material in the above articles, we can continue…

The short answer:

$ sudo service nfs start
$ git clone --recursive https://github.com/purpleidea/puppet-gluster.git
$ cd puppet-gluster/vagrant/gluster/
$ vagrant up puppet && sudo -v && vagrant up

Once those commands finish, you should have four running gluster hosts, and a puppet server. The gluster hosts will still be building. You can log in and tail -F log files, or watch -n 1 gluster status commands.

The whole process including the one-time downloads took about 30 minutes. If you’ve got faster internet that I do, I’m sure you can cut that down to under 20. Building the gluster hosts themselves probably takes about 15 minutes.

Enjoy your new Gluster cluster!

Screenshots!

I took a few screenshots to make this more visual for you. I like to have virt-manager open so that I can visually see what’s going on:

The annex{1..4} machines are building in parallel.

The annex{1..4} machines are building in parallel. The valleys happened when the machines were waiting for the vagrant DHCP server (dnsmasq).

Here we can see two puppet runs happening on annex1 and annex4.

Notice the two peaks on the puppet server which correspond to the valleys on annex{1,4}.

Notice the two peaks on the puppet server which correspond to the valleys on annex{1,4}.

Here’s another example, with four hosts working in parallel:

foo

Can you answer why the annex machines have two peaks? Why are the second peaks bigger?

Tell me more!

Okay, let’s start with the command sequence shown above.

$ sudo service nfs start

This needs to be run once if the NFS server on your host is not already running. It is used to provide folder synchronization for the Vagrant guest machines. I offer more information about the NFS synchronization in an earlier article.

$ git clone --recursive https://github.com/purpleidea/puppet-gluster.git

This will pull down the Puppet-Gluster source, and all the necessary submodules.

$ cd puppet-gluster/vagrant/gluster/

The Puppet-Gluster source contains a vagrant subdirectory. I’ve included a gluster subdirectory inside it, so that your machines get a sensible prefix. In the future, this might not be necessary.

$ vagrant up puppet && sudo -v && vagrant up

This is where the fun stuff happens. You’ll need a base box image to run machines with Vagrant. Luckily, I’ve already built one for you, and it is generously hosted by the Gluster community.

The very first time you run this Vagrant command, it will download this image automatically, install it and then continue the build process. This initial box download and installation only happens once. Subsequent Puppet-Gluster+Vagrant deploys and re-deploys won’t need to re-download the base image!

This command starts by building the puppet server. Vagrant might use sudo to prompt you for root access. This is used to manage your /etc/exports file. After the puppet server is finished building, we refresh the sudo cache to avoid bug #2680.

The last vagrant up command starts up the remaining gluster hosts in parallel, and kicks off the initial puppet runs. As I mentioned above, the gluster hosts will still be building. Puppet automatically waits for the cluster to “settle” and enter a steady state (no host/brick changes) before it creates the first volume. You can log in and tail -F log files, or watch -n 1 gluster status commands.

At this point, your cluster is running and you can do whatever you want with it! Puppet-Gluster+Vagrant is meant to be easy. If this wasn’t easy, or you can think of a way to make this better, let me know!

I want N hosts, not 4:

By default, this will build four (4) gluster hosts. I’ve spent a lot of time writing a fancy Vagrantfile, to give you speed and configurability. If you’d like to set a different number of hosts, you’ll first need to destroy the hosts that you’ve built already:

$ vagrant destroy annex{1..4}

You don’t have to rebuild the puppet server because this command is clever and automatically cleans the old host entries from it! This makes re-deploying even faster!

Then, run the vagrant up command with the --gluster-count=<N> argument. Example:

$ vagrant up --gluster-count=8

This is also configurable in the puppet-gluster.yaml file which will appear in your vagrant working directory. Remember that before you change any configuration option, you should destroy the affected hosts first, otherwise vagrant can get confused about the current machine state.

I want to test a different GlusterFS version:

By default, this will use the packages from:

https://download.gluster.org/pub/gluster/glusterfs/LATEST/

but if you’d like to pick a specific GlusterFS version you can do so with the --gluster-version=<version> argument. Example:

$ vagrant up --gluster-version='3.4.2-1.el6'

This is also stored, and configurable in the puppet-gluster.yaml file.

Does (repeating) this consume a lot of bandwidth?

Not really, no. There is an initial download of about 450MB for the base image. You’ll only ever need to download this again if I publish an updated version.

Each deployment will hit a public mirror to download the necessary puppet, GlusterFS and keepalived packages. The puppet server caused about 115MB of package downloads, and each gluster host needed about 58MB.

The great thing about this setup, is that it is integrated with vagrant-cachier, so that you don’t have to re-download packages. When building the gluster hosts in parallel (the default), each host will have to download the necessary packages into a separate directory. If you’d like each host to share the same cache folder, and save yourself 58MB or so per machine, you’ll need to build in series. You can do this with:

$ vagrant up --no-parallel

I chose speed over preserving bandwidth, so I do not recommend this option. If your ISP has a bandwidth cap, you should find one that isn’t crippled.

Subsequent re-builds won’t download any packages that haven’t already been downloaded.

What should I look for?

Once the vagrant commands are done running, you’ll want to look at something to see how your machines are doing. I like to log in and run different commands. I usually log in like this:

$ vcssh --screen root@annex{1..4}

I explain how to do this kind of magic in an earlier post. I then run some of the following commands:

# tail -F /var/log/messages

This lets me see what the machine is doing. I can see puppet runs and other useful information fly by.

# ip a s eth2

This lets me check that the VIP is working correctly, and which machine it’s on. This should usually be the first machine.

# ps auxww | grep again.[py]

This lets me see if puppet has scheduled another puppet run. My scripts automatically do this when they decide that there is still building (deployment) left to do. If you see a python again.py process, this means it is sleeping and will wake up to continue shortly.

# gluster peer status

This lets me see which hosts have connected, and what state they’re in.

# gluster volume info

This lets me see if Puppet-Gluster has built me a volume yet. By default it builds one distributed volume named puppet.

I want to configure this differently!

Okay, you’re more than welcome to! All of the scripts can be customized. If you want to configure the volume(s) differently, you’ll want to look in the:

puppet-gluster/vagrant/gluster/puppet/manifests/site.pp

file. The gluster::simple class is well-documented, and can be configured however you like. If you want to do more serious hacking, have a look at the Vagrantfile, the source, and the submodules. Of course the GlusterFS source is a great place to hack too!

The network block, domain, and other parameters are all configurable inside of the Vagrantfile. I’ve tried to use sensible defaults where possible. I’m using example.com as the default domain. Yes, this will work fine on your private network. DNS is currently configured with the /etc/hosts file. I wrote some magic into the Vagrantfile so that the slow /etc/hosts shell provisioning only has to happen once per machine! If you have a better, functioning, alternative, please let me know!

What’s the greatest number of machines this will scale to?

Good question! I’d like to know too! I know that GlusterFS probably can’t scale to 1000 hosts yet. Keepalived can’t support more than 256 priorities, therefore Puppet-Gluster can’t scale beyond that count until a suitable fix can be found. There are likely some earlier limits inside of Puppet-Gluster due to maximum command line length constraints that you’ll hit. If you find any, let me know and I’ll patch them. Patches now cost around seven karma points. Other than those limits, there’s the limit of my hardware. Since this is all being virtualized on a lowly X201, my tests are limited. An upgrade would be awesome!

Can I use this to test QA releases, point releases and new GlusterFS versions?

Absolutely! That’s the idea. There are two caveats:

  1. Automatically testing QA releases isn’t supported until the QA packages have a sensible home on download.gluster.org or similar. This will need a change to:
    https://github.com/purpleidea/puppet-gluster/blob/master/manifests/repo.pp#L18

    The gluster community is working on this, and as soon as a solution is found, I’ll patch Puppet-Gluster to support it. If you want to disable automatic repository management (in gluster::simple) and manage this yourself with the vagrant shell provisioner, you’re able to do so now.

  2. It’s possible that new releases introduce bugs, or change things in a backwards incompatible way that breaks Puppet-Gluster. If this happens, please let me know so that something can get patched. That’s what testing is for!

You could probably use this infrastructure to test GlusterFS builds automatically, but that’s a project that would need real funding.

Updating the puppet source:

If you make a change to the puppet source, but you don’t want to rebuild the puppet virtual machine, you don’t have to. All you have to do is run:

$ vagrant provision puppet

This will update the puppet server with any changes made to the source tree at:

puppet-gluster/vagrant/gluster/puppet/

Keep in mind that the modules subdirectory contains all the necessary puppet submodules, and a clone of puppet-gluster itself. You’ll first need to run make inside of:

puppet-gluster/vagrant/gluster/puppet/modules/

to refresh the local clone. To see what’s going on, or to customize the process, you can look at the Makefile.

Client machines:

At the moment, this doesn’t build separate machines for gluster client use. You can either mount your gluster pool from the puppet server, another gluster server, or if you add the right DNS entries to your /etc/hosts, you can mount the volume on your host machine. If you really want Vagrant to build client machines, you’ll have to persuade me.

What about firewalls?

Normally I use shorewall to manage the firewall. It integrates well with Puppet-Gluster, and does a great job. For an unexplained reason, it seems to be blocking my VRRP (keepalived) traffic, and I had to disable it. I think this is due to a libvirt networking bug, but I can’t prove it yet. If you can help debug this issue, please let me know! To reproduce it, enable the firewall and shorewall directives in:

puppet-gluster/vagrant/gluster/puppet/manifests/site.pp

and then get keepalived to work.

Re-provisioning a machine after a long wait throws an error:

You might be hitting: vagrant-cachier #74. If you do, there is an available workaround.

Keepalived shows “invalid passwd!” messages in the logs:

This is expected. This happens because we build a distributed password for use with keepalived. Before the cluster state has settled, the password will be different from host to host. Only when the cluster is coherent will the password be identical everywhere, which incidentally is the only time when the VIP matters.

How did you build that awesome base image?

I used virt-builder, some scripts, and a clever Makefile. I’ll be publishing this code shortly. Hasn’t this been enough to keep you busy for a while?

Are you still here?

If you’ve read this far, then good for you! I’m sorry that it has been a long read, but I figured I would try to answer everyone’s questions in advance. I’d like to hear your comments! I get very little feedback, and I’ve never gotten a single tip! If you find this useful, please let me know.

Until then,

Happy hacking,

James