Speaking at LISA 2013 about Puppet and GlusterFS

I’m speaking at LISA 2013, the “Large Installation System Administration” conference. The conference runs all week in Washington, D.C. I’ll be giving two talks during the week, and attending at least one BOF.

My first talk is on Monday during the Gluster Community Day. I’ll be speaking about puppet-gluster, and giving a live demo. I’ll be showing some new features too. If you’d like to talk more about puppet-gluster, or want to attend the talk, give me a shout, or sign up at the above Gluster Community Day link.

On Tuesday, I’ll be giving a talk during Puppet Camp DC. My talk will be about “Pushing Puppet (to the limit)”. I’ve prepared lots of fun puppet hacks, and live demos, so I expect you’ll thoroughly enjoy this.

I’ve got lots of technical hacks and great code that I’ve published lately, but that I haven’t blogged about yet. If you follow along with my git commits, you’ll be able to figure out most of it, but I’ve got a bunch of articles coming anyway.

Thanks to John Mark and Red Hat for sponsoring my trip, to Dawn Foster and Kara Sowles for organizing my Puppet talk, and to Hilary Hartman at USENIX for helping me register.

Happy Hacking,

James

PS: If you’re at LISA 2013, and you want to buy me dinner and get some light Puppet or Gluster consulting, contact me! You can also give me a shout if you just want to talk tech. If anyone wants to meet up and hack on things, please let me know too.


Easier strace of scripts with pidof -x

Here’s a one-minute read about a trick I discovered today:

When running an strace, it’s common to do something like:

strace -p<pid>

Smarter hackers know that they can use some bash magic and do:

strace -p`pidof <process name>`

However, if you’re tracing a script named foo.py, this won’t work, because the real process is the script’s interpreter, and pidof python might return other, unrelated python processes.

strace -p`pidof foo.py` # won't work
[failure]
[user sifting through ps output for real pid]
[computer explodes]

The trick is to use the -x flag of pidof. This will let you pass in your script’s name, and pidof will take care of the rest:

strace -p`pidof -x foo.py` # works!
[user cheering]
[normal strace noise]

Awesome!

Happy hacking,

James


first release of puppet-shorewall

Oh, hi there.

In case you’re interested, I’ve just made a first release of my puppet-shorewall module. It isn’t meant to be an exhaustive shorewall module, but it does provide the functionality that most users need.

In particular, it’s the module dependency that I use for many of my other puppet modules that provide firewalling. This is probably how you’re most likely to consume it.

In general, most modules only need shorewall::rule, so if you really don’t want to use this code, you can implement that signature yourself, or simply not use automatic firewalling. The shorewall::rule type has two main signatures, so have a look at the source, or a simple example, if you want to get more familiar with the specifics. Using this module is highly recommended, particularly with puppet-gluster.
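
For a rough idea of what a caller looks like, here’s a minimal sketch that opens a single port. The parameter names below are illustrative placeholders only (not necessarily the module’s exact signature), so check the source or the simple example above for the real details:

shorewall::rule { 'some-service':
        # NOTE: parameter names here are placeholders for illustration;
        # see the module source for the exact signature(s)
        action => 'ACCEPT',
        source => 'net',
        dest => '$FW',
        proto => 'tcp',
        port => '24007',
        comment => 'Allow incoming tcp:24007 for some-service.',
}

A module that wants automatic firewalling would simply wrap something like this in a conditional, based on whether shorewall support is enabled.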

Please keep in mind that since I mostly use this module to open ports and to keep my other modules happy, I probably don’t have advanced traffic control features on my roadmap. If you’re looking for something that I haven’t added, contact me with the details and consider sponsoring some features.

Happy hacking,

James

Show current git branch in PS1 when branch is not master

Short post, long command…

I’ve decided to start showing the current git branch in my PS1. However, since I don’t want to know when I’m on master, I had to write a new PS1 that I haven’t yet seen anywhere. Add the following to your .bashrc:

PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
if [ -e /usr/share/git-core/contrib/completion/git-prompt.sh ]; then
    . /usr/share/git-core/contrib/completion/git-prompt.sh
    PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w$([ "$(__git_ps1 %s)" != "" -a "$(__git_ps1 %s)" != "master" ] && (echo -e " (\[\033[32m\]"$(__git_ps1 "%s")"\[\033[0m\])") || echo "")\$ '
fi

This keeps my PS1 short for when I’m hacking on personal repositories that only have a single branch. Keep in mind that you might have to change the path to git-prompt.sh depending on what OS you’re using.

Example:

james@computer:~/code/puppet$ cd puppet-gluster
james@computer:~/code/puppet/puppet-gluster$ git checkout -b cool-new-feature
Switched to a new branch 'cool-new-feature'
james@computer:~/code/puppet/puppet-gluster (cool-new-feature)$ # tada !

The branch name is coloured to match the default colours that git uses to colour branches.

Happy hacking,

James


Finite state machines in puppet

In my attempt to push puppet to its limits (for no particular reason), to develop more powerful puppet modules, to build in a distributed lock manager, and to be more dynamic, I’m now attempting to build a finite state machine (FSM) in puppet.

Is this a real finite state machine, and why would you do this?

Computer science professionals might not approve of the purity level, but they will hopefully appreciate the hack value. I’ve done this to illustrate a state transition technique that will be necessary in a module that I am writing.

Can we have an example?

Sure! I’ve decided to model thermodynamic phase transitions. Here’s what we’re building:

[figure: phase change diagram (Phase_change_-_en.svg), showing the transitions between solid, liquid, gas and plasma]

How does it work?

Start off with a given define that accepts an argument. It could have one argument, or many, and be of whichever type you like, such as an integer, or even a more complicated list type. To keep the example simple, let’s work with a single argument named $input.

define fsm::transition(
        $input = ''
) {
        # TODO: add amazing code here...
}

The FSM runs as follows: On first execution, the $input value is saved to a local file by means of a puppet exec type. A corresponding fact exists to read from that file and create a unique variable for the fsm::transition type. Let’s call that variable $last. This is the special part!

# ruby fact to pull in the data from the state file
# NOTE: the two values below are assumptions for this excerpt; the full
# module derives the transition directory from puppet's vardir.
transition_dir = '/var/lib/puppet/tmp/fsm/transition/'    # state file root
regexp = /^[a-zA-Z0-9_\-]+$/    # names/values we consider safe

found = {}
Dir.glob(transition_dir+'*').each do |d|
    n = File.basename(d)    # should be the fsm::transition name
    if n.length > 0 and regexp.match(n)
        f = d.gsub(/\/$/, '')+'/state'    # full path to the state file
        if File.exists?(f)
            # TODO: future versions should unpickle (but with yaml)
            v = File.open(f, 'r').read.strip    # read into str
            if v.length > 0 and regexp.match(v)
                found[n] = v    # remember the last state for this transition
            end
        end
    end
end

# create one fact per fsm::transition, named fsm_transition_<name>
found.keys.each do |x|
    Facter.add('fsm_transition_'+x) do
        #confine :operatingsystem => %w{CentOS RedHat Fedora}
        setcode {
            found[x]
        }
    end
end

On subsequent runs, the process gets more interesting: The $input value and the $last value are used to decide what to run. They can be different because the user might have changed the $input value. Logic trees then decide what actions you’d like to perform. This lets us compare the previous state to the new desired state, and as a result, be more intelligent about which actions need to run for a successful state transition. This is the FSM part.

# logic tree modeling phase transitions
# https://en.wikipedia.org/wiki/Phase_transition
$transition = "${valid_last}" ? {
        'solid' => "${valid_input}" ? {
                'solid' => true,
                'liquid' => 'melting',
                'gas' => 'sublimation',
                'plasma' => false,
                default => '',
        },
        'liquid' => "${valid_input}" ? {
                'solid' => 'freezing',
                'liquid' => true,
                'gas' => 'vaporization',
                'plasma' => false,
                default => '',
        },
        'gas' => "${valid_input}" ? {
                'solid' => 'deposition',
                'liquid' => 'condensation',
                'gas' => true,
                'plasma' => 'ionization',
                default => '',
        },
        'plasma' => "${valid_input}" ? {
                'solid' => false,
                'liquid' => false,
                'gas' => 'recombination',
                'plasma' => true,
                default => '',
        },
        default => '',
}

Once the state transition actions have completed successfully, the exec must store the $input value in the local file for future use as the unique $last fact for the next puppet run. If there are errors during state transition execution, you may choose to not store the updated value (to cause a re-run) and/or to add an error condition fact that the subsequent puppet run will have to read in and handle accordingly. This is the important part.

$f = "${vardir}/transition/${name}/state"
$diff = "/usr/bin/test '${valid_input}' != '${valid_last}'"

# TODO: future versions should pickle (but with yaml)
exec { "/bin/echo '${valid_input}' > '${f}'":
        logoutput => on_failure,
        onlyif => "/usr/bin/test ! -e '${f}' || ${diff}",
        require => File["${vardir}/"],
        alias => "fsm-transition-${name}",
}

Can we take this further?

It might be beneficial to remember the path we took through our graph. To do this, on each transition we append the new state to a file on the local puppet client. The corresponding fact is similar to the $last fact, except that it maintains a list of values instead of just one. There is a max length variable that can be used to avoid storing an unlimited number of old states.
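
As a rough sketch of the append step (reusing the $valid_input, $vardir, $name and ${diff} names from the snippets above; trimming the file down to the maximum number of entries is left out here), the exec could look something like this:

# sketch: append each new state to a history file for the path fact to read
$h = "${vardir}/transition/${name}/history"

exec { "/bin/echo '${valid_input}' >> '${h}'":
        logoutput => on_failure,
        onlyif => "/usr/bin/test ! -e '${h}' || ${diff}",
        require => File["${vardir}/"],
        alias => "fsm-history-${name}",
}

A companion fact, analogous to the $last fact above, would then read this file in and split it into the list of previous states.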

Does this have a practical use?

Yes, absolutely! I realized that something like this could be useful for puppet-gluster. Stay tuned for more patches.

Hopefully you enjoyed this. By following the above guidelines, you should now have some extra tricks for building state transitions into your puppet modules. Let me know if you found this hack awesome and unique.

I’ve posted the full example module here.

Happy Hacking,

James


BitTorrent Sync for repository mirroring

Theron Conrey writes about using:

BitTorrent Sync as Geo-Replication for Storage

We got a chance to talk about this idea at LinuxCon. I’m not entirely convinced there aren’t some problematic edge cases with this solution, but I think it will be hard to tell as long as the BitTorrent Sync library is proprietary. I did come up with a special case of Theron’s idea that I believe could work well.

The special case uses the optimization that the synchronization (or file transferring) is unidirectional. This avoids any coherency complications that would arise if both sides were to write to the same file. Combined with the BitTorrent protocol, this behaves like normal torrent usage, except that with BitTorrent Sync we’re syncing a folder full of files.

What kind of synchronization would benefit from this model? Repository mirroring! This is exactly a folder full of files, going in only one direction. Instead of yum or deb mirrors each running rsync, they could use BitTorrent Sync, and because of the large amount of upload bandwidth usually available on these mirrors, “seeding” wouldn’t be a problem, and the worldwide pool would synchronize faster.

Can we apply this to user mirroring, net installers, and machine updating? Absolutely. I believe someone has already looked into the updates scenario, but it didn’t progress for some reason. The more convincing case is still the server geo-replication, of course.

Obviously, using glusterfs with puppet-gluster to host the mirrors could be a good fit. You might not even need to use any gluster replication when you have built-in geo-replication via other mirrors.

If someone works up the open source BitTorrent parts, I’m happy to hack together the puppet parts to turn this into a turn-key solution for mirror hosts.

Hope you liked this idea.

Happy hacking,

James

Gluster Community Day, Thursday

I’m here in New Orleans hacking up a storm and getting to meet fellow gluster users IRL. John Mark Walker started off with a great “State of the GlusterFS union” style talk.

Today Louis (semiosis) gave a great talk about running glusterfs on Amazon. It was highly pragmatic, and he explained how he chose the number of bricks per host. The talk will be posted online shortly.

Marco Ceppi from Canonical gave a talk about juju and gluster. I haven’t had much time to look at juju, so it was good exposure. Marco’s gluster charm suffers from a lack of high availability peering, but I’m sure that is easily solved, and it isn’t a big issue. I had the same issue when working on puppet-gluster. I’ve written an article about how I solved this problem. I think it’s the most elegant solution, but if anyone has a better idea, please let me know. The solutions I used for puppet can be applied to juju too. Marco and I talked about porting puppet-gluster to ubuntu. We also talked about using puppet inside of juju, with a puppetmaster, but we’re not sure how useful that would be beyond pure hack value.

Joe Julian gave a talk on running MySQL (MariaDB) on glusterfs and getting mostly decent performance. That man knows his gluster internals.

I presented my talk about puppet-gluster. I had a successful live demo, which ran over ssh+screen across the conference centre internet to my home cluster in Montreal. With interspersed talking, the full deploy took about eight minutes. I hope you enjoyed it. Let me know if you have any trouble with your setup, and what features you’re missing. The video will be posted shortly.

Thanks again to John Mark Walker, Red Hat and gluster.org for sponsoring my trip.

Happy hacking,

James