Hiera data in modules and OS independent puppet

Earlier this year, R.I. Pienaar released his brilliant data in modules hack. A few months ago, I got the chance to start implementing it in Puppet-Gluster, and today I have found the time to blog about it.

What is it?

R.I.’s hack lets you store hiera data inside a puppet module. This can have many uses including letting you throw out the nested mess that is commonly params.pp, and replace it with something file based that is elegant and hierarchical. For my use case, I’m using it to build OS independent puppet modules, without storing this data as code. The secondary win is that porting your module to a new GNU/Linux distribution or version could be as simple as adding a YAML file.

How does it work?

(For the specifics on the hack in general, please read R.I. Pienaar’s blog post. After you’re comfortable with that, please continue…)

In the hiera.yaml data/ hierarchy, I define an OS / version structure that should probably cover all use cases. It looks like this:

- params/%{::osfamily}/%{::operatingsystem}/%{::operatingsystemrelease}
- params/%{::osfamily}/%{::operatingsystem}
- params/%{::osfamily}
- common

At the bottom, you can specify common data, which can be overridden by OS family specific data (think RedHat “like” vs. Debian “like”), which can be overridden with operating system specific data (think CentOS vs. Fedora), which can finally be overridden with operating system version specific data (think RHEL6 vs. RHEL7).
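The override order described above can be sketched in a few lines of Ruby. This is only an illustration of the "most specific level wins" idea, not how hiera is actually implemented; the `lookup` and `interpolate` helpers and the sample values in `data` are made up for the example.

```ruby
# Hypothetical stand-in for the YAML files in the module's data/ directory.
# The values are invented for illustration only.
data = {
  'common'               => { 'service_glusterd' => 'glusterd',
                              'program_findmnt'  => '/bin/findmnt' },
  'params/RedHat'        => { 'misc_gluster_reload' => '/sbin/service glusterd reload' },
  'params/RedHat/Fedora' => { 'program_findmnt'     => '/usr/bin/findmnt',
                              'misc_gluster_reload' => '/usr/bin/systemctl reload glusterd' },
}

hierarchy = [
  'params/%{::osfamily}/%{::operatingsystem}/%{::operatingsystemrelease}',
  'params/%{::osfamily}/%{::operatingsystem}',
  'params/%{::osfamily}',
  'common',
]

# Replace each %{::fact} token with the value of that fact.
def interpolate(level, facts)
  level.gsub(/%\{::(\w+)\}/) { facts.fetch(Regexp.last_match(1), '') }
end

# Walk the hierarchy from most to least specific; the first hit wins.
def lookup(key, facts, hierarchy, data)
  hierarchy.each do |level|
    source = data[interpolate(level, facts)]
    return source[key] if source && source.key?(key)
  end
  nil
end

fedora = { 'osfamily' => 'RedHat', 'operatingsystem' => 'Fedora', 'operatingsystemrelease' => '20' }
centos = { 'osfamily' => 'RedHat', 'operatingsystem' => 'CentOS', 'operatingsystemrelease' => '6.5' }

puts lookup('program_findmnt', fedora, hierarchy, data)  # found in the Fedora level
puts lookup('program_findmnt', centos, hierarchy, data)  # falls through to common
```

Note that the CentOS lookup never needs a CentOS file of its own: it simply falls through to the family level or to common.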

Grouping the commonalities near the bottom of the tree avoids duplication, and makes it possible to support new OS versions with fewer changes. It would be especially cool if someone could write a script to refactor commonalities downwards, and to refactor new uniqueness upwards.

This is an excerpt of the Fedora specific YAML file:

gluster::params::package_glusterfs_server: 'glusterfs-server'
gluster::params::program_mkfs_xfs: '/usr/sbin/mkfs.xfs'
gluster::params::program_mkfs_ext4: '/usr/sbin/mkfs.ext4'
gluster::params::program_findmnt: '/usr/bin/findmnt'
gluster::params::service_glusterd: 'glusterd'
gluster::params::misc_gluster_reload: '/usr/bin/systemctl reload glusterd'

Since we use full paths in Puppet-Gluster, and since they are uniquely different in Fedora (no more: /bin) it’s nice to specify them all here. The added advantage is that you can easily drop in different versions of these utilities if you want to test a patched release without having to edit your system utilities. In addition, you’ll see that the OS specific RPM package name and service names are in here too. On a Debian system, they are usually different.


This depends on Puppet >= 3.x and having the puppet-module-data module included. I do so for integration with vagrant like so.

Should I still use params.pp?

I think that the answer is yes. I use a params.pp file with a single class specifying all the defaults:

class gluster::params(
    # packages...
    $package_glusterfs_server = 'glusterfs-server',

    $program_mkfs_xfs = '/sbin/mkfs.xfs',
    $program_mkfs_ext4 = '/sbin/mkfs.ext4',

    # services...
    $service_glusterd = 'glusterd',

    # misc...
    $misc_gluster_reload = '/sbin/service glusterd reload',

    # comment...
    $comment = ''
) {
    if "${comment}" == '' {
        warning('Unable to load yaml data/ directory!')
    }

    # ...
}

In my data/common.yaml I include a bogus comment canary so that I can trigger a warning if the data in modules module isn’t working. This shouldn’t be a fail as long as you want to allow backwards compatibility, otherwise it should be! The defaults I use correspond to the primary OS I hack and use this module with, which in this case is CentOS 6.x.

To use this data in your module, include the params.pp file, and start using it. Example:

include gluster::params
package { "${::gluster::params::package_glusterfs_server}":
    ensure => present,
}

Unfortunately the readability isn’t nearly as nice as it is without this; however, it’s a necessary evil, due to the puppet language limitations.

Common patterns:

There are a few common code patterns, which you might need for this technique. The first few, I’ve already mentioned above. These are the tree layout in hiera.yaml, the comment canary, and the params.pp defaults. There’s one more that you might find helpful…

The split package pattern:

Certain packages are split into multiple pieces on some operating systems, and grouped together on others. This means there isn’t always a one-to-one mapping between the data and the package type. For simple cases you can use a hiera array:

# this hiera value could be an array of strings...
package { $::some_module::params::package::some_package_list:
    ensure => present,
    alias => 'some_package',
}
service { 'foo':
    require => Package['some_package'],
}

For this to work you must always define at least one element in the array. For more complex cases you might need to test for the secondary package in the split:

if "${::some_module::params::package::some_package}" != '' {
    package { "${::some_module::params::package::some_package}":
        ensure => present,
        alias => 'some_package', # or use the $name and skip this
    }
}

service { 'foo':
    require => "${::some_module::params::package::some_package}" ? {
        '' => undef,
        default => Package['some_package'],
    },
}

This pattern is used in Puppet-Gluster in more than one place. It turns out that it’s also useful when optional python packages get pulled into the system python. (example)

Hopefully you found this useful. Please help increase the multi-os aspect of Puppet-Gluster by submitting patches to the YAML files, and by testing it on your favourite GNU/Linux distro!

Happy hacking!


EDIT: I’ve updated the article to use the new recommended directory naming convention of ‘params’ instead of ‘tree’. Example.

Iteration in Puppet

People often ask how to do iteration in Puppet. Most Puppet users have a background in imperative programming, and are already very familiar with for loops. Puppet is sometimes confusing at first, because it is actually (or technically, contains) a declarative, domain-specific language. In general, DSLs aren’t always Turing complete, nor do they need to support loops, but this doesn’t mean you can’t iterate.

Until recently, Puppet didn’t have an explicit looping construct, and it is quite possible to build complex modules without using this new construct. There are even some who believe that the language shouldn’t even contain this feature. I’ll abstain from that debate for now, but instead, I would like to show you some iteration techniques that you can use to get your desired result.


Many people forget that recursion is a form of iteration. Even more don’t realize that you can do recursion in Puppet:

#!/usr/bin/puppet apply

define recursion(
    $count
) {
    # do something here...
    notify { "count-${count}":
    }
    $minus1 = inline_template('<%= count.to_i - 1 %>')
    if "${minus1}" == '0' {
        notify { 'done counting!':
        }
    } else {
        # recurse
        recursion { "count-${minus1}":
            count => $minus1,
        }
    }
}

# kick it off...
recursion { 'start':
    count => 4,
}

If you really want to Push Puppet, even more advanced recursion is possible. In general, I haven’t found this technique very useful for module design, but it’s worth mentioning as a form of iteration. If you do find a legitimate use of this technique, please let me know!

Type iteration

We’re used to seeing simple type declarations such as:

user { 'james':
    ensure => present,
    comment => 'james is awesome!',
}

In fact, the namevar can actually accept a list:

$users = ['kermit', 'beaker', 'statler', 'waldorf', 'tom']
user { $users:
    ensure => present,
    comment => 'who gave these muppets user accounts?',
}

This will cause Puppet to effectively iterate across the elements in $users. This is the most important type of iteration in Puppet. Please get familiar with it.

This technique can be used with any type. It can even be used to express a many-to-one dependency relationship:

# where $bricks is a list of gluster::brick names
Gluster::Brick[$bricks] -> Gluster::Volume[$name]    # volume requires bricks

Suppose you’d like to use type iteration, but you’d also like to know the index of each element. This can be useful to avoid duplicate sub-types, or to provide a unique index:

define some_module::process_array(
    $foo,
    $array    # pass in the original $name
) {
    #notice(inline_template('NAME: <%= name.inspect %>'))

    # do something here...

    # build a unique name...
    $length = inline_template('<%= array.length %>')
    $ulength = inline_template('<%= array.uniq.length %>')
    if ( "${length}" != '0' ) and ( "${length}" != "${ulength}" ) {
        fail('Array must not have duplicates.')
    }
    # if array had duplicates, this wouldn't be a unique index
    $index = inline_template('<%= array.index(name) %>')

    # iterate, knowing your index
    some::type { "${foo}:${index}":
        foo => 'hello',
        index => "${index}",
    }
}

# a list
$some_array = ['a', 'b', 'c']    # must not have duplicates

# using the type requires that you pass in $some_array twice!
some_module::process_array { $some_array:    # array
    foo => 'bar',
    array => $some_array,    # same array as above
}

While this example might seem contrived, it is actually a modified excerpt from a module that I wrote.


This is a similar technique for when you want to specify different arguments for each type:

$defaults = {
    ensure => present,
    comment => 'a muppet',
}
$data = {
    'kermit' => {
        comment => 'the frog',
    },
    'beaker' => {
        comment => 'keep him away from your glassware',
    },
    'fozzie' => {
        home => '/home/fozzie',
    },
    'tom' => {
        comment => 'the swedish chef',
    },
}
create_resources('user', $data, $defaults)

This creates each user resource with its own arguments. If an argument isn’t given in the $data, it is taken from the $defaults hash. A similar example and the official documentation are found here.

Template iteration

You might want to iterate to perform a simple computation, or to modify an array in some way. For static value computations, you can often use a template. Remember that the template will get executed at compile time on the Puppet Master, so code accordingly. Here are a few contrived examples:

# filter out all the integers less than zero
$array_in = [-4,3,-8,-2,1,4,-2,1,5,-1,-7,9,-3,2,6,-8,5,3,5,-6,8,9,7,-5,9,3,-3]
$array_out = split(inline_template('<%= array_in.delete_if {|x| x < 0 }.join(",") %>'), ',')

We can also use the ruby map:

# build out a greeting string
$names = ['animal', 'gonzo', 'rowlf']
# NOTE: you can also use a regular multi-line template for readability
$message = inline_template('<% if names == [] %>Hello... Anyone there?<% else %><%= names.map {|x| "Hello "+x.capitalize}.join(", ") %>.<% end %>')

Use your imagination! Remember that you can also write a custom function if necessary, but first check that there isn’t already a built-in function, or a stdlib function that suits your needs.
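To try template expressions like these outside of Puppet, you can evaluate them with plain ERB. The following is a rough sketch, not what inline_template does internally: the `render` helper is a made-up name that simply exposes a few Ruby variables to an ERB string. It uses `reject` instead of `delete_if` so that the input array isn’t modified.

```ruby
require 'erb'

# Render an ERB string with the given local variables, roughly the way
# inline_template exposes Puppet variables to the template.
def render(template, vars)
  scope = binding
  vars.each { |name, value| scope.local_variable_set(name, value) }
  ERB.new(template).result(scope)
end

# Filter: a template can only return a string, so the array is joined
# inside the template and split again outside of it.
array_in = [-4, 3, -8, -2, 1, 4]
filter = '<%= array_in.reject {|x| x < 0 }.join(",") %>'
array_out = render(filter, array_in: array_in).split(',')
# array_out now holds strings, not integers

# Map: build a greeting string from a list of names.
names = ['animal', 'gonzo', 'rowlf']
greeting = '<% if names == [] %>Hello... Anyone there?<% else %><%= names.map {|x| "Hello "+x.capitalize}.join(", ") %>.<% end %>'
message = render(greeting, names: names)

puts array_out.inspect
puts message
```

The thing to notice is that everything coming back out of the template is a string, so the filtered "integers" need converting again if you want to do arithmetic with them.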

Advanced template iteration

When you really need to get fancy, it’s often time to call in a custom function. Custom functions require that you split them off into separate files, and away from the module logic, instead of keeping the functions inline and accessible as lambdas. The downside to using these “inline_template” lambdas instead, is that they can quickly turn into parlous one-liners.

# transform the $data hash
$data = {
    'waldorf' => {
        'heckles' => 'absolutely',
        'comment' => 'a critic',
    },
    'statler' => {
        'heckles' => 'all the time',
        'comment' => 'another critic!',
    },
}
# rename and filter on the 'heckles' key
$yaml = inline_template('<%= data.inject({}) {|h, (x,y)| h[x] = {"applauds" => y.fetch("heckles", "yes")}; h}.to_yaml %>')
$output = parseyaml($yaml) # parseyaml is in the puppetlabs-stdlib

As with simple template iteration, the key problem is transferring the data in and out of the template. In the simple case, arrays can be joined and split as long as there is a reserved character that won’t be used in the data. For the advanced template iteration, we rely on the YAML transformation functions.
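The YAML round trip can be checked in plain Ruby before you bury it in a one-liner. This sketch assumes that `YAML.load` is a reasonable stand-in for what parseyaml does on the Puppet side; the variable names are only for illustration.

```ruby
require 'yaml'

data = {
  'waldorf' => { 'heckles' => 'absolutely',   'comment' => 'a critic' },
  'statler' => { 'heckles' => 'all the time', 'comment' => 'another critic!' },
}

# Same expression as in the template: rename 'heckles' to 'applauds'
# and drop every other key.
transformed = data.inject({}) { |h, (x, y)| h[x] = { 'applauds' => y.fetch('heckles', 'yes') }; h }

yaml = transformed.to_yaml   # the string the template would hand back
output = YAML.load(yaml)     # the structure rebuilt from that string

puts yaml
puts output.inspect
```

Since the only thing that crosses the template boundary is the YAML string, any structure that survives `to_yaml` and a reload should survive the trip.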

Some reminders

If you properly understand the functionality that your module is trying to model/manage, you can usually break it up into separate classes and defined types, such that re-use via type iteration can fulfill your needs. Usually you’ll end up with a more properly designed module.

Test using the same version of Ruby that will run your module. Newer versions of Ruby have some incompatible changes, and new features, with respect to older versions of Ruby.

Remember that templates and functions run on the Puppet Master, but facts and types run on the client (agent).

The Puppet language is mostly declarative. Because this might be an unfamiliar paradigm, try not to look for all the imperative features that you’re used to. Having a programming background can help, because there’s certainly programming mixed in, whether you’re writing custom functions, or erb templates.

Future parser

For completeness, I should mention that the future parser now supports native iteration. If you need it, it probably means that you’re writing a fairly advanced module, and you’re comfortable manual diving. If you have a legitimate use case that isn’t possible with the existing constructs, and isn’t only a readability improvement, please let me know.


I hope you enjoyed this article. The next time someone asks you how to iterate in Puppet, feel free to link them this way.

Happy hacking,


Finding YAML errors in puppet

I love tabs, they’re so much easier to work with, but YAML doesn’t like them. I’m constantly adding them in accidentally, and puppet’s error message is a bit cryptic:

Error: Could not retrieve catalog from remote server: Error 400 on SERVER: malformed format string - %S at /etc/puppet/manifests/foo.pp:18 on node bar.example.com

This happens during a puppet run, which in my case loads up YAML files. The tricky part was that the error wasn’t at all related to the foo.pp file, it just happened to be the first time hiera was run. So where’s the real error?

$ cd /etc/puppet/hieradata/; for i in `find . -name '*.yaml'`; do echo $i; ruby -e "require 'yaml'; YAML.parse(File.open('$i'))"; done

Run this one-liner on your puppetmaster, and Ruby’s YAML parser should quickly point out which files have errors, and exactly which lines (and columns) they’re on.
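If you’d rather keep this around as a script, the same check can be sketched in Ruby. The `yaml_errors` helper is a hypothetical name, and the directory is whatever your hiera datadir happens to be; the sketch only collects parse errors and doesn’t stop at the first one.

```ruby
require 'yaml'

# Return a hash of { path => error message } for every *.yaml file
# under dir that fails to parse.
def yaml_errors(dir)
  errors = {}
  Dir.glob(File.join(dir, '**', '*.yaml')).sort.each do |path|
    begin
      YAML.parse(File.read(path))
    rescue Psych::SyntaxError => e
      errors[path] = e.message
    end
  end
  errors
end

# Usage: ruby check_yaml.rb /etc/puppet/hieradata/
if ARGV[0]
  yaml_errors(ARGV[0]).each { |path, message| puts "#{path}: #{message}" }
end
```

A stray tab used for indentation is exactly the kind of thing that raises a syntax error here, and the message includes the position that the parser choked on.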

Happy hacking,



Automatic hiera lookups in puppet 3.x

Dear readers,

I’ve started the slow migration of code from puppet 2.6 all the way to 3.x+. There were a few things I wasn’t clear on, so hopefully this will help to discuss these and make your migration easier!

I used hiera in 2.6, and I actually like it a lot so far. I was concerned that automatic lookups would pull in values that I wasn’t expecting. This is not the case or a worry. Let’s dive in and let the code speak:

# create a class in a module or site.pp for testing...
class foo(
        $a = 'apple',
        $b = 'banana'
) {
        notify { 'foo':
                message => "a is: ${a}, b is: ${b}",
        }
}


# define it using :: as a prefix because we want to search in the
# top level, module namespace. optional if we only have one foo.
class { '::foo':
}


# /etc/puppet/hiera.yaml
---
:backends:
        - yaml

:hierarchy:
        - globals
        - whatever
        - youlike

:yaml:
        :datadir: /etc/puppet/hieradata/


# /etc/puppet/hieradata/whatever.yaml (because of - whatever above)
foo::a: 'somevalue' # this many colons is actually valid syntax
dude: 'sweet'

will produce:

Notice: a is: somevalue, b is: banana
Notice: Finished catalog run in 3.14159265359 seconds

This is the automatic lookup. You probably have zero risk of collision with earlier data in your hiera yaml files, because these lookups use keys that match the classname::paramname pattern. If you had used :: (double colons) in your keys before, then you’re insane, and you should check for any collisions! The downside to this is that my whatever.yaml looks awkward with all those colons, but I got over that very quickly.

The full lookup order is first:

# directly specified values first (of course)
class { '::foo':
        a => 'this value is used first if set.',
}

and then:

# values matching an appropriate yaml key:
foo::a: 'this value is used next if found.'

and finally:

class foo(
        $a = 'this parameter default value is used last.',
        $b = 'b is still for banana...'
) {
        # do stuff...
}

all as detailed in: http://projects.puppetlabs.com/issues/11608. Finding this link and setting me down the path to knowledge was all thanks to eric0 in #puppet. Thanks Eric!
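The three-step lookup order above can be modelled in a few lines of Ruby. This is only a mental model, not Puppet’s implementation; `resolve_param` and its arguments are hypothetical names, and the hiera data is represented as a flat hash of classname::paramname keys.

```ruby
# Resolve a class parameter the way the lookup order above describes:
# explicit value first, then the matching hiera key, then the default.
def resolve_param(klass, param, explicit, hiera, defaults)
  return explicit[param] if explicit.key?(param)
  key = "#{klass}::#{param}"
  return hiera[key] if hiera.key?(key)
  defaults[param]
end

defaults = { 'a' => 'apple', 'b' => 'banana' }
hiera    = { 'foo::a' => 'somevalue', 'dude' => 'sweet' }

puts resolve_param('foo', 'a', {}, hiera, defaults)                  # from hiera
puts resolve_param('foo', 'b', {}, hiera, defaults)                  # from the default
puts resolve_param('foo', 'a', { 'a' => 'direct' }, hiera, defaults) # set explicitly
```

The unrelated `dude` key is never consulted, which is why plain keys in your existing YAML files won’t collide with automatic lookups.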

Make sure to reload your puppetmaster after you make any changes to /etc/puppet/hiera.yaml, and as always:

Happy hacking,