Collecting duplicate resources in puppet

I could probably write a long design article explaining why identical duplicate resources should be allowed [1] in puppet. If puppet is going to survive in the long term, they will have to build in this feature. In the short term, I will have to hack around the deficiency. As luck would have it, Mr. Bode has already written part one of the hack: ensure_resource.

Why?

Suppose you have a given infrastructure with N vaguely identical nodes. N could equal 2 for a dual primary or active-passive cluster, or N could be greater than 2 for a more elaborate N-ary cluster. It is sufficient to say that each of those N nodes might export an identical puppet resource, which one (or many) clients might need to collect to operate correctly. It’s important that each node export this, so that there is no single point of failure if one or more of the cluster nodes go missing.

How?

As I mentioned, ensure_resource is a good enough hack to start. Here’s how you take an existing resource and make it duplicate-friendly. Take, for example, the bulk of my dhcp::subnet resource:

define dhcp::subnet(
      $subnet,
      # [...]
      $range = [],
      $allow_duplicates = false
) {
      if $allow_duplicates { # a non-empty string is also truthy
            # allow the user to specify a specific split string to use...
            $c = type($allow_duplicates) ? {
                  'string' => "${allow_duplicates}",
                  default => '#',
            }
            if "${c}" == '' {
                  fail('Split character(s) cannot be empty!')
            }

            # split into $realname-$uid where $realname can contain split chars
            $realname = inline_template("<%= name.rindex('${c}').nil? ? name : name.slice(0, name.rindex('${c}')) %>")
            $uid = inline_template("<%= name.rindex('${c}').nil? ? '' : name.slice(name.rindex('${c}')+'${c}'.length, name.length-name.rindex('${c}')-'${c}'.length) %>")

            $params = { # this must use all the args as listed above...
                  'subnet' => $subnet,
                  # [...]
                  'range' => $range,
                  # NOTE: don't include the allow_duplicates flag...
            }

            ensure_resource('dhcp::subnet', "${realname}", $params)
      } else { # body of the actual resource...

            # BUG: lol: https://projects.puppetlabs.com/issues/15813
            $valid_range = type($range) ? {
                  'array' => $range,
                  default => [$range],
            }

            # the templating part of the module... 
            frag { "/etc/dhcp/subnets.d/${name}.subnet.frag":
                  content => template('dhcp/subnet.frag.erb'),
            }
      }
}

As you can see, I added an $allow_duplicates parameter to my resource. If it is set to true, then when the resource is defined, it parses out a trailing #comment from the $namevar. This can guarantee uniqueness for the $name (if the duplicates happen to be on the same node) but, more importantly, it can guarantee uniqueness on a collector, where you would otherwise be unable to work around the $name collision.
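The split logic buried in those inline_template calls is easier to see as plain Ruby. Here is a minimal standalone sketch of the same reverse-index split (the function name split_on_last is mine for illustration, not part of the module):

```ruby
# Split name into [realname, uid] on the *last* occurrence of the
# separator, so that realname itself may contain the separator chars.
def split_on_last(name, sep = '#')
  i = name.rindex(sep)
  return [name, ''] if i.nil? # no separator: keep the whole name, empty uid
  [name.slice(0, i), name.slice(i + sep.length, name.length)]
end
```

On the collector, two exports named "dmz#node1" and "dmz#node2" both reduce to the realname "dmz", so ensure_resource only declares dhcp::subnet once.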

This is how you use this on one of the exporting nodes:

@@dhcp::subnet { "dmz#${hostname}":
      subnet => ...,
      range => [...],
      allow_duplicates => '#',
}

and on the collector:

Dhcp::Subnet <<| tag == 'dhcp' and title != "${dhcp_zone}" |>> {
}

There are a few things to notice:

  1. The $allow_duplicates argument can be set to true (a boolean), or to any string. If you pick a string, then that string will be used to “split” out the end comment. It’s smart enough to split with a reverse index search, so your name can contain #’s if need be. By default it looks for a single #, but you could replace this with ‘XXX123HACK‘ if that was the only unique string match you could manage. Make sure not to use the string value of ‘true‘.
  2. On my collector I like to filter by title. This is the $namevar. Sadly, this doesn’t support any fancier matching like in_array or startswith. I consider this a puppet deficiency. Hopefully someone will fix this to allow general puppet code here.
  3. Adding this to each resource is kind of annoying. It’s obviously a hack, but it’s the right thing to do for the time being IMHO.

Hope you had fun with this.

Happy hacking,

James

PS: [1] One side note: in the general case for custom resources, I actually think that, by default, identical parameters should be required, but that a resource could provide an optional function such as is_matched, which would take as input the two parameter hash trees and decide if they’re “functionally equivalent”. This would let an individual resource decide if it matters that you specified thing=>yes in one and thing=>true in the other. Functionally, it matters that duplicate resources don’t have conflicting effects. I’m sure this would be particularly bug-prone, and would probably cause thrashing in some cases, which is why, by default, the parameters should all match. </babble>

8 thoughts on “Collecting duplicate resources in puppet”

  1. Hi, thanks for the interesting Blog.

    Can you explain this more?
    “It is sufficient to say, that each of those N nodes might export an identical puppet resource which one (or many) clients might need to collect, to operate correctly.”

    I read the entire blog, and have a vague understanding of what you are doing, but I still don’t really understand why. By, “export a puppet resource,” do you simply mean declare it on a node? What is the difference between a client and a node in this context?

    Thanks,
    Jason

    • Hi Jason,

      Sorry that it was confusing to you. If you’re not 100% comfortable with exported resources, you should probably get more familiar with these first.

      Explained another way:

      Clusters exist that have 2 or more relatively identical members. This means, there might be two or more physical machines that all do the same sort of thing. This is useful because if one of those computers explodes, the service that that cluster provided might still be able to continue. This is called high availability. There are other things that clusters can provide, but we’ll forget about these things for now.

      One useful trick puppet can do is define a resource locally. Everyone knows this already. A more useful and more complicated thing it can do is have computer A define an “exported resource”. This builds a resource, but doesn’t actually build/”run” it on the machine where it was made. Other machines then usually look for these “exported resources” and “collect” them, which is to say, they filter through what’s available (what has already been defined somewhere) and then “run” them locally to use the actual resource.

      Suppose the group of N members all need to have a particular resource defined on a single dhcp server. We do this by using exported resources. However, since the resource would be identical regardless of which of those N members exported it, once we collect on the dhcp server, we might get a duplicate definition issue! This is a problem for the dhcp server, but essential for the cluster members. The reason it’s essential is that we can never guarantee which particular nodes in the cluster will be available to export the resource in question. It could be that one of them is on fire, and if it was the only one exporting the resource, then it won’t get defined on the dhcp server, and the remaining working members of the cluster won’t have the resource there that they need.
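      As a rough sketch (the names and values here are illustrative, not taken from a real manifest), the pattern looks like this:

```puppet
# On every cluster member -- each node exports the *same* resource:
@@dhcp::subnet { 'dmz':
      subnet => '192.0.2.0',
      tag => 'dhcp',
}

# On the dhcp server -- collect everything tagged 'dhcp':
Dhcp::Subnet <<| tag == 'dhcp' |>>
```

      With two or more exporters alive, the collection fails with a duplicate declaration error, which is exactly what the $allow_duplicates hack in the article works around.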

      This article explains how to work around this design error in puppet.

      Hope that helps, let me know if you have more specific questions.

  2. Feel like I’m missing a key point here, help me out.

    An exported resource is just an entry in a storeconfigs database. If a particular node goes down *after* having already successfully exported the resource in question, I’m not sure how you lose access to that resource. I don’t think Puppet checks if the source host of an exported resource is up before allowing you to collect it.

    More worrying to me are:

    1. Whether I’m inadvertently collecting old resources from nodes that no longer exist.
    2. Whether my storeconfigs DB is highly available — because if it isn’t, then exported resources will just cease to function.

    Am I off track here?

    • Great comment!

      An exported resource is just an entry in a storeconfigs database. If a particular node goes down *after* having already successfully exported the resource in question, I’m not sure how you lose access to that resource. I don’t think Puppet checks if the source host of an exported resource is up before allowing you to collect it.

      You’re right about this, except for two facts:
      1) That node necessarily had to be up at least once to export the resource in the first place
      2) You might want to make a change while one of your nodes is down

      If the single node that is either down, or was never up, is the only one exporting the resource, then we have a problem. Also, the lack of symmetry wouldn’t be very elegant.
      Does this answer what you’re getting at?

      For your second two points, you’re right on the money:
      @1: Very true, and it’s a concern you have to weigh. In my case, it’s not a problem, because it’s not a “danger” to have an extra subnet kicking about. When I wrote this code, I actually left myself a comment to look into whether some sort of TTL parameter could exist for exported resources. Maybe it makes sense to do some sort of housekeeping in a cron job on the puppetmaster? On the downside, this could break a lot of things. I’m not sure what the best solution is, but it’s a very, very small corner case for now. In the worst case, it’s a nuisance. That nuisance would only crop up when something is _down_ *and* you’re making major network changes. For that 1% of 1%, there’s always manual intervention. What’s your solution?

      @2: This is true; however, AFAICT, if it’s down, then puppet will still run locally using whatever cached copy it got last. If it somehow registers as being empty and sends this off to clients, then that’s bad. I’ve never had this second problem, and I’m not sure it even exists. My puppetmaster _is_ highly available though, and I believe there are methods for setting up dual primary puppetmasters (although I’ve never tried this).

      Cheers
      James

  3. I am new to puppet. Can you please help me by giving me step-by-step puppet documentation?
    Thanks in advance.

  4. About [1] and the default behavior of exported resources: do we have a bug # on Puppet Labs?

    On one side your workaround is perfect, and on the other side it sucks because it’s not implemented by default.

    Cheers

    • Agreed, I don’t have an open bug about this that I can remember. If you open one, please let me know about it here!

      I’ve definitely mentioned this in conversation to a number of puppet folks, but in general they were too busy with other issues IIRC.
