How I broke (and fixed) my rgmanager service

Rgmanager, clustat and clusvcadm are useful tools in cluster land. I recently built a custom resource which I added to one of my service chains. Upon inspecting clustat, I noticed:

[root@server1 ~]# clustat
Member Status: Quorate

Member Name                             ID   Status
------ ----                             ---- ------
server1                                 1 Online, Local, rgmanager
server2                                 2 Online, rgmanager

Service Name                   Owner (Last)                   State
------- ----                   ----- ------                   -----
service:service-main-server1   (server1)                      failed

Looking at /var/log/messages, I found:

server1 rgmanager: [script] script:shorewall-reload: start of shorewall-reload.sh failed (returned 2)
server1 rgmanager: start on script "shorewall-reload" returned 1 (generic error)

This was peculiar, because my script didn’t have an exit code of 2 anywhere. It was due to a syntax error (whoops)! With the syntax error fixed, I still had trouble getting the service going again. In the logs I found these:

server1 rgmanager: #68: Failed to start service:service-main-server1; return value: 1
server1 rgmanager: Stopping service service:service-main-server1
server1 rgmanager: #12: RG service:service-main-server1 failed to stop; intervention required
server1 rgmanager: Service service:service-main-server1 is failed
server1 rgmanager: #13: Service service:service-main-server1 failed to stop cleanly

Running commands like clusvcadm -e service-main-server1 didn’t help. It turns out that you first have to convince rgmanager that you truly fixed the problem by disabling the service. After that you can safely enable it, and things should work smoothly:

clusvcadm -d service-main-server1
clusvcadm -e service-main-server1
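
If you want to script this recovery step, a minimal sketch could look like the following. The function names recovery_commands and recover are my own, hypothetical helpers; only the clusvcadm -d and -e invocations come from the commands above. The runner is injectable so the ordering can be checked without a cluster.

```python
import subprocess


def recovery_commands(service):
    """Return the disable-then-enable command pair for a failed service."""
    return [
        ['clusvcadm', '-d', service],  # disable: acknowledge the failed state
        ['clusvcadm', '-e', service],  # enable: start the service again
    ]


def recover(service, run=subprocess.check_call):
    """Run both commands in order; check_call raises on a non-zero exit."""
    for argv in recovery_commands(service):
        run(argv)
```

Because check_call raises on failure, the enable step is never attempted if the disable step fails.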

Hopefully you’ve now got your feet wet with this clustering intro! Remember that you can look in the logs for clues and run clustat -i 1 in a screen session to keep tabs on things.

Happy Hacking,

James

 

recursion in puppet (for no particular reason)

I’m working on some fancy puppet “code”, and I realized recursion could be very useful. I decided to try out a little hack to see if I could get it to work. I’ll jump right into the code:

#!/usr/bin/puppet

define recursion(
    $count
) {
    notify { "count-${count}":
    }
    $minus1 = inline_template('<%= count.to_i - 1 %>')
    if "${minus1}" == '0' {
        notify { 'done counting!':
        }
    } else {
        # recurse
        recursion { "count-${minus1}":
            count => $minus1,
        }
    }
}

# kick it off...
recursion { 'start':
    count => 4,
}

In theory, this should now work because of local variable scopes. Let’s see if we’ll blow up the puppet stack or not…

[james@computer tmp]$ ./rec.pp 
warning: Implicit invocation of 'puppet apply' by passing files (or flags) directly
to 'puppet' is deprecated, and will be removed in the 2.8 series.  Please
invoke 'puppet apply' directly in the future.

notice: count-4
notice: /Stage[main]//Recursion[start]/Notify[count-4]/message: defined 'message' as 'count-4'
notice: count-2
notice: /Stage[main]//Recursion[start]/Recursion[count-3]/Recursion[count-2]/Notify[count-2]/message: defined 'message' as 'count-2'
notice: count-3
notice: /Stage[main]//Recursion[start]/Recursion[count-3]/Notify[count-3]/message: defined 'message' as 'count-3'
notice: count-1
notice: /Stage[main]//Recursion[start]/Recursion[count-3]/Recursion[count-2]/Recursion[count-1]/Notify[count-1]/message: defined 'message' as 'count-1'
notice: done counting!
notice: /Stage[main]//Recursion[start]/Recursion[count-3]/Recursion[count-2]/Recursion[count-1]/Notify[done counting!]/message: defined 'message' as 'done counting!'
notice: Finished catalog run in 0.16 seconds
[james@computer tmp]$

…and amazingly this seems to work! Hopefully this will be useful for some upcoming trickery I have planned, and if not, it was a fun hack.

I decided to see if it could handle larger values, and for my simple tests, it seemed to do okay:

notice: /Stage[main]//Recursion[start]/Recursion[count-99]/Recursion[count-98]/Recursion[count-97]/Recursion[count-96]/Recursion[count-95]/Recursion[count-94]/Recursion[count-93]/Recursion[count-92]/Recursion[count-91]/Recursion[count-90]/Recursion[count-89]/Recursion[count-88]/Recursion[count-87]/Recursion[count-86]/Recursion[count-85]/Recursion[count-84]/Recursion[count-83]/Recursion[count-82]/Recursion[count-81]/Recursion[count-80]/Recursion[count-79]/Recursion[count-78]/Recursion[count-77]/Recursion[count-76]/Recursion[count-75]/Recursion[count-74]/Recursion[count-73]/Recursion[count-72]/Recursion[count-71]/Recursion[count-70]/Recursion[count-69]/Recursion[count-68]/Recursion[count-67]/Recursion[count-66]/Recursion[count-65]/Recursion[count-64]/Recursion[count-63]/Recursion[count-62]/Recursion[count-61]/Recursion[count-60]/Recursion[count-59]/Recursion[count-58]/Recursion[count-57]/Recursion[count-56]/Recursion[count-55]/Recursion[count-54]/Recursion[count-53]/Recursion[count-52]/Recursion[count-51]/Recursion[count-50]/Recursion[count-49]/Recursion[count-48]/Recursion[count-47]/Recursion[count-46]/Recursion[count-45]/Recursion[count-44]/Recursion[count-43]/Recursion[count-42]/Recursion[count-41]/Recursion[count-40]/Recursion[count-39]/Recursion[count-38]/Recursion[count-37]/Recursion[count-36]/Recursion[count-35]/Recursion[count-34]/Recursion[count-33]/Recursion[count-32]/Recursion[count-31]/Recursion[count-30]/Recursion[count-29]/Recursion[count-28]/Recursion[count-27]/Recursion[count-26]/Recursion[count-25]/Recursion[count-24]/Recursion[count-23]/Recursion[count-22]/Recursion[count-21]/Recursion[count-20]/Recursion[count-19]/Recursion[count-18]/Recursion[count-17]/Recursion[count-16]/Recursion[count-15]/Recursion[count-14]/Recursion[count-13]/Recursion[count-12]/Recursion[count-11]/Recursion[count-10]/Recursion[count-9]/Recursion[count-8]/Recursion[count-7]/Recursion[count-6]/Recursion[count-5]/Recursion[count-4]/Recursion[count-3]/Recursion[count-2]/Recursion[count-1]/Notify[done counting!]/message: defined 'message' as 'done counting!'
notice: Finished catalog run in 1.16 seconds

Running this with a count value of 1000 took 132.19 sec according to puppet, but much longer for the process to actually clean up and finish. This made my fan speed up, but at least it didn’t segfault.

Hopefully I’ll have something more useful to show you next time, but until then, keep on imagining and,

Happy hacking!

James

EDIT: A follow up is now available.

continuous display of log files (better tail -f)

All good sysadmins know about using tail -f to follow a log file. I use this all the time to follow /var/log/messages and my gluster logs in particular. Maybe everyone already knows this, but it deserves a PSA: after a certain amount of time (~days) it seems that new messages don’t appear!

What happens by default is that tail -f follows the file descriptor, not the file name, so when your log files get rotated, the file descriptor still points to the (now renamed) file, which no longer receives updates.
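
To see the descriptor-versus-name behaviour without waiting days for logrotate, here is a small Python sketch that simulates a rotation in a temporary directory. The function name demo_rotation and the file name messages are just for illustration; nothing here touches your real logs.

```python
import os
import shutil
import tempfile


def demo_rotation():
    """Simulate a log rotation and report what an already-open handle sees.

    Returns (data_read_after_rotation, handle_and_name_share_an_inode).
    """
    workdir = tempfile.mkdtemp()
    log = os.path.join(workdir, 'messages')
    with open(log, 'w') as f:
        f.write('before rotation\n')

    follower = open(log)        # like tail -f: holds on to the descriptor
    follower.read()             # consume what is already there

    os.rename(log, log + '.1')  # the rotation: the old file gets a new name
    with open(log, 'w') as f:   # a fresh file appears under the old name
        f.write('after rotation\n')

    new_data = follower.read()  # still reading the renamed file
    same_inode = os.fstat(follower.fileno()).st_ino == os.stat(log).st_ino
    follower.close()
    shutil.rmtree(workdir)
    return new_data, same_inode
```

The open handle reads nothing new, and its inode no longer matches the inode behind the name, which is exactly the situation tail -f ends up in.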

The solution is to get tail to follow the file name you’re interested in:

tail --follow=name /var/log/messages

EDIT: Fortunately there is a shorter way of running this too, you can use:

tail -F

on any up-to-date version of tail to get the same result. This is equivalent to combining --follow=name with --retry.

Happy hacking!

James

 

setting timed events in puppet

I’ve tried to push puppet to its limits, and so far I’ve succeeded. When you hit the kind of bug that forces you to hack around it, you know you are close. In any case, this isn’t about that embarrassing bug, it’s about how to set delayed actions in puppet.

Enter puppet-runonce, a module that I’ve just finished writing. It starts off with the realization that you can exec an action which also writes to a file. If it sees this file, then it knows that it has already completed, and shouldn’t run itself again. The relevant parts are here:

define runonce::exec(
    $command = '/bin/true',
    $notify = undef,
    $repeat_on_failure = true
) {
    include runonce::exec::base

    $date = "/bin/date >> /var/lib/puppet/tmp/runonce/exec/${name}"
    $valid_command = $repeat_on_failure ? {
        false => "${date} && ${command}",
        default => "${command} && ${date}",
    }

    exec { "runonce-exec-${name}":
        command => "${valid_command}",
        creates => "/var/lib/puppet/tmp/runonce/exec/${name}",    # run once
        notify => $notify,
        # TODO: add any other parameters here that users want, such as cwd and environment...
        require => File['/var/lib/puppet/tmp/runonce/exec/'],
    }
}

This depends on having an isolated namespace per module. I need this in many of my modules, and I have chosen: “/var/lib/puppet/tmp/$modulename”. I’ve added the extra feature that this object can either run repeatedly until the $command succeeds, or run once and ignore the exit status.
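
If the ordering trick behind $repeat_on_failure isn’t obvious, here is the same idea as a rough Python sketch. This is an illustration, not part of the module: run_once, marker and action are hypothetical names, and action is assumed to return True on success.

```python
import os
import time


def run_once(marker, action, repeat_on_failure=True):
    """Run action() unless the marker file exists (like 'creates').

    Returns True if the action was attempted during this call.
    """
    if os.path.exists(marker):
        return False

    def stamp():
        with open(marker, 'a') as f:
            f.write(time.ctime() + '\n')

    if repeat_on_failure:
        # "${command} && ${date}": only a success writes the marker,
        # so a failure is retried on the next run
        if action():
            stamp()
    else:
        # "${date} && ${command}": the marker is written first,
        # so the action gets exactly one attempt
        stamp()
        action()
    return True
```

Swapping the order of the two steps is all it takes to switch between "retry until it works" and "try exactly once".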

Building a timer is slightly trickier, but follows from the first concept. First create a runonce object which when used, creates a file with a timestamp of “now”. Next, create a new exec object which periodically checks the time, and once we’re past a certain delta, exec the desired command. That looks something like this:

# when this is first run by puppet, a "timestamp" matching the system clock is
# saved. every time puppet runs (usually every 30 minutes) it compares the
# timestamp to the current time, and if this difference exceeds that of the
# set delta, then the requested command is executed.
define runonce::timer(
    $command = '/bin/true',
    $delta = 3600,                # seconds to wait...
    $notify = undef,
    $repeat_on_failure = true
) {
    include runonce::timer::base

    # start the timer...
    exec { "/bin/date > /var/lib/puppet/tmp/runonce/start/${name}":
        creates => "/var/lib/puppet/tmp/runonce/start/${name}",    # run once
        notify => Exec["runonce-timer-${name}"],
        require => File['/var/lib/puppet/tmp/runonce/start/'],
        alias => "runonce-start-${name}",
    }

    $date = "/bin/date >> /var/lib/puppet/tmp/runonce/timer/${name}"
    $valid_command = $repeat_on_failure ? {
        false => "${date} && ${command}",
        default => "${command} && ${date}",
    }

    # end the timer and run command (or vice-versa)
    exec { "runonce-timer-${name}":
        command => "${valid_command}",
        creates => "/var/lib/puppet/tmp/runonce/timer/${name}",    # run once
        # NOTE: run if the difference between the current date and the
        # saved date (both converted to sec) is greater than the delta
        onlyif => "/usr/bin/test -e /var/lib/puppet/tmp/runonce/start/${name} && /usr/bin/test \$(( `/bin/date +%s` - `/usr/bin/head -n 1 /var/lib/puppet/tmp/runonce/start/${name} | /bin/date --file=- +%s` )) -gt ${delta}",
        notify => $notify,
        require => [
            File['/var/lib/puppet/tmp/runonce/timer/'],
            Exec["runonce-start-${name}"],
        ],
        # TODO: add any other parameters here that users want, such as cwd and environment...
    }
}

The real “magic” is in the power of bash, and its individual elegant pieces. The `date` command makes it easy to import a previously stored value with --file, and a bit of conversion glue and mathematics gives us:

/usr/bin/test -e ${startdatefile} && /usr/bin/test $(( `/bin/date +%s` - `/usr/bin/head -n 1 ${startdatefile} | /bin/date --file=- +%s` )) -gt ${deltaseconds}
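
For anyone who finds the one-liner hard to read, here is roughly the same check as a Python sketch. One simplifying assumption: the start file here holds epoch seconds on its first line, whereas the module stores the default `date` output and converts it with --file. The name timer_expired is hypothetical.

```python
import os
import time


def timer_expired(start_file, delta, now=None):
    """Return True once more than delta seconds have passed since the start."""
    if not os.path.exists(start_file):       # test -e ${startdatefile}
        return False
    with open(start_file) as f:
        started = int(f.readline().strip())  # head -n 1 (assumed epoch seconds)
    if now is None:
        now = int(time.time())               # date +%s
    return (now - started) > delta           # -gt ${deltaseconds}
```

Note that the comparison is strictly greater-than, matching -gt, so the timer fires on the first run after the delta has been exceeded.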

It’s a big mouthful to digest on one line, but it’s probably write-only code anyway, and it isn’t really that complicated. One downside is that this is only evaluated when puppet runs, so it has an approximate granularity of 30 minutes. If you’re using this for anything precise, then you’re insane!

Speaking of sanity, why would anyone want such a thing? My use case is simple: I’m writing a fancy puppet-drbd module, to help me auto-deploy clusters. I always have to manually turn up the initial sync rate to get my cluster happy, but this should be reverted for normal use. The solution is to set an initial sync rate with runonce::exec, and revert it 24 hours later with runonce::timer!

Both this module and my drbd module will be released in the near future. All of this code is AGPLv3+ so please share and enjoy with those freedoms.

Happy hacking,
James

preventing duplicate parameter values in puppet types

I am writing a keepalived module for puppet. It will naturally be called: “puppet-keepalived”, and I will be releasing the code in the near future! In any case, if you’re familiar with VRRP, you’ll know that each managed link (eg: resource or VIP) has a common routerid and password which are shared among all members in the group. It is important that these parameters are unique across the type definitions on a single node.

Here is an example of two different instance definitions in puppet:

keepalived::vrrp { 'VI_NET':
    state => ...,
    routerid => 42, # must be unique
    password => 'somelongpassword...',
}

keepalived::vrrp { 'VI_DMZ':
    state => ...,
    routerid => 43, # must be unique
    password => 'somedifferentlongpassword...',
}

Here puppet guarantees that the $name variable is unique. Let’s extend this magic with a trick to make sure that routerid and password are too. Here is an excerpt from the relevant puppet definition:

define keepalived::vrrp(
    $state,
    ...
    $routerid,
    $password
) {
    ...
    file { "/etc/keepalived/${instance}.vrrp":
        content => template('keepalived/keepalived.vrrp.erb'),
        ...
        ensure => present,
        # NOTE: add unnecessary alias names so that if one of those
        # variables appears more than once, an error will be raised.
        alias => ["password-${password}", "routerid-${routerid}"],
    }
    ...
}

As you can see, multiple alias names are specified with an array, and since this file definition is used for each keepalived::vrrp instance, you’ll most assuredly cause a “duplicate alias” issue if there is a duplicate routerid or password used!
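
Here is a toy model of why this works, written in Python. It is only a sketch of the uniqueness rule, not how puppet implements its catalog: Catalog, DuplicateAlias and vrrp are hypothetical names I made up for the example.

```python
class DuplicateAlias(Exception):
    """Raised when a title or alias is registered twice."""


class Catalog(object):
    """Toy stand-in for a catalog: every title and alias must be unique."""

    def __init__(self):
        self._owners = {}  # title or alias -> title of the owning resource

    def add(self, title, aliases=()):
        names = [title] + list(aliases)
        for n in names:  # check everything first so a failed add leaves no trace
            if n in self._owners:
                raise DuplicateAlias(
                    '%s is already used by %s' % (n, self._owners[n]))
        for n in names:
            self._owners[n] = title


def vrrp(catalog, name, routerid, password):
    """Hypothetical analogue of keepalived::vrrp registering its file."""
    catalog.add(
        '/etc/keepalived/%s.vrrp' % name,
        aliases=['password-%s' % password, 'routerid-%s' % routerid],
    )
```

Declaring a second instance with routerid 42 raises DuplicateAlias, in the same way that the extra alias names make a duplicate value collide.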

This trick will also probably work across define types too. To ensure a common key, just create an object like:

file { '/root/the_unique_key':
    alias => ["token1-${token1}", "token2-${token2}", "token3-${token3}"],
}

The token prefix will guarantee that you don’t accidentally cause a collision between dissimilar parameter values, unless that’s what you want. I’ve used a file in this scenario, but you can use whatever object you like. For this reason, it would make sense to create a noop() type if you’re really serious about this. Maybe puppet labs can add a built-in type upstream.

This is the type of thing that’s important to do if you want to write puppet code that acts less like a templating hack and more like a library :)

Happy hacking!

James

including a recursive tree of files with distutils

It turns out it is non-trivial (afaict) to include a tree of files (a directory) in a python distutils data_files argument. Here’s how I managed to do it, while also allowing the programmer to include manual entries:

NAME = 'project_name'
distutils.core.setup(
# ...
    data_files=[
        ('share/%s' % NAME, ['README']),
        ('share/%s' % NAME, ['files/somefile']),
        ('share/%s/templates' % NAME, [
            'files/templates/template1.tmpl',
            'files/templates/template2.tmpl',
        ]),
    ] + [
        # one (install_dir, [files]) tuple per directory found by os.walk()
        ('share/%s/%s' % (NAME, root), [os.path.join(root, f) for f in files])
        for root, dirs, files in os.walk('the_directory')
    ],
# ...
)

Since data_files is a list, I’ve just appended our specially generated list to the end. You can do this as many times as you wish. The list is a comprehension which builds each tuple as it walks through the requested directory. I’ve chosen a root installation directory of ${prefix}/share/project_name/the_directory/ but you can change this code to match your own specifications.
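
If the inline comprehension is too dense, the same idea can be pulled out into a helper. This is a sketch with a hypothetical name (tree_data_files), and it assumes the directory is given relative to setup.py. Unlike the inline version, it skips directories that contain no files.

```python
import os


def tree_data_files(name, directory):
    """Build (install_dir, [files]) tuples for every directory in a tree.

    `directory` is assumed to be a path relative to setup.py.
    """
    entries = []
    for root, _dirs, files in os.walk(directory):
        if not files:
            continue  # nothing to install from this directory
        target = os.path.join('share', name, root)
        sources = [os.path.join(root, f) for f in sorted(files)]
        entries.append((target, sources))
    return entries
```

You would then write data_files=[...] + tree_data_files(NAME, 'the_directory') in the setup() call.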

Strangely, I couldn’t find this solution when searching the Internets, so I had to write it myself. Perhaps my google-fu is weak, and maybe this post needs to get some linkage to help out the rest of us python programmers.

Happy hacking,
James

 

finding your software install $prefix from inside python

Good python software developers tend to use distutils and include a setup.py with their code. The problem I often encounter is finding out which prefix your software has been installed in from within the python code. This might be necessary if you want to interact with some data that you’ve installed into: $prefix/share/projectname/ Here are the various steps:

1) Distutils:

NAME='someproject'
distutils.core.setup(
    name=NAME,
    version='0.1',
    author='James Shubin',
    author_email='purpleidea@gmail.com',
    url='https://ttboj.wordpress.com/',
    description='This is an example project',
    # http://pypi.python.org/pypi?%3Aaction=list_classifiers
    classifiers=[
        'Environment :: Console',
        'Intended Audience :: System Administrators',
        'License :: OSI Approved :: GNU Affero General Public License v3',
        'Operating System :: POSIX :: Linux',
        'Programming Language :: Python',
        'Topic :: Utilities',
    ],
    packages=[NAME],
    package_dir={NAME: 'src'},
    data_files=[
        ('share/%s' % NAME, ['README']),
        ('share/%s' % NAME, ['images/something.png']),
    ],
    scripts=['somebin'],
)

2) Install:

python setup.py install --prefix=~/testprefix/

Note: If you don’t specify a prefix, then this will get installed into your system prefix.

3) Run:

cd ~/testprefix/ # the prefix you chose above
PYTHONPATH=lib/python2.7/site-packages/ ./bin/somebin

Note: If you didn’t specify a prefix above, then you don’t need to set the PYTHONPATH variable, and also, the executable will already be in your default $PATH

4) Prefix:

I have written a small python module which I include in all of my python software. It returns the project’s installed prefix when run. I usually use it like so:

print 'something.png is located at: %s' % os.path.join(prefix.prefix(), 'share', NAME, 'images', 'something.png')

5) Code:

Here is the code for prefix.py. I put this file under my projectname/src/ directory.

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Find the prefix of the current installation, and other useful variables.

Finding the prefix that your program has been installed in can be non-trivial.
This simplifies the process by allowing you to import <packagename>.prefix and
get instant access to the path prefix by calling the function named: prefix().
If you'd like to join this prefix onto a given path, pass it as the first arg.

Example: if [ `./prefix.py` ]; then echo yes; else echo no; fi
Example: x=`./prefix.py`; echo 'prefix: '$x
"""
# Copyright (C) 2010-2012  James Shubin
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

__all__ = ('prefix', 'name')
#DEBUG = False

import os
import sys

def prefix(join=None):
    """Returns the prefix that this code was installed into."""
    # constants for this execution
    path = os.path.abspath(__file__)
    #if DEBUG: print 'path: %s' % path
    name = os.path.basename(os.path.dirname(path))
    #if DEBUG: print 'name: %s' % name
    this = os.path.basename(path)
    #if DEBUG: print 'this: %s' % this

    # rule set
    rules = [
        # to match: /usr/lib/python2.5/site-packages/project/prefix.py
        # or: /usr/local/lib/python2.6/dist-packages/project/prefix.py
        lambda x: x == 'lib',
        lambda x: x == ('python%d.%d' % sys.version_info[:2]),
        lambda x: x in ['site-packages', 'dist-packages'],
        lambda x: x == name,    # 'project'
        lambda x: x == this,    # 'prefix.py'
    ]

    # matching engine
    while len(rules) > 0:
        (path, token) = os.path.split(path)
        #if DEBUG: print 'path: %s, token: %s' % (path, token)
        rule = rules.pop()
        if not rule(token):
            #if DEBUG: print 'rule failed'
            return False

    # usually returns: /usr/ or /usr/local/ (but without slash postfix)
    if join is None:
        return path
    else:
        return os.path.join(path, join)    # add on join if it exists!

def name(pop=None, suffix=None):
    """Returns the name of this particular project. If pop is a list
    containing one or more elements, name() will remove those items
    from the path tail before deciding on the project name. If there
    is an element which does not exist in the path tail, then raise.
    If a suffix is specified, then it is removed if found at end."""
    path = os.path.dirname(os.path.abspath(__file__))
    if pop is None:
        pop = []
    elif isinstance(pop, str):
        pop = [pop]    # force single strings to list
    else:
        pop = list(pop)    # copy, so the caller's list isn't consumed
    while len(pop) > 0:
        (path, tail) = os.path.split(path)
        if pop.pop() != tail:
            #if DEBUG: print 'tail: %s' % tail
            raise ValueError('Element doesnʼt match path tail.')

    path = os.path.basename(path)
    if suffix is not None and path.endswith(suffix):
        path = path[0:-len(suffix)]
    return path

if __name__ == '__main__':
    join = None
    if len(sys.argv) > 1:
        join = ' '.join(sys.argv[1:])
    result = prefix(join)
    if result:
        print result
    else:
        sys.exit(1)

Why this sort of thing isn’t built into python boggles my mind, so if for some reason you have a better solution, please let me know. Also, don’t be fooled by the red herring that is: sys.prefix
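
To make the matching engine easier to follow, here is a stripped-down sketch that applies the same right-to-left rules to a path string instead of __file__. The name prefix_from_path and the pyver argument are hypothetical, and it returns None on a mismatch where prefix() returns False.

```python
import os


def prefix_from_path(path, pyver):
    """Peel components off the end of path and return the install prefix.

    Expects <prefix>/lib/python<pyver>/(site|dist)-packages/<project>/<module>
    and returns None when the path doesn't have that shape.
    """
    rest, _module = os.path.split(path)   # e.g. 'prefix.py'
    rest, _project = os.path.split(rest)  # e.g. 'project'
    expected = [
        ('site-packages', 'dist-packages'),
        ('python' + pyver,),
        ('lib',),
    ]
    for allowed in expected:
        rest, token = os.path.split(rest)
        if token not in allowed:
            return None
    return rest
```

For example, '/usr/lib/python2.7/site-packages/project/prefix.py' with pyver '2.7' gives '/usr', while a path inside a source checkout doesn't match and gives None.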

Happy hacking,
James