About purpleidea

I am James. Just James.

including a recursive tree of files with distutils

It turns out it is non-trivial (afaict) to include a tree of files (a directory) in a python distutils data_files argument. Here’s how I managed to do it, while still allowing the programmer to include manual entries:

import os
import distutils.core

NAME = 'project_name'
distutils.core.setup(
# ...
    data_files=[
        ('share/%s' % NAME, ['README']),
        ('share/%s' % NAME, ['files/somefile']),
        ('share/%s/templates' % NAME, [
            'files/templates/template1.tmpl',
            'files/templates/template2.tmpl',
        ]),
    ] + [
        # one (install_dir, [files]) tuple per directory that os.walk visits
        ('share/%s/%s' % (NAME, x[0]), [os.path.join(x[0], y) for y in x[2]])
        for x in os.walk('the_directory')
    ],
# ...
)

Since data_files is a list, I’ve just appended our specially generated list to the end. You can do this as many times as you wish. The list is a comprehension which builds each tuple as it walks through the requested directory. I’ve chosen a root installation directory of ${prefix}/share/project_name/the_directory/ but you can change this code to match your own specifications.
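For illustration, here is the same idea as a standalone helper, which is easier to try out away from setup.py. It is only a sketch: the name tree_data_files and its parameters are made up for this example and are not part of distutils.

```python
import os

def tree_data_files(name, directory):
    """Return data_files tuples covering every file under directory.

    Hypothetical helper: each tuple maps the install directory
    share/<name>/<dirpath> to the list of source files found in <dirpath>.
    """
    directory = directory.rstrip('/')  # avoid doubled slashes in the paths
    return [
        (os.path.join('share', name, dirpath),
         [os.path.join(dirpath, f) for f in sorted(filenames)])
        for (dirpath, _dirnames, filenames) in os.walk(directory)
    ]
```

In a setup.py this could be appended to the manual entries, for example: data_files=[...] + tree_data_files(NAME, 'the_directory/').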

Strangely, I couldn’t find this solution when searching the Internets, so I had to write it myself. Perhaps my google-fu is weak, and maybe this post needs to get some linkage to help out the rest of us python programmers.

Happy hacking,
James

 

finding your software install $prefix from inside python

Good python software developers tend to use distutils and include a setup.py with their code. The problem I often encounter is finding out, from within the python code, which prefix the software has been installed in. This might be necessary if you want to interact with some data that you’ve installed into $prefix/share/projectname/. Here are the various steps:

1) Distutils:

import distutils.core

NAME = 'someproject'
distutils.core.setup(
    name=NAME,
    version='0.1',
    author='James Shubin',
    author_email='purpleidea@gmail.com',
    url='https://ttboj.wordpress.com/',
    description='This is an example project',
    # http://pypi.python.org/pypi?%3Aaction=list_classifiers
    classifiers=[
        'Environment :: Console',
        'Intended Audience :: System Administrators',
        'License :: OSI Approved :: GNU Affero General Public License v3',
        'Operating System :: POSIX :: Linux',
        'Programming Language :: Python',
        'Topic :: Utilities',
    ],
    packages=[NAME],
    package_dir={NAME: 'src'},
    data_files=[
        ('share/%s' % NAME, ['README']),
        ('share/%s' % NAME, ['images/something.png']),
    ],
    scripts=['somebin'],
)

2) Install:

python setup.py install --prefix=~/testprefix/

Note: If you don’t specify a prefix, then this will get installed into your system prefix.

3) Run:

cd ~/testprefix/ # the prefix you chose above
PYTHONPATH=lib/python2.7/site-packages/ ./bin/somebin

Note: If you didn’t specify a prefix above, then you don’t need to set the PYTHONPATH variable, and the executable will already be in your default $PATH.
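The lib/python2.7/site-packages/ part of that path depends on which python version did the install. As a rough sketch, assuming the usual POSIX layout (some distributions use dist-packages or lib64 instead), the directory could be computed like this. The function name site_packages_dir is hypothetical.

```python
import os
import sys

def site_packages_dir(prefix, version=None):
    """Guess where `setup.py install --prefix=<prefix>` puts python modules.

    Assumes the common POSIX layout: <prefix>/lib/pythonX.Y/site-packages
    """
    if version is None:
        version = sys.version_info[:2]  # (major, minor) of the running python
    return os.path.join(os.path.expanduser(prefix), 'lib',
                        'python%d.%d' % tuple(version), 'site-packages')

print(site_packages_dir('~/testprefix/'))
```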

4) Prefix:

I have written a small python module which I include in all of my python software. It returns the project’s installed prefix when called. I usually use it like so:

# data_files copies images/something.png straight into share/<NAME>/
print('something.png is located at: %s' % os.path.join(prefix.prefix(), 'share', NAME, 'something.png'))

5) Code:

Here is the code for prefix.py. I put this file under my projectname/src/ directory.

#!/usr/bin/python
# -*- coding: utf-8 -*-
"""Find the prefix of the current installation, and other useful variables.

Finding the prefix that your program has been installed in can be non-trivial.
This simplifies the process by allowing you to import <packagename>.prefix and
get instant access to the path prefix by calling the function named: prefix().
If you'd like to join this prefix onto a given path, pass it as the first arg.

Example: if [ `./prefix.py` ]; then echo yes; else echo no; fi
Example: x=`./prefix.py`; echo 'prefix: '$x
"""
# Copyright (C) 2010-2012  James Shubin
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU Affero General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU Affero General Public License for more details.
#
# You should have received a copy of the GNU Affero General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.

__all__ = ('prefix', 'name')
#DEBUG = False

import os
import sys

def prefix(join=None):
    """Returns the prefix that this code was installed into."""
    # constants for this execution
    path = os.path.abspath(__file__)
    #if DEBUG: print 'path: %s' % path
    name = os.path.basename(os.path.dirname(path))
    #if DEBUG: print 'name: %s' % name
    this = os.path.basename(path)
    #if DEBUG: print 'this: %s' % this

    # rule set
    rules = [
        # to match: /usr/lib/python2.5/site-packages/project/prefix.py
        # or: /usr/local/lib/python2.6/dist-packages/project/prefix.py
        lambda x: x == 'lib',
        lambda x: x == ('python%d.%d' % sys.version_info[:2]),    # also correct for 3.10+
        lambda x: x in ['site-packages', 'dist-packages'],
        lambda x: x == name,    # 'project'
        lambda x: x == this,    # 'prefix.py'
    ]

    # matching engine
    while len(rules) > 0:
        (path, token) = os.path.split(path)
        #if DEBUG: print 'path: %s, token: %s' % (path, token)
        rule = rules.pop()
        if not rule(token):
            #if DEBUG: print 'rule failed'
            return False

    # usually returns: /usr/ or /usr/local/ (but without slash postfix)
    if join is None:
        return path
    else:
        return os.path.join(path, join)    # add on join if it exists!

def name(pop=None, suffix=None):
    """Returns the name of this particular project. If pop is a list
    containing one or more elements, name() will remove those items
    from the path tail before deciding on the project name. If there
    is an element which does not exist in the path tail, then raise.
    If a suffix is specified, then it is removed if found at end."""
    path = os.path.dirname(os.path.abspath(__file__))
    if pop is None: pop = []    # avoid a shared mutable default argument
    if isinstance(pop, str): pop = [pop]    # force single strings to list
    pop = list(pop)    # work on a copy so the caller's list isn't emptied
    while len(pop) > 0:
        (path, tail) = os.path.split(path)
        if pop.pop() != tail:
            #if DEBUG: print 'tail: %s' % tail
            raise ValueError("Element doesn't match path tail.")

    path = os.path.basename(path)
    # an empty suffix is skipped: path[0:-0] would wipe out the whole name
    if suffix and path.endswith(suffix):
        path = path[0:-len(suffix)]
    return path

if __name__ == '__main__':
    join = None
    if len(sys.argv) > 1:
        join = ' '.join(sys.argv[1:])
    result = prefix(join)
    if result:
        print(result)
    else:
        sys.exit(1)

Why this sort of thing isn’t built into python boggles my mind, so if for some reason you have a better solution, please let me know. Also, don’t be fooled by the red herring that is sys.prefix: it reports where the python interpreter itself was installed, not where your project was.
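To see how the rule matching behaves, it can help to run it against a path given as a string instead of __file__. The following is a sketch only, a simplified variant of prefix() from above; find_prefix and its version parameter are hypothetical names introduced for this example, and it returns None rather than False when the path doesn't match.

```python
import os
import sys

def find_prefix(path, version=None):
    """Strip lib/pythonX.Y/site-packages/<project>/<file> off of path.

    Returns the remaining prefix, or None if the path doesn't look installed.
    """
    if version is None:
        version = 'python%d.%d' % sys.version_info[:2]
    name = os.path.basename(os.path.dirname(path))  # the project directory
    this = os.path.basename(path)                   # the file itself
    # rules are listed left to right, but checked from the right-hand end
    rules = [
        lambda x: x == 'lib',
        lambda x: x == version,
        lambda x: x in ('site-packages', 'dist-packages'),
        lambda x: x == name,
        lambda x: x == this,
    ]
    for rule in reversed(rules):
        (path, token) = os.path.split(path)
        if not rule(token):
            return None
    return path

print(find_prefix('/usr/lib/python2.7/site-packages/project/prefix.py', 'python2.7'))
print(find_prefix('/home/james/code/project/src/prefix.py', 'python2.7'))
```

The first call prints /usr, and the second prints None, since a source checkout doesn't match the installed layout.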

Happy hacking,
James

How to avoid cluster race conditions or: How to implement a distributed lock manager in puppet

I’ve been working on a puppet module for gluster. This module, my puppet-gfs2 module, and other puppet clustering modules all share a common problem: How does one make sure that certain operations only happen on one node at a time?

The inelegant solutions are simple:

  1. Specify manually (in puppet) which node the “master” is, and have it carry out all the special operations. Downside: Single point of failure for your distributed cluster, and you’ve also written ugly asymmetrical code. Build a beautiful, decentralized setup instead.
  2. Run all your operations on all nodes. Ensure they’re idempotent, and that they check the cluster state for success first. Downside: Hope that they don’t run simultaneously and race somehow. This is actually how I engineered my first version of puppet-gluster! It was low risk, and I just wanted to get the cluster up; my module was just a tool, not a product.
  3. Use the built-in puppet DLM to coordinate running of these tasks. Downside: Oh wait, puppet can’t do this, you misunderstand the puppet architecture. I too thought this was possible for a short while. Whoops! There is no DLM.

Note: I know this is ironic, since by default puppet requires a master node for coordination. However, you can have multiple masters, and if they’re down, puppetd still runs on the clients; it just doesn’t receive new information. (You could also reconfigure your manifests to work around these downsides as they arise, but this takes the point out of puppet: keeping it automatic.)

Mostly elegant: Thoughts of my other cluster crept into my head. Out of nowhere, I realized the solution was: VRRP! You may use a different mechanism if you like, but at the moment I’m using keepalived. Keepalived runs on my gluster pool to provide a VIP for the cluster. This allows my clients to use a highly available IP address to download volume files (the mount operation). Otherwise, if that particular server were down, they wouldn’t be able to mount. The trick: I tell each node what the expected VIP for the cluster is, and if that IP is present in a facter $ipaddress_, then I let that node execute!
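The decision itself is tiny. The real module is written in puppet and reads the addresses from facter, so the following python is only a sketch of the logic. The names should_execute and local_addresses are hypothetical and do not appear in the module, and the IP addresses are made up.

```python
def should_execute(vip, local_addresses):
    """Decide whether this node may run the cluster-wide operations.

    vip: the expected VIP for the cluster, or '' to let every node run.
    local_addresses: IP addresses currently bound on this node
                     (in puppet these would come from facter).
    """
    if vip == '':
        return True  # no VIP configured: every node runs (and may race)
    return vip in set(local_addresses)

# keepalived has placed the VIP on the first node, but not on the second
print(should_execute('192.168.1.100', ['192.168.1.11', '192.168.1.100']))
print(should_execute('192.168.1.100', ['192.168.1.12']))
```

Only the node that currently holds the VIP gets True, so if keepalived moves the address after a failure, the next puppet run on the new holder takes over.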

The code is now available, please have a look, and let me know what you think.

Happy hacking,
James

PS: Inelegant modes 1 and 2 are still available. For mode 1, set “vip” in your config to the master node IP address you’d like to use. For mode 2, leave the “vip” value at the empty string default.

PPS: I haven’t had a chance to thoroughly test this, so be warned of any dust bunnies you might find.

puppet gluster module now in git

The thoughtful bodepd has been kind enough to help me get my puppet-gluster module off the ground and publicized a bit too. My first few commits have all been cleanup to get my initial hacking up to snuff with the puppet style guidelines. Sadly, I love indenting my code with tabs, and this is against the puppet rules :(

I’ll be accepting patches by email, but I’d prefer discussion first, especially since I’ve got a few obvious things brewing in my mental queue that should hit master shortly.

Are you a gluster expert who’s weak at puppet? I’m keen to implement many of the common raid, file system and gluster performance optimizations directly into the module, so that the out-of-box experience for new users is a fast, streamlined experience.

Are you a puppet expert who knows a bit of gluster? I’m not sure what the best way is to handle large config changes, such as expanding volumes or replacing bricks. I can imagine a large state diagram that would be very hard to wholly implement in puppet. So for now, I’m missing a few edge cases, but hopefully this module will be able to solve more of them over time.

I’ve included an examples/ directory in the repository, to give you an idea of how this works for now. Stay tuned for more commits!

git clone https://github.com/purpleidea/puppet-gluster.git

Happy hacking,
James

a puppet module for gluster

I am an avid cobbler+puppet user. This allows me to rely on my cobbler server and puppet manifests to describe how servers/workstations are set up. I only back up my configs and data, and I regenerate failed machines PRN.

I’ll be publishing my larger cobbler+puppet infrastructure in the future once it’s been cleaned up a bit, but for now I figured I’d post my work-in-progress “puppet-gluster” module, since it seems there’s a real interest.

Warning: there are some serious issues with this module! I’ve used this as an easy way to build out many servers with cobbler+puppet automatically. It’s not necessarily the best long-term solution, and it certainly can’t handle certain scenarios yet, but it is a stepping stone if someone would like to think about writing such a module this way.

For lack of better hosting, it’s now available here: https://dl.dropbox.com/u/48553683/puppet-gluster.tar.bz2 Once I finish cleaning up a bit of cruft, I’ll post my git tree somewhere sane. All of this code is AGPL3+ so share and enjoy!

What’s next? My goal is to find the interested parties and provoke a bit of discussion as to whether this is useful and where to go next. It makes sense to me that the gluster experts chime in and add gluster specific optimizations into this module, so that it’s used as a sort of de-facto documentation on how to set up gluster properly.

I believe that Dan Bode and other gluster developers are already interested in the “puppet module” project, and that work is underway. I spoke to him briefly about collaborating. He is most likely a more experienced puppet user than I, and so I look forward to the community getting a useful puppet-gluster module from somewhere. Maybe even native gluster types?

Happy hacking,
James

 

now syndicated on “planet gluster”

Many thanks to johnmark in #gluster for syndicating my “gluster” tagged blog posts on http://www.gluster.org/blog/

I aim to keep these posts technical and informative, written mostly for other sysadmins and gluster users. Please don’t be shy to comment on my writing style or to let me know if you need more information about a particular subject. If you have any ideas about things you’d like me to write about, let me know and I’ll do my best. I like feedback!

Happy hacking,
James

PS: My main blog (https://ttboj.wordpress.com/) will still contain other technical articles not relating to gluster, should that be useful to you.

building intel nic driver (igb) for gluster on centos

I’ve been having some strange networking issues with gluster. “Eco__” from #gluster suggested I try an up-to-date Intel nic driver. Here are the steps I followed to make that happen. No news yet on whether that solved the problem.

Currently my system is using the igb (intel gigabit) driver. To find out what version you are running:

# modinfo -F version igb
3.2.10-k
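If you want to script the comparison between the running driver and a newer release, something like the following sketch could work. It assumes version strings shaped like the ones in this post (dotted numbers with an optional suffix such as -k); version_tuple is a hypothetical helper, not part of any tool mentioned here, and 3.4.7 is the version built later in this post.

```python
import re

def version_tuple(version):
    """Turn a driver version such as '3.2.10-k' into (3, 2, 10)."""
    match = re.match(r'\d+(?:\.\d+)*', version.strip())
    if match is None:
        raise ValueError('unrecognized version: %r' % version)
    return tuple(int(part) for part in match.group(0).split('.'))

running = version_tuple('3.2.10-k')   # as reported by modinfo above
candidate = version_tuple('3.4.7')    # the newer release
print(candidate > running)
```

Comparing tuples of integers avoids the trap of comparing the strings directly, where '3.10' would sort before '3.4'.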

I found a newer version on the supermicro ftp site. A download and a decompress later, I found an igb-3.4.7.tar.gz file hiding out. Thanks to the kind people at Intel, this was fairly easy to compile and install. First install some deps:

# yum install /usr/bin/{rpmbuild,gcc,make} kernel-devel

Use rpmbuild to make yourself an rpm:

# rpmbuild -tb igb-3.4.7.tar.gz
[snip]

Your rpm package should appear in rpmbuild/RPMS/. In my case, I added this to my local cobbler repo, and pushed it to all my gluster nodes. You might prefer a simple:

# yum install igb-3.4.7-1.x86_64.rpm

Please note that I believe it’s important to build this module on the same kind of OS/Hardware that you’re using it for. Since my storage hosts are all identical, this wasn’t a problem.

Happy hacking!
James

EDIT: tru_tru from #gluster pointed out that this module actually exists in elrepo, which is run by the wonderful people who also provide drbd modules. I haven’t tested it, but I’m sure it’s excellent.