Fancy git aliases and git cherryfetch

Here are two quick git tricks that I’ve added to my toolbox lately…

I wanted to create a git alias that takes arguments from the command line, but uses them in the middle of the aliased command. Here’s the hack that I came up with for the [alias] section of my ~/.gitconfig:

[alias]

    # cherryfetch fetches a repo ($1) / branch ($2) and applies it rebased!
    # the && true at the end eats up the appended args
    cherryfetch = !git fetch "$1" "$2" && git cherry-pick HEAD..FETCH_HEAD && true

Explanation:

The git alias is called “cherryfetch”. It starts with an exclamation point (!), which means it’s a command to execute. It uses $1 and $2 as arguments, which thankfully exist in this environment! Because running a git alias causes the arguments to get appended to the command, the && true part exists at the end to receive those arguments so that they are silently ignored! This works because the command:

$ true ignored command line arguments

Does absolutely nothing but return success. Better naming suggestions for my alias are welcome!
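
To see the appending behaviour for yourself, here’s a throwaway alias (the alias name and message are purely illustrative):

$ git config --global alias.sayhi '!echo hello'
$ git sayhi world
hello world

The “world” argument gets appended to the echo command, which is exactly what the && true trick absorbs in cherryfetch.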

What does this particular alias do?

Too often I receive a patch which has not been rebased against *my* current git master. This happens because the patch author started hacking on the feature against current git master, but git master grew new commits before the patch was done. The best solution is to ask the author to rebase against git master and resubmit, but when I’m feeling helpful, I can now use git cherryfetch to do the same thing! When their patch is unrelated to my changes, this works perfectly.

Example:

$ git cherryfetch https://github.com/example_user/puppet-gluster.git feat/branch_name
From https://github.com/example_user/puppet-gluster
* branch            feat/branch_name -> FETCH_HEAD
[master abcdef01] Pi day is awesome, but Tau day is better! purpleidea/puppet-gluster#feat/branch_name
Author: Example User <user@example.com>
Date: Tue Mar 14 09:26:53 2015 -0400
4 files changed, 42 insertions(+), 13 deletions(-)

Manual learning:

If you were to do the same thing manually, this operation is the equivalent of checking out a branch at the same commit this patch applies to, pulling the patch down (causing a fast-forward), running a git rebase master, switching to master, and merging your newly rebased branch in.
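
Here’s a rough sketch of those manual steps as individual commands (the URL and branch are from the example above, and the local branch name their-feature is made up):

$ git fetch https://github.com/example_user/puppet-gluster.git feat/branch_name
$ git checkout -b their-feature $(git merge-base master FETCH_HEAD)    # branch at the commit their patch is based on
$ git merge --ff-only FETCH_HEAD    # pull the patch down (fast-forward)
$ git rebase master
$ git checkout master
$ git merge their-feature
$ git branch -d their-feature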

My version is faster, isn’t it?

Happy hacking!

James

Update: I’ve updated the alias so that it works with N commits. Thanks to David Schmitt for pointing it out, and Felix Frank for finding a nicer solution!

Building RHEL Vagrant Boxes with Vagrant-Builder

Vagrant is a great tool for development, but Red Hat Enterprise Linux (RHEL) customers have typically been left out, because it has been impossible to get RHEL boxes! It would be extremely elegant if hackers could quickly test and prototype their code on the same OS as they’re running in production.

Secondly, when hacking on projects that have a long initial setup phase (eg: a long rpm install) it would be excellent if hackers could roll their own modified base boxes, so that certain common operations could be re-factored out into the base image.

This all changes today.

Please continue reading if you’d like to know how :)

Subscriptions:

In order to use RHEL, you first need a subscription. If you don’t already have one, go sign up… I’ll wait. You do have to pay money, but in return, you’re funding my salary (and many others) so that we can build you lots of great hacks.

Prerequisites:

I’ll be working through this whole process on a Fedora 21 laptop. It should probably work on different OS versions and flavours, but I haven’t tested it. Please test, and let me know your results! You’ll also need virt-install and virt-builder installed:

$ sudo yum install -y /usr/bin/virt-{install,builder}

Step one:

Login to https://access.redhat.com/ and check that you have a valid subscription available. This should look like this:

A view of my available subscriptions.

If everything looks good, you’ll need to download an ISO image of RHEL. First head to the downloads section and find the RHEL product:

A view of my available product downloads.

In the RHEL download section, you’ll find a number of variants. You want the RHEL 7.0 Binary DVD:

A view of the available RHEL downloads.

After it has finished downloading, verify the SHA-256 hash is correct, and continue to step two!

$ sha256sum rhel-server-7.0-x86_64-dvd.iso
85a9fedc2bf0fc825cc7817056aa00b3ea87d7e111e0cf8de77d3ba643f8646c  rhel-server-7.0-x86_64-dvd.iso

Step two:

Grab a copy of vagrant-builder:

$ git clone https://github.com/purpleidea/vagrant-builder
Cloning into 'vagrant-builder'...
[...]
Checking connectivity... done.

I’m pleased to announce that it now has some documentation! (Patches are welcome to improve it!)

Since we’re going to use it to build RHEL images, you’ll need to put your subscription manager credentials in ~/.vagrant-builder/auth.sh:

$ cat ~/.vagrant-builder/auth.sh
# these values are used by vagrant-builder
USERNAME='purpleidea@redhat.com' # replace with your access.redhat.com username
PASSWORD='hunter2'               # replace with your access.redhat.com password

This is a simple shell script that gets sourced, so you could instead replace the static values with a script that calls out to the GNOME Keyring. This is left as an exercise for the reader.
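
As a hedged sketch of that approach, assuming you’ve already stored the password in the keyring with secret-tool (from libsecret), under attribute names of your own choosing:

$ cat ~/.vagrant-builder/auth.sh
# these values are used by vagrant-builder
USERNAME='purpleidea@redhat.com'
# look the password up at run time instead of storing it in plain text;
# the attributes must match whatever you used when storing the secret
PASSWORD="$(secret-tool lookup service access.redhat.com user "$USERNAME")"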

To build the image, we’ll be working in the v7/ directory. This directory supports a family of OS versions that share a lot in common, including Fedora 20, Fedora 21, CentOS 7.0, and RHEL 7.0.

Put the downloaded RHEL ISO in the iso/ directory. To allow qemu to see this file, you’ll need to add some acl’s:

$ sudo -s # do this as root
$ cd /home/
$ getfacl james # james is my home directory
# file: james
# owner: james
# group: james
user::rwx
group::---
other::---
$ setfacl -m u:qemu:r-x james # this is the important line
$ getfacl james
# file: james
# owner: james
# group: james
user::rwx
user:qemu:r-x
group::---
mask::r-x
other::---

An unusually small /tmp directory might also be an issue. You’ll need at least 6GiB free, but a bit extra is a good idea. Check your free space first:

$ df -h /tmp
Filesystem Size Used Avail Use% Mounted on
tmpfs 1.9G 1.3M 1.9G 1% /tmp

Let’s increase this a bit:

$ sudo mount -o remount,size=8G /tmp
$ df -h /tmp
Filesystem Size Used Avail Use% Mounted on
tmpfs 8.0G 1.3M 8.0G 1% /tmp

You’re now ready to build an image…

Step three:

In the versions/ directory, you’ll see that I have provided a rhel-7.0-iso.sh script. You’ll need to run it from its parent directory. This will take a while, and will cause two sudo prompts, which are required for virt-install. One downside to this process is that your https://access.redhat.com/ password will be briefly shown in the virt-builder output. Patches to fix this are welcome!

$ pwd
/home/james/code/vagrant-builder/v7
$ time ./versions/rhel-7.0-iso.sh
[...]
real    38m49.777s
user    13m20.910s
sys     1m13.832s
$ echo $?
0

With any luck, this should eventually complete successfully. This uses your cpu’s virtualization instructions, so if they’re not enabled, this will be a lot slower. It also uses the network, which in North America, means you’re in for a wait. Lastly, the xz compression utility will use a bunch of cpu building the virt-builder base image. On my laptop, this whole process took about 30 minutes. The above run was done without an SSD and took a bit longer.

The good news is that most of the hard work is now done and won’t need to be repeated! If you want to see the fruits of your CPU labour, have a look in: ~/tmp/builder/rhel-7.0-iso/.

$ ls -lAhGs
total 4.1G
1.7G -rw-r--r--. 1 james 1.7G Feb 23 18:48 box.img
1.7G -rw-r--r--. 1 james  41G Feb 23 18:48 builder.img
 12K -rw-rw-r--. 1 james  10K Feb 23 18:11 docker.tar
4.0K -rw-rw-r--. 1 james  388 Feb 23 18:39 index
4.0K -rw-rw-r--. 1 james   64 Feb 23 18:11 metadata.json
652M -rw-rw-r--. 1 james 652M Feb 23 18:50 rhel-7.0-iso.box
200M -rw-r--r--. 1 james 200M Feb 23 18:28 rhel-7.0.xz

As you can see, we’ve produced a bunch of files. The rhel-7.0-iso.box is your RHEL 7.0 vagrant base box! Congratulations!

Step four:

If you haven’t ever installed vagrant, you’ll be pleased to know that as of last week, vagrant and vagrant-libvirt RPM’s have hit Fedora 21! I started trying to convince the RPM wizards about a year ago, and we finally have something that is quite usable! Hopefully we’ll iterate on any packaging bugs, and keep this great work going! There are now only three things you need to do to get a working vagrant-libvirt setup on Fedora 21:

  1. $ yum install -y vagrant-libvirt
  2. Source this .bashrc add-on from: https://gist.github.com/purpleidea/8071962
  3. Add a vagrant.pkla file as mentioned here

Now we’re in well-known vagrant territory. Adding the box into vagrant is as simple as:

$ vagrant box add rhel-7.0-iso.box --name rhel-7.0

Using the box effectively:

Having a base box is great, but having to manage subscription-manager manually isn’t fun in a DevOps environment. Enter Oh-My-Vagrant (omv). You can use omv to automatically register and unregister boxes! Edit the omv.yaml file so that the image variable refers to the base box you just built, enter your https://access.redhat.com/ username and password, and vagrant up away!

$ cat omv.yaml 
---
:domain: example.com
:network: 192.168.123.0/24
:image: rhel-7.0
:boxurlprefix: ''
:sync: rsync
:folder: ''
:extern: []
:puppet: false
:classes: []
:docker: false
:cachier: false
:vms: []
:namespace: omv
:count: 2
:username: 'purpleidea@redhat.com'
:password: 'hunter2'
:poolid: true
:repos: []
$ vs
Current machine states:

omv1                      not created (libvirt)
omv2                      not created (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.

You might want to set repos to be:

['rhel-7-server-rpms', 'rhel-7-server-extras-rpms']

but it depends on what subscriptions you want or have available. If you’d like to store your credentials in an external file, you can do so like this:

$ cat ~/.config/oh-my-vagrant/auth.yaml
---
:username: purpleidea@redhat.com
:password: hunter2

Here’s an actual run to see the subscription-manager action:

$ vup omv1
[...]
==> omv1: The system has been registered with ID: 00112233-4455-6677-8899-aabbccddeeff
==> omv1: Installed Product Current Status:
==> omv1: Product Name: Red Hat Enterprise Linux Server
==> omv1: Status:       Subscribed
$ # the above lines show that your machine has been registered
$ vscreen root@omv1
[root@omv1 ~]# echo thanks purpleidea!
thanks purpleidea!
[root@omv1 ~]# exit

Make sure to unregister when you are permanently done with a machine, otherwise your subscriptions will be left idle. This happens automatically on vagrant destroy when using Oh-My-Vagrant:

$ vdestroy omv1 # make sure to unregister when you are done
Unlocking shell provisioning for: omv1...
Running 'subscription-manager unregister' on: omv1...
Connection to 192.168.121.26 closed.
System has been unregistered.
==> omv1: Removing domain...

Idempotence:

One interesting aspect of this build process is that it’s mostly idempotent. It’s able to do this because it uses GNU Make to ensure that only out of date steps or missing targets are run. As a result, if the build process fails part way through, you’ll only have to repeat the failed steps! This speeds up debugging and iterative development enormously!

To prove this to you, here is what a second run looks like (after the first successful run):

$ time ./versions/rhel-7.0-iso.sh 

real    0m0.029s
user    0m0.013s
sys    0m0.017s

As you can see, it completes almost instantly.
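
If you want a feel for the mechanism without reading the Makefile, the idea boils down to something like this toy shell equivalent (the real logic is make targets, and the file names here are purely illustrative):

# only redo the compression step when its output is missing or older than its input
if [ ! -e builder.img.xz ] || [ builder.img -nt builder.img.xz ]; then
    xz --best --keep builder.img    # produces builder.img.xz
fi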

Variants:

To build a variant of the base image that we just built, create a versions/*.sh file, and modify the variables to add your changes in. If you start with a copy of the ~/tmp/builder/${VERSION}-${POSTFIX} folder, then you shouldn’t need to repeat the initial steps. Hint: btrfs is excellent at reflinking data, so you don’t unnecessarily store multiple copies!
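
For example, a quick way to seed a variant from the finished build directory (the source path follows the defaults used above, and the variant name is made up; --reflink=auto only shares blocks on filesystems like btrfs that support it):

$ cp -r --reflink=auto ~/tmp/builder/rhel-7.0-iso ~/tmp/builder/rhel-7.0-myvariant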

Plumbing Pipeline:

What actually happens behind the scenes? Most of the magic happens in the Makefile. The relevant series of transforms is as follows:

  1. virt-install: install from iso
  2. virt-sysprep: remove unneeded junk
  3. virt-sparsify: make sparse
  4. xz --best: compress into builder format
  5. virt-builder: use builder to bake vagrant box
  6. qemu-img convert: convert to correct format
  7. tar -cvz: tar up into vagrant box format

There are some intermediate dependency steps that I didn’t mention, so feel free to explore the source.

Future work:

  • Some of the above steps in the pipeline are actually bundled under the same target. It’s not a huge issue, but it could be changed if someone feels strongly about it.
  • Virt-builder can’t run docker commands during build. This would be very useful for pre-populating images with docker containers.
  • Oh-My-Vagrant needs to have its DNS management switched to use vagrant-hostmanager instead of puppet resource commands.

Disclaimers:

While I expect you’ll love using these RHEL base boxes with Vagrant, the above builder methodology is currently not officially supported, and I can’t guarantee that the RHEL vagrant dev environments will be either. I’m putting this out there for the early (DevOps) adopters who want to hack on this and who didn’t want to invent their own build tool chain. If you do have issues, please leave a comment here, or submit a vagrant-builder issue.

Thanks:

Special thanks to Richard WM Jones and Pino Toscano for their great work on virt-builder that this is based on. Additional thanks to Randy Barlow for encouraging me to work on this. Thanks to Red Hat for continuing to pay my salary :)

Subscriptions?

If I’ve convinced you that you want some RHEL subscriptions, please go have a look, and please let Red Hat know that you appreciated this post and my work.

Happy Hacking!

James

UPDATE: I’ve tested that this process also works with the new RHEL 7.1 release!
UPDATE: I’ve tested that this process also works with the new RHEL 7.2 release!

Introducing: Silent counter

You might want to write code that can tell how many iterations have passed since some action occurred. Alternatively, you might want to know if it’s the first time a machine has run Puppet. To do these types of things, you might wish to have a monotonically increasing counter in your Puppet manifest. Since one did not exist, I set out to build one!

The code:

If you just want to try the code, and skip the ramble, you can include common::counter into your manifest. The entire class is part of my puppet-common module:

git clone https://github.com/purpleidea/puppet-common

Usage:

Usage notes are hardly even necessary. Here is how the code is commonly used:


include ::common::counter    # that's it!

# NOTE: we only see the notify message. no other exec/change is shown!
notify { 'counter':
        message => "Value is: ${::common_counter_simple}",
}

Now use the fact anywhere you need a counter!

Increasing a variable:

At the heart of any counter, you’ll need to have a value store, a way to read the value, and a way to increment the value. For simplicity, the value store we’ll use will be a file on disk. This file is located by default at ${vardir}/common/counter/simple. To read the value, we use a puppet fact. The fact has a key name of $::common_counter_simple. To increment the value, a simple python script is used.
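
To give you an idea of what the increment step does, here is a rough equivalent sketch in shell (the module’s actual helper is a small python script):

#!/bin/sh
# read the current value from the counter file ($1), add one, and write it back
f="$1"    # eg: ${vardir}/common/counter/simple
[ -e "$f" ] || echo 0 > "$f"    # start at zero if the file doesn't exist yet
count="$(cat "$f")"
echo $((count + 1)) > "$f"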

Noise:

To cause an increment of the value on each puppet run, an exec would have to be used. The downside is that this causes noise in your puppet logs, even if nothing else is happening! This noise is unwanted, so we work around it with the following code:


exec { 'counter':
        # echo an error message and be false, if the incrementing fails
        command => '/bin/echo "common::counter failed" && /bin/false',
        unless => "${vardir}/counter/increment.py '${vardir}/counter/simple'",
        logoutput => on_failure,
}

As you can see, we cause the run to happen in the silent “unless” part of the exec, and we don’t actually allow any exec to occur unless there is an error running the increment.py!

Complex example:

If you want to do something more complicated using this technique, you might want to write something like this:


$max = 8
exec { "/bin/echo this is run #: ${::common_counter_simple}":
        logoutput => on_failure,
        onlyif => [
                "/usr/bin/test ${::common_counter_simple} -lt ${max}",
                "/usr/bin/test ${::common_counter_simple} -gt 0",
        ],
    #notify => ...,    # do some action
}

Side effects:

Make sure not to use the counter fact in a $name or anywhere else that would trigger frequent catalog changes, since its value is different on every run.

Module pairings:

You might want to pair this module with my “Finite State Machine” concept, or my Exec[‘again’] module. If you come up with some cool use cases, please let me know!

Future work:

If you’d like to extend this work, two features come to mind:

  1. Individual named counters. If for some reason you want more than one counter, named counters could be built.
  2. Reset functionality. If you’d like to reset a counter to zero (or to some other value) then there should be a special type you could create which causes this to happen.

If you’d like to work on either of these features, please let me know, or send me a patch!

Happy hacking!

James

The switch as an ordinary GNU/Linux server

The fact that we manage the switches in our data centres differently than any other server is patently absurd, but we do so because we want to harness the power of a tiny bit of silicon which happens to be able to dramatically speed up the switching bandwidth.

beware of proprietary silicon, it’s absurd!

That tiny bit of silicon is known as an ASIC, or application-specific integrated circuit, and one particularly well-performing ASIC (which is present in many commercially available switches) is called the Trident.

None of this should impact the end-user management experience. However, because the big switch companies and chip makers believe that there is some special differentiation in their IP, they’ve ensured that the stacks and software surrounding the hardware are highly proprietary and difficult to replace. This also lets them create and sell bundled products and features that you don’t want, but which you can’t get elsewhere.

This is still true today! System and network engineers know all too well the hassles of dealing with the different proprietary switch operating systems and interfaces. Why not standardize on the well-known interface that every GNU/Linux server uses?

We’re talking about iptables of course! (Although nftables would be an acceptable standard too!) This way we could have a common interface for all the networked devices in our server room.
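
As a trivial, made-up example, the exact same rule syntax that you already use on your servers could then configure your switches too:

$ iptables -A FORWARD -p tcp --dport 23 -j DROP    # drop forwarded telnet traffic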

I’ve been able to work around this limitation in the past, by using Linux to do my routing in software, and by building the routers out of COTS 2U GNU/Linux boxes. The trouble with this approach is that they’re bigger, louder, more expensive, consume more power, and don’t have the port density that a 48 port 1U switch does.

a 48 port, 1U switch

It turns out that there is a company which is actually trying to build this mythical box. It is not perfect, but I think they are on the right track. What follows are my opinions of what they’ve done right, what’s wrong, and what I’d like to see in the future.

Who are they?

They are Cumulus Networks, and I recently got to meet one of their very talented engineers, Leslie Carr, and to demo and discuss their product with her. I recently attended a talk that she gave on this very same subject. She gave me a rocket turtle. (Yes, this now makes me biased!)

my rocket turtle, the cumulus networks mascot

What are they doing?

You buy an existing switch from your favourite vendor. You then throw out (flash over) the included software, and instead pay them a yearly licensing fee to use the “Cumulus” GNU/Linux. It comes as an OS image, based on Debian.

How does it talk to the ASIC?

The OS image comes with a daemon called switchd that transfers the kernel iptables rules down into the ASIC. I don’t know the specifics of how this works exactly, because:

  1. Switchd is proprietary. Apparently this is because of a scary NDA they had to sign, but it’s still unfortunate, and it is impeding my hacking.
  2. I’m not an expert on talking to ASIC’s. I’m guessing that unless you’ve signed the NDA’s, and you’re behind the Trident paywall, then it’s tough to become one!

Problems with packaging:

The OS is only distributed as a single image. This is an unfortunate mistake! It should be available from the upstream project, with switchd (and any other add-ons) as individual .deb packages. That way I’d know I’m getting a stock OS, preferably one built and signed by the Debian team, and I could use the same infrastructure that keeps my servers up to date to manage the switches too.

Problems with OS security:

Unfortunately the OS doesn’t benefit from any of the standard OS security enhancements like SELinux. I’d prefer running a more advanced distro like RHEL or CentOS, which have these things out of the box, but if Cumulus continues using Debian, then they must include some more advanced security measures. I didn’t find AppArmor or grsecurity in use either. It did seem to have all the important bash security updates:

cumulus@switch1$ bash --version
GNU bash, version 4.2.37(1)-release (powerpc-unknown-linux-gnu)
Copyright (C) 2011 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>

This is free software; you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Use of udev:

This switch does seem to support and use udev, although, not being a udev expert, I can’t comment on whether it’s done properly or not. I’d be interested to hear from the pros. Here’s what I found:

cumulus@switch1$ cd /etc/udev/
cumulus@switch1$ tree
.
`-- rules.d
    |-- 10-cumulus.rules
    |-- 60-bridge-network-interface.rules -> /dev/null
    |-- 75-persistent-net-generator.rules -> /dev/null
    `-- 80-networking.rules -> /dev/null

1 directory, 4 files
cumulus@switch1$ cat rules.d/10-cumulus.rules | tail -n 7
# udev rules specific to Cumulus Linux

# Rule called when the linux-user-bde driver is loaded
ACTION=="add" SUBSYSTEM=="module" DEVPATH=="/module/linux_user_bde" ENV{DEVICE_NAME}="linux-user-bde" ENV{DEVICE_TYPE}="c" ENV{DEVICE_MINOR}="0" RUN="/usr/lib/cumulus/udev-module"

# Quanta LY8 uses RTC1
KERNEL=="rtc1", PROGRAM="/usr/bin/platform-detect", RESULT=="quanta,ly8_rangeley", SYMLINK+="rtc"

Other things:

There seems to be a number of extra things running on the switch. Here’s what I mean:

cumulus@switch1$ ps auxwww 
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0   2516   860 ?        Ss   Nov05   0:02 init [3]  
root         2  0.0  0.0      0     0 ?        S    Nov05   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Nov05   0:05 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S    Nov05   0:00 [kworker/u:0]
root         6  0.0  0.0      0     0 ?        S    Nov05   0:00 [migration/0]
root         7  0.0  0.0      0     0 ?        S<   Nov05   0:00 [cpuset]
root         8  0.0  0.0      0     0 ?        S<   Nov05   0:00 [khelper]
root         9  0.0  0.0      0     0 ?        S<   Nov05   0:00 [netns]
root        10  0.0  0.0      0     0 ?        S    Nov05   0:02 [sync_supers]
root        11  0.0  0.0      0     0 ?        S    Nov05   0:00 [bdi-default]
root        12  0.0  0.0      0     0 ?        S<   Nov05   0:00 [kblockd]
root        13  0.0  0.0      0     0 ?        S<   Nov05   0:00 [ata_sff]
root        14  0.0  0.0      0     0 ?        S    Nov05   0:00 [khubd]
root        15  0.0  0.0      0     0 ?        S<   Nov05   0:00 [rpciod]
root        17  0.0  0.0      0     0 ?        S    Nov05   0:00 [khungtaskd]
root        18  0.0  0.0      0     0 ?        S    Nov05   0:00 [kswapd0]
root        19  0.0  0.0      0     0 ?        S    Nov05   0:00 [fsnotify_mark]
root        20  0.0  0.0      0     0 ?        S<   Nov05   0:00 [nfsiod]
root        21  0.0  0.0      0     0 ?        S<   Nov05   0:00 [crypto]
root        34  0.0  0.0      0     0 ?        S    Nov05   0:00 [scsi_eh_0]
root        36  0.0  0.0      0     0 ?        S    Nov05   0:00 [kworker/u:2]
root        41  0.0  0.0      0     0 ?        S    Nov05   0:00 [mtdblock0]
root        42  0.0  0.0      0     0 ?        S    Nov05   0:00 [mtdblock1]
root        43  0.0  0.0      0     0 ?        S    Nov05   0:00 [mtdblock2]
root        44  0.0  0.0      0     0 ?        S    Nov05   0:00 [mtdblock3]
root        49  0.0  0.0      0     0 ?        S    Nov05   0:00 [mtdblock4]
root       362  0.0  0.0      0     0 ?        S    Nov05   0:54 [hwmon0]
root       363  0.0  0.0      0     0 ?        S    Nov05   1:01 [hwmon1]
root       398  0.0  0.0      0     0 ?        S    Nov05   0:04 [flush-8:0]
root       862  0.0  0.1  28680  1768 ?        Sl   Nov05   0:03 /usr/sbin/rsyslogd -c4
root      1041  0.0  0.2   5108  2420 ?        Ss   Nov05   0:00 /sbin/dhclient -pf /run/dhclient.eth0.pid -lf /var/lib/dhcp/dhclient.eth0.leases eth0
root      1096  0.0  0.1   3344  1552 ?        S    Nov05   0:16 /bin/bash /usr/bin/arp_refresh
root      1188  0.0  1.1  15456 10676 ?        S    Nov05   1:36 /usr/bin/python /usr/sbin/ledmgrd
root      1218  0.2  1.1  15468 10708 ?        S    Nov05   5:41 /usr/bin/python /usr/sbin/pwmd
root      1248  1.4  1.1  15480 10728 ?        S    Nov05  39:45 /usr/bin/python /usr/sbin/smond
root      1289  0.0  0.0  13832   964 ?        SNov05   0:00 /sbin/auditd
root      1291  0.0  0.0  10456   852 ?        S
root     12776  0.0  0.0      0     0 ?        S    04:31   0:01 [kworker/0:0]
root     13606  0.0  0.0      0     0 ?        S    05:05   0:00 [kworker/0:2]
root     13892  0.0  0.0      0     0 ?        S    05:13   0:00 [kworker/0:1]
root     13999  0.0  0.0   2028   512 ?        S    05:16   0:00 sleep 30
cumulus  14016  0.0  0.1   3324  1128 pts/0    R+   05:17   0:00 ps auxwww
root     30713 15.6  2.4  69324 24176 ?        Ssl  Nov05 304:16 /usr/sbin/switchd -d
root     30952  0.0  0.3  11196  3500 ?        Ss   Nov05   0:00 sshd: cumulus [priv]
cumulus  30954  0.0  0.1  11196  1684 ?        S    Nov05   0:00 sshd: cumulus@pts/0
cumulus  30955  0.0  0.2   4548  2888 pts/0    Ss   Nov05   0:01 -bash

In particular, I’m referring to ledmgrd, pwmd, smond and others. I don’t doubt that these things are necessary and useful; in fact, they’re written in python and should be easy to reverse engineer if anyone is interested. But if they’re a useful part of a switch-capable operating system, I hope that they grow proper upstream projects and get appropriate documentation, licensing, and packaging too!

Switch ports are network devices:

Hacking on the device couldn’t feel more native. Anyone remember how to enumerate the switch ports on IOS? … Who cares! Try the standard iproute2 tools on a Cumulus box:

cumulus@switch1$ ip a s swp42
44: swp42: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 500
    link/ether 08:9e:01:f8:96:74 brd ff:ff:ff:ff:ff:ff
cumulus@switch1$ ip a | grep swp | wc -l
52

What about ifup/ifdown?

This one is a bit different:

cumulus@switch1$ file /sbin/ifup
/sbin/ifup: symbolic link to `/sbin/ifupdown'
cumulus@switch1$ file /sbin/ifupdown 
/sbin/ifupdown: Python script, ASCII text executable

The Cumulus team encountered issues with the traditional ifup/ifdown tools found in a stock distro. So they replaced them with shiny python versions:

https://github.com/CumulusNetworks/ifupdown2

I hope that this project either gets into the upstream distro, or that some upstream writes tools that address the limitations in the various messes of shell scripts. I’m optimistic about networkd being the solution here, but until that’s fully baked, the Cumulus team has built a nice workaround. Additionally, until the Debian team finalizes the (in my opinion, proper) technical decision to adopt systemd, networkd has a bleak future there.

Kernel:

All the kernel hackers out there will want to know what’s under the hood:

cumulus@switch1$ uname -a
Linux leaf1 3.2.46-1+deb7u1+cl2.2+1 #3.2.46-1+deb7u1+cl2.2+1 SMP Thu Oct 16 14:28:31 PDT 2014 ppc powerpc GNU/Linux

Because this is an embedded chip found in a 1U box, and not a Xeon processor, it’s noticeably slower than a traditional server. This is of course (non-sarcastically) exactly what I want. For admin tasks, it has plenty of power, and this trade-off means it has lower power consumption and heat production than a stock server. While debugging some puppet code that I was running took longer than normal on this box, I was eventually able to get the job done. This is another reason why this box needs to act more like an upstream distro: if it did, I’d be able to use a faster machine as my dev box!

Other tools:

Other stock tools like ethtool and brctl work out of the box. Bonding, vlan’s and every other feature I tested seem to work the same way you’d expect on a GNU/Linux system.
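
As a quick illustration (run as root; the port, vlan and bridge names are made up), configuring a vlan on a switch port uses exactly the same commands that you’d use on any other GNU/Linux box:

$ ip link add link swp42 name swp42.100 type vlan id 100    # tag vlan 100 on port swp42
$ brctl addbr br100
$ brctl addif br100 swp42.100
$ ip link set swp42 up && ip link set swp42.100 up && ip link set br100 up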

Puppet and automation:

Readers of my blog will know that I manage my servers with Puppet. Instead of having the puppet agent connect over an API to the switch, you can directly install and run puppet on this Cumulus Linux machine! Some users might quickly jump to using the firewall module as the solution for consistent management, but a level two user will know that a higher level wrapper around shorewall is the better approach. This is all possible with this switch and seems to work great! The downside was that I had to manually add repositories to get the shorewall packages because it is not a stock distro :(

Why not SDN?

SDN, or software-defined networking, is a fantastic and interesting technology. Unfortunately, it’s a parallel problem to what I’m describing in this article, and not a solution to help you work around the lack of a good GNU+Linux switch. If programming the ASIC’s wasn’t an NDA-requiring activity, I bet we’d see a lot more innovative networking companies and technologies pop up.

Future?

This product isn’t quite baked enough yet for me to want to use it in production, but it’s so tantalizingly close that it’s strongly worth considering. I hope that they react positively to my suggestions and create an even more native, upstream environment. Once this is done, it will be my go-to for all my switching!

Thanks:

Thanks very much to the Cumulus team for showing me their software, and giving me access to demo it on some live switches. I didn’t test performance, but I have no doubt that it competes with the market average. Prove me right by trying it out yourself!

Thanks for listening, and Happy hacking!

James

PS: Special thanks to David Caplan for the great networking discussions we had!

Continuous integration for Puppet modules

I just patched puppet-gluster and puppet-ipa to bring their infrastructure up to date with the current state of affairs…

What’s new?

  • Better README’s
  • Rake syntax checking (fewer oopsies)
  • CI (testing) with travis on git push (automatic testing for everyone)
  • Use of .pmtignore to ignore files from puppet module packages (finally)
  • Pushing modules to the forge with blacksmith (sweet!)

This last point deserves another mention. Puppetlabs created the “forge” to try to provide some sort of added value to their stewardship. Personally, I like to look for code on github instead, but nevertheless, some do use the forge. The problem is that to upload new releases, you need to click your mouse like a windows user! Someone has finally solved that problem! If you use blacksmith, a new build is just a rake push away!
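
Here’s a sketch of what that flow looks like with the puppet-blacksmith gem (the rake task names may differ between versions, so check the blacksmith README, and your Rakefile needs to require its rake tasks first):

$ gem install puppet-blacksmith
$ rake module:bump    # bump the version in the module metadata
$ rake module:tag     # tag the release in git
$ rake module:push    # build the module and push it to the forge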

Have a look at this example commit if you’re interested in seeing the plumbing.

Better documentation and FAQ answering:

I’ve answered a lot of questions by email, but this only helps out individuals. From now on, I’d appreciate if you asked your question in the form of a patch to my FAQ. (puppet-gluster, puppet-ipa)

I’ll review and merge your patch, including a follow-up patch with the answer! This way you’ll get more familiar with git and sending small patches, everyone will benefit from the response, and I’ll be able to point you to the docs (and even a specific commit) to avoid responding to already answered questions. You’ll also have the commit information of someone else who already had this problem. Cool, right?

Happy hacking,

James

Introducing: Oh My Vagrant!

If you’re a reader of my code or of this blog, it’s no secret that I hack on a lot of puppet and vagrant. Recently I’ve fooled around with a bit of docker, too. I realized that the vagrant environments I built for puppet-gluster and puppet-ipa needed to be generalized, and they needed new features too. Therefore…

Introducing: Oh My Vagrant!

Oh My Vagrant is an attempt to provide an easy to use development environment so that you can be up and hacking quickly, and focusing on the real devops problems. The README explains my choice of project name.

Prerequisites:

I use a Fedora 20 laptop with vagrant-libvirt. Efforts are underway to create an RPM of vagrant-libvirt, but in the meantime you’ll have to read: Vagrant on Fedora with libvirt (reprise). This should work with other distributions too, but I don’t test them very often. Please step up and help test :)

The bits:

First clone the oh-my-vagrant repository and look inside:

git clone --recursive https://github.com/purpleidea/oh-my-vagrant
cd oh-my-vagrant/vagrant/

The included Vagrantfile is the current heart of this project. You’re welcome to use it as a template and edit it directly, or you can use the facilities it provides. I’d recommend starting with the latter, which I’ll walk you through now.

Getting started:

Start by running vagrant status (vs) and taking a look at the vagrant.yaml file that appears.

james@computer:/oh-my-vagrant/vagrant$ ls
Dockerfile  puppet/  Vagrantfile
james@computer:/oh-my-vagrant/vagrant$ vs
Current machine states:

template1                 not created (libvirt)

The Libvirt domain is not created. Run `vagrant up` to create it.
james@computer:/oh-my-vagrant/vagrant$ cat vagrant.yaml 
---
:domain: example.com
:network: 192.168.123.0/24
:image: centos-7.0
:sync: rsync
:puppet: false
:docker: false
:cachier: false
:vms: []
:namespace: template
:count: 1
:username: ''
:password: ''
:poolid: []
:repos: []
james@computer:/oh-my-vagrant/vagrant$

Here you’ll see the list of resultant machines that vagrant thinks is defined (currently just template1), and a bunch of different settings in YAML format. The values of these settings help define the vagrant environment that you’ll be hacking in.

Changing settings:

The settings exist so that your vagrant environment is dynamic and can be changed quickly. You can change the settings by editing the vagrant.yaml file. They will be used by vagrant when it runs. You can also change them at runtime with --vagrant-foo flags. Running a vagrant status will show you how vagrant currently sees the environment. Let’s change the number of machines that are defined. Note the location of the --vagrant-count flag and how it doesn’t work when positioned incorrectly.

james@computer:/oh-my-vagrant/vagrant$ vagrant status --vagrant-count=4
An invalid option was specified. The help for this command
is available below.

Usage: vagrant status [name]
    -h, --help                       Print this help
james@computer:/oh-my-vagrant/vagrant$ vagrant --vagrant-count=4 status
Current machine states:

template1                 not created (libvirt)
template2                 not created (libvirt)
template3                 not created (libvirt)
template4                 not created (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
james@computer:/oh-my-vagrant/vagrant$ cat vagrant.yaml 
---
:domain: example.com
:network: 192.168.123.0/24
:image: centos-7.0
:sync: rsync
:puppet: false
:docker: false
:cachier: false
:vms: []
:namespace: template
:count: 4
:username: ''
:password: ''
:poolid: []
:repos: []
james@computer:/oh-my-vagrant/vagrant$

As you can see in the above example, changing the count variable to 4 causes vagrant to see a possible four machines in the vagrant environment. You can change as many of these parameters as you like at a time by using the --vagrant- flags, or you can edit the vagrant.yaml file. The latter is much easier and more expressive, in particular for expressing complex data types. The former is much more powerful when building one-liners, such as:

vagrant --vagrant-count=8 --vagrant-namespace=gluster up gluster{1..8}

which should bring up eight hosts in parallel, named gluster1 to gluster8.

Other VM’s:

Since one often wants to be more expressive in machine naming and heterogeneity of machine type, you can specify a list of machines to define in the vms array of the vagrant.yaml file. If you’d rather define these machines in the Vagrantfile itself, you can also set them up in the vms array defined there. It is empty by default, but it is easy to uncomment one of the many examples. These will be used as the defaults if nothing else overrides the selection in the vagrant.yaml file. I’ve uncommented a few to show you this functionality:

james@computer:/oh-my-vagrant/vagrant$ grep example[124] Vagrantfile 
    {:name => 'example1', :docker => true, :puppet => true, },    # example1
    {:name => 'example2', :docker => ['centos', 'fedora'], },    # example2
    {:name => 'example4', :image => 'centos-6', :puppet => true, },    # example4
james@computer:/oh-my-vagrant/vagrant$ rm vagrant.yaml # note that I remove the old settings
james@computer:/oh-my-vagrant/vagrant$ vs
Current machine states:

template1                 not created (libvirt)
example1                  not created (libvirt)
example2                  not created (libvirt)
example4                  not created (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
james@computer:/oh-my-vagrant/vagrant$ cat vagrant.yaml 
---
:domain: example.com
:network: 192.168.123.0/24
:image: centos-7.0
:sync: rsync
:puppet: false
:docker: false
:cachier: false
:vms:
- :name: example1
  :docker: true
  :puppet: true
- :name: example2
  :docker:
  - centos
  - fedora
- :name: example4
  :image: centos-6
  :puppet: true
:namespace: template
:count: 1
:username: ''
:password: ''
:poolid: []
:repos: []
james@computer:/oh-my-vagrant/vagrant$ vim vagrant.yaml # edit vagrant.yaml file...
james@computer:/oh-my-vagrant/vagrant$ cat vagrant.yaml 
---
:domain: example.com
:network: 192.168.123.0/24
:image: centos-7.0
:sync: rsync
:puppet: false
:docker: false
:cachier: false
:vms:
- :name: example1
  :docker: true
  :puppet: true
- :name: example4
  :image: centos-7.0
  :puppet: true
:namespace: template
:count: 1
:username: ''
:password: ''
:poolid: []
:repos: []
james@computer:/oh-my-vagrant/vagrant$ vs
Current machine states:

template1                 not created (libvirt)
example1                  not created (libvirt)
example4                  not created (libvirt)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.
james@computer:/oh-my-vagrant/vagrant$

The above output might seem a little long, but if you try these steps out in your terminal, you should get the hang of it fairly quickly. If you poke around in the Vagrantfile, you should see the format of the vms array. Each element in the array should be a dictionary, where the keys correspond to the flags you wish to set. Look at the examples if you need help with the formatting.

Other settings:

As you saw, other settings are available. There are a few notable ones that are worth mentioning. This will also help explain some of the other features that this Vagrantfile provides.

  • domain: This sets the domain part of each vm’s FQDN. The default is example.com, which should work for most environments, but you’re welcome to change this as you see fit.
  • network: This sets the network that is used for the vm’s. You should pick a network/cidr that doesn’t conflict with any other networks on your machine. This is particularly useful when you have multiple vagrant environments hosted off of the same laptop.
  • image: This is the default base image to use for each machine. It can be overridden per-machine in the vm’s list of dictionaries.
  • sync: This is the sync type used for vagrant. rsync is the default and works in all environments. If you’d prefer to fight with the nfs mounts, or try out 9p, both those options are available too.
  • puppet: This option enables or disables integration with puppet. It is possible to override this per machine. This functionality will be expanded in a future version of Oh My Vagrant.
  • docker: This option enables and lists the docker images to set up per vm. It is possible to override this per machine. This functionality will be expanded in a future version of Oh My Vagrant.
  • namespace: This sets the namespace that your Vagrantfile operates in. This value is used as a prefix for the numbered vm’s, as the libvirt network name, and as the primary puppet module to execute.

More on the docker option:

For now, if you specify a list of docker images, they will be automatically pulled into your vm environment. It is recommended that you pre-cache them in an existing base image to save bandwidth. Custom base vagrant images can easily be built with vagrant-builder, but this process is currently undocumented.

I’ll try to write up a post on this process if there are enough requests. To keep you busy in the meantime, I’ve published a CentOS 7 vagrant base image that includes docker images for CentOS and Fedora. It is being graciously hosted by the GlusterFS community.

What other magic does this all do?

There is a certain amount of magic glue that happens behind the scenes. Here’s a list of some of it:

  • Idempotent /etc/hosts based DNS
  • Easy docker base image installation
  • IP address calculations and assignment with ipaddr
  • Clever cleanup on ‘vagrant destroy’
  • Vagrant docker base image detection
  • Integration with Puppet

If you don’t understand what all of those mean, and you don’t want to go source diving, don’t worry about it! I will explain them in greater detail when it’s important, and hopefully for now everything “just works” and stays out of your way.

Future work:

There’s still a lot more that I have planned, and some parts of the Vagrantfile need clean up, but I figured I’d try and release this early so that you can get hacking right away. If it’s useful to you, please leave a comment and let me know.

Happy hacking,

James

 

Rough data density calculations

Seagate has just publicly announced 8TB HDD’s in a 3.5″ form factor. I decided to do some rough calculations to understand the density a bit better…

Note: I have decided to ignore the distinction between Terabytes (TB) and Tebibytes (TiB), since I always work in base 2, but I hate the -bi naming conventions. Seagate is most likely announcing an 8TB HDD, which is actually smaller than a true 8TiB drive. If you don’t know the difference it’s worth learning.

Rack Unit Density:

Supermicro sells a high density, double-sided 4U server, which can hold 90 x 3.5″ drives. This means you can easily store:

90 * 8TB = 720TB in 4U,

or:

720TB/4U = 180TB per U.

To store a petabyte of data, since:

1PB = 1024TB,

we need:

1024TB/180TB/U = 5.68 U.

Rounding up we realize that we can easily store one petabyte of raw data in 6U.

Since an average rack is usually 42U (tall racks can be 48U) that means we can store between seven and eight PB per rack:

42U/rack / 6U/PB = 7PB/rack

48U/rack / 6U/PB = 8PB/rack

If you can provide the power and cooling, you can quickly see that small data centers can easily get into exabyte scale if needed. One raw exabyte would only require:

1EB = 1024PB

1024PB/7PB/rack = 146 racks =~ 150 racks.

Raid and Redundancy:

Since you’ll most likely have lots of failures, I would recommend having some number of RAID sets per server, and perhaps a distributed file system like GlusterFS to replicate the data across different servers. Suppose you broke each 90 drive server into five separate RAID 6 bricks for GlusterFS:

90/5 = 18 drives per brick.

In RAID 6, you lose two drives to parity, so that means:

18 drives – 2 drives = 16 drives per brick of usable storage.

16 drives * 5 bricks * 8 TB = 640 TB after RAID 6 in 4U.

640TB/4U = 160TB/U

1024TB/160TB/U = 6.4U/PB, and 42U / 6.4U/PB =~ 7PB/rack.

Since I rounded a lot, the result is similar. With a replica count of 2 in a standard GlusterFS configuration, you average a total of about 3-4PB of usable storage per rack. Need a petabyte scale filesystem? One rack should do it!
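
If you want to plug in your own numbers, here’s the same arithmetic as a quick shell calculation (all of the values are the assumptions made above):

$ drives=90; bricks=5; parity=2; tb=8; rack_u=42; replica=2
$ usable_tb_per_4u=$(( (drives/bricks - parity) * bricks * tb ))    # 640TB after RAID 6
$ echo "scale=2; $usable_tb_per_4u / 4 * $rack_u / 1024 / $replica" | bc    # usable PB per rack
3.28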

Other considerations:

  • Remember that you need to take into account space for power, cooling and networking.
  • Keep in mind that SMR might be used to increase density even further (if it isn’t already being used on these drives).
  • Remember that these calculations were done to understand the order of magnitude, and not to get a precise measurement on the size of a planned cluster.
  • Petabyte scale is starting to feel small…

Conclusions:

Storage is getting very inexpensive. After the above analysis, I feel safe in concluding that:

  1. Puppet-Gluster could easily automate a petabyte scale filesystem.
  2. I have an embarrassingly small amount of personal storage.

Hope this was fun,

Happy hacking,

James

 

Disclaimer: I have not tried the 8TB Seagate HDD’s, or the Supermicro 90 x 3.5″ servers, but if you are building a petabyte scale cluster with GlusterFS/Puppet-Gluster, I’d like to hear about it!