Remote execution in mgmt

Bootstrapping a cluster from your laptop, or managing machines without needing to first setup a separate config management infrastructure are both very reasonable and fundamental asks. I was particularly inspired by Ansible‘s agent-less remote execution model, but never wanted to build a centralized orchestrator. I soon realized that I could have my ice cream and eat it too.

Prior knowledge

If you haven’t read the earlier articles about mgmt, then I recommend you start with those, and then come back here. The first and fourth are essential if you’re going to make sense of this article.

Limitations of existing orchestrators

Current orchestrators have a few limitations.

  1. They can be a single point of failure
  2. They can have scaling issues
  3. They can’t respond instantaneously to node state changes (they poll)
  4. They can’t usually redistribute remote node run-time data between nodes

Despite these limitations, orchestration is still very useful because of the facilities it provides. Since these facilities are essential in a next generation design, I set about integrating these features, but with a novel twist.

Implementation, Usage and Design

Mgmt is written in golang, and that decision was no accident. One benefit is that it simplifies our remote execution model.

To use this mode you run mgmt with the --remote flag. Each use of the --remote argument points to a different remote graph to execute. Eventually this will be integrated with the DSL, but this plumbing is exposed for early adopters to play around with.

Startup (part one)

Each invocation of --remote causes mgmt to remotely connect over SSH to the target hosts. This happens in parallel, and runs up to --cconns simultaneous connections.

A temporary directory is made on the remote host, and the mgmt binary and graph are copied across the wire. Since mgmt compiles down to a single statically compiled binary, it simplifies the transfer of the software. The binary is cached remotely to speed up future runs unless you pass the --no-caching option.

A TCP connection is tunnelled back over SSH to the originating hosts etcd server which is embedded and running inside of the initiating mgmt binary.

Execution (part two)

The remote mgmt binary is now run! It wires itself up through the SSH tunnel so that its internal etcd client can connect to the etcd server on the initiating host. This is particularly powerful because remote hosts can now participate in resource exchanges as if they were part of a regular etcd backed mgmt cluster! They don’t connect directly to each other, but they can share runtime data, and only need an incoming SSH port open!

Closure (part three)

At this point mgmt can either keep running continuously or it can close the connections and shutdown.

In the former case, you can either remain attached over SSH, or you can disconnect from the child hosts and let this new cluster take on a new life and operate independently of the initiator.

In the latter case you can either shutdown at the operators request (via a ^C on the initiator) or when the cluster has simultaneously converged for a number of seconds.

This second possibility occurs when you run mgmt with the familiar --converged-timeout parameter. It is indeed clever enough to also work in this distributed fashion.

Diagram

I’ve used by poor libreoffice draw skills to make a diagram. Hopefully this helps out my visual readers.

remote-execution

If you can improve this diagram, please let me know!

Example

I find that using one or more vagrant virtual machines for the remote endpoints is the best way to test this out. In my case I use Oh-My-Vagrant to set up these machines, but the method you use is entirely up to you! Here’s a sample remote execution. Please note that I have omitted a number of lines for brevity, and added emphasis to the more interesting ones.

james@hostname:~/code/mgmt$ ./mgmt run --remote examples/remote2a.yaml --remote examples/remote2b.yaml --tmp-prefix 
17:58:22 main.go:76: This is: mgmt, version: 0.0.5-3-g4b8ad3a
17:58:23 remote.go:596: Remote: Connect...
17:58:23 remote.go:607: Remote: Sftp...
17:58:23 remote.go:164: Remote: Self executable is: /home/james/code/gopath/src/github.com/purpleidea/mgmt/mgmt
17:58:23 remote.go:221: Remote: Remotely created: /tmp/mgmt-412078160/remote
17:58:23 remote.go:226: Remote: Remote path is: /tmp/mgmt-412078160/remote/mgmt
17:58:23 remote.go:221: Remote: Remotely created: /tmp/mgmt-412078160/remote
17:58:23 remote.go:226: Remote: Remote path is: /tmp/mgmt-412078160/remote/mgmt
17:58:23 remote.go:235: Remote: Copying binary, please be patient...
17:58:23 remote.go:235: Remote: Copying binary, please be patient...
17:58:24 remote.go:256: Remote: Copying graph definition...
17:58:24 remote.go:618: Remote: Tunnelling...
17:58:24 remote.go:630: Remote: Exec...
17:58:24 remote.go:510: Remote: Running: /tmp/mgmt-412078160/remote/mgmt run --hostname '192.168.121.201' --no-server --seeds 'http://127.0.0.1:2379' --file '/tmp/mgmt-412078160/remote/remote2a.yaml' --depth 1
17:58:24 etcd.go:2088: Etcd: Watch: Path: /_mgmt/exported/
17:58:24 main.go:255: Main: Waiting...
17:58:24 remote.go:256: Remote: Copying graph definition...
17:58:24 remote.go:618: Remote: Tunnelling...
17:58:24 remote.go:630: Remote: Exec...
17:58:24 remote.go:510: Remote: Running: /tmp/mgmt-412078160/remote/mgmt run --hostname '192.168.121.202' --no-server --seeds 'http://127.0.0.1:2379' --file '/tmp/mgmt-412078160/remote/remote2b.yaml' --depth 1
17:58:24 etcd.go:2088: Etcd: Watch: Path: /_mgmt/exported/
17:58:24 main.go:291: Config: Parse failure
17:58:24 main.go:255: Main: Waiting...
^C17:58:48 main.go:62: Interrupted by ^C
17:58:48 main.go:397: Destroy...
17:58:48 remote.go:532: Remote: Output...
|    17:58:23 main.go:76: This is: mgmt, version: 0.0.5-3-g4b8ad3a
|    17:58:47 main.go:419: Goodbye!
17:58:48 remote.go:636: Remote: Done!
17:58:48 remote.go:532: Remote: Output...
|    17:58:24 main.go:76: This is: mgmt, version: 0.0.5-3-g4b8ad3a
|    17:58:48 main.go:419: Goodbye!
17:58:48 remote.go:636: Remote: Done!
17:58:48 main.go:419: Goodbye!

You should see that we kick off the remote executions, and how they are wired back through the tunnel. In this particular case we terminated the runs with a ^C.

The example configurations I used are available here and here. If you had a terminal open on the first remote machine, after about a second you would have seen:

[root@omv1 ~]# ls -d /tmp/file*  /tmp/mgmt*
/tmp/file1a  /tmp/file2a  /tmp/file2b  /tmp/mgmt-412078160
[root@omv1 ~]# cat /tmp/file*
i am file1a
i am file2a, exported from host a
i am file2b, exported from host b

You can see the remote execution artifacts, and that there was clearly data exchange. You can repeat this example with --converged-timeout=5 to automatically terminate after five seconds of cluster wide inactivity.

Live remote hacking

Since mgmt is event based, and graph structure configurations manifest themselves as event streams, you can actually edit the input configuration on the initiating machine, and as soon as the file is saved, it will instantly remotely propagate and apply the graph differential.

For this particular example, since we export and collect resources through the tunnelled SSH connections, it means editing the exported file, will also cause both hosts to update that file on disk!

You’ll see this occurring with this message in the logs:

18:00:44 remote.go:973: Remote: Copied over new graph definition: examples/remote2b.yaml

While you might not necessarily want to use this functionality on a production machine, it will definitely make your interactive hacking sessions more useful, in particular because you never need to re-run parts of the graph which have already converged!

Auth

In case you’re wondering, mgmt can look in your ~/.ssh/ for keys to use for the auth, or it can prompt you interactively. It can also read a plain text password from the connection string, but this isn’t a recommended security practice.

Hierarchial remote execution

Even though we recommend running mgmt in a normal clustered mode instead of over SSH, we didn’t want to limit the number of hosts that can be configured using remote execution. For this reason it would be architecturally simple to add support for what we’ve decided to call “hierarchial remote execution”.

In this mode, the primary initiator would first connect to one or more secondary nodes, which would then stage a second series of remote execution runs resulting in an order of depth equal to two or more. This fan out approach can be used to distribute the number of outgoing connections across more intermediate machines, or as a method to conserve remote execution bandwidth on the primary link into your datacenter, by having the secondary machine run most of the remote execution runs.

remote-execution2

This particular extension hasn’t been built, although some of the plumbing has been laid. If you’d like to contribute this feature to the upstream project, please join us in #mgmtconfig on Freenode and let us (I’m @purpleidea) know!

Docs

There is some generated documentation for the mgmt remote package available. There is also the beginning of some additional documentation in the markdown docs. You can help contribute to either of these by sending us a patch!

Novel resources

Our event based architecture can enable some previously improbable kinds of resources. In particular, I think it would be quite beautiful if someone built a provisioning resource. The Watch method of the resource API normally serves to notify us of events, but since it is a main loop that blocks in a select call, it could also be used to run a small server that hosts a kickstart file and associated TFTP images. If you like this idea, please help us build it!

Conclusion

I hope you enjoyed this article and found this remote execution methodology as novel as we do. In particular I hope that I’ve demonstrated that configuration software doesn’t have to be constrained behind a static orchestration topology.

Happy Hacking,

James

Advertisements

mgmt has a logo

The mgmt config project got a logo! The full commit is here. Thanks to Sarah Jane Cox for creating it.

mgmt

Happy Hacking,

James

PS: I might have a few stickers to give out too! Ask me next time you see me if you’d like one! Alternatively, use the artwork to make your own and share with your friends!

Live dmesg following

All good sysadmins eventually learn about using tail -F to tail files. Yes upper-case F is superior.

Around the time I wrote that article, I remember wanting to stream dmesg output too! The functionality wasn’t available without some sort of polling hack, but it turns out that kernel support for this actually landed around the same time in version 3.5.0!

Most GNU/Linux distros are probably running a new enough version by now, and you can now dmesg --follow (or dmesg -w):

$ dmesg -w
[1042958.877980] restoring control 00000000-0000-0000-0000-000000000101/10/5
[1042959.254826] usb 1-1.2: reset low-speed USB device number 3 using ehci-pci
[1042959.356847] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448]
[1042959.530884] PM: resume of devices complete after 976.885 msecs
[1042959.531457] PM: Finishing wakeup.
[1042959.531460] Restarting tasks ... done.
[1042959.622234] video LNXVIDEO:00: Restoring backlight state
[1042959.767952] e1000e: enp0s25 NIC Link is Down
[1042959.771333] IPv6: ADDRCONF(NETDEV_UP): enp0s25: link is not ready
[1048528.391506] All your base are belong to us.

As an added bonus, you can access this via journalctl --dmesg --follow too:

$ journalctl -kf
[snip]
Aug 28 19:58:13 hostname unknown: All your base are belong to us.
Now we have a dmesg version too! https://www.youtube.com/watch?v=8fvTxv46ano&html5=1

Now we have a dmesg version too!

Since my dmesg output wasn’t very noisy when writing this article, and since I didn’t write an “all your base” kernel module, you can actually test this functionality by writing to the kernel ring buffer:

$ sudo bash -c 'echo The Technical Blog of James is awesome! > /dev/kmsg'

Happy hacking!

James

PS: Since this is a facility that provides events, we could eventually write an mgmt config “fact” or resource around it!

Seen in downtown Montreal…

The Technical Blog of James was seen on an outdoor electronic display in downtown Montreal! Thanks to one of my readers for sending this in.

I guess the smart phone revolution is over, and people are taking to reading my articles on bigger screens!

I guess the smart phone revolution is over, and people are taking to reading my articles on bigger screens! The “poutine” is decent proof that this is probably Montreal.

If you’ve got access to a large electronic display, put up the blog, snap a photo, and send it my way! I’ll post it here and send you some random stickers!

Happy Hacking,

James

PS: If you have some comments about this blog, please don’t be shy, send them my way.

Ten minute hacks: Hacking airplane headphones

I was stuck on a 14 hour flight last week, and to my disappointment, only one of the two headphone speakers were working. The plane’s media centre has an audio connector that looks like this:

airplane-headphones-jack

Someone should consider probing this USB port.

The hole to the left is smaller than a 3.5mm headphone jack, and designed for a proprietary headphone connector that I didn’t have, and the two holes to the right are part of a different proprietary connector which match with the cheap airline headphones to provide the left and right audio channels.

airplane-headphones-connected

Completely reversible, and therefore completely ambiguous. Stereo is so 1880’s anyways.

By reversing the connector, I was quickly able to determine that the headphones were not faulty, because this swapped the missing audio channel to the other ear. It’s also immediately obvious that since there are no left vs. right polarity markings on either the receptacle or the headphones, there’s a 50% chance that you’ll get reverse stereo.

With the fault identified, and lots of time to kill, I decided to try to hack a workaround. I borrowed some tweezers from a nearby passenger, and slowly ripped off some of the exterior plastic to expose the signal wires. To my surprise there were actually four wires, instead of three using a shared ground.

airplane-headphones-separated

Headphone wires stripped, exposed and ready for splicing.

With a bit of care this only took about five minutes. The next step was to “patch” the working positive and ground wires from the working channel, into the speaker from the broken channel. I did this by trial and error using a bit of intuition to try to keep both speakers in phase.

airplane-headphones-spliced

After a twist splice and using paper as an insulator.

A small scrap of paper acted as an insulator to prevent short circuits between the positive and negative wires. Lastly, a figure eight on a bight was tied to isolate the weak splice from any tension, thus preventing damage and disconnects.

airplane-headphones-knotted

All wrapped up neatly and tied with a knot.

The finished product worked beautifully, despite now only providing monaural audio and is about five centimetres shorter, which is still perfectly usable since the seats hardly recline. The flight staff weren’t angry that I had cannibalized their headphones, but also didn’t understand how my contraption was able to solve the problem.

This fun little ten minute hack helped provide some distraction in economy class, and maybe it will be useful to you since I doubt they’ve repaired the media system in the seat! If you work for Emirates, let me know and I’ll give you the seat and flight number.

Happy hacking!

James

Automatic grouping in mgmt

In this post, I’ll tell you about the recently released “automatic grouping” or “AutoGroup” feature in mgmt, a next generation configuration management prototype. If you aren’t already familiar with mgmt, I’d recommend you start by reading the introductory post, and the second post. There’s also an introductory video.

Resources in a graph

Most configuration management systems use something called a directed acyclic graph, or DAG. This is a fancy way of saying that it is a bunch of circles (vertices) which are connected with arrows (edges). The arrows must be connected to exactly two vertices, and you’re only allowed to move along each arrow in one direction (directed). Lastly, if you start at any vertex in the graph, you must never be able to return to where you started by following the arrows (acyclic). If you can, the graph is not fit for our purpose.

A DAG from Wikipedia

An example DAG from Wikipedia

The graphs in configuration management systems usually represent the dependency relationships (edges) between the resources (vertices) which is important because you might want to declare that you want a certain package installed before you start a service. To represent the kind of work that you want to do, different kinds of resources exist which you can use to specify that work.

Each of the vertices in a graph represents a unique resource, and each is backed by an individual software routine or “program” which can check the state of the resource, and apply the correct state if needed. This makes each resource idempotent. If we have many individual programs, this might turn out to be a lot of work to do to get our graph into the desired state!

Resource grouping

It turns out that some resources have a fixed overhead to starting up and running. If we can group resources together so that they share this fixed overhead, then our graph might converge faster. This is exactly what we do in mgmt!

Take for example, a simple graph such as the following:

Simple DAG showing three pkg, two file, and one svc resource

Simple DAG showing one svc, two file, and three pkg resources…

We can logically group the three pkg resources together and redraw the graph so that it now looks like this:

DAG with the three pkg resources now grouped into one.

DAG with the three pkg resources now grouped into one! Overlapping vertices mean that they act as if they’re one vertex instead of three!

This all happens automatically of course! It is very important that the new graph is a faithful, logical representation of the original graph, so that the specified dependency relationships are preserved. What this represents, is that when multiple resources are grouped (shown by overlapping vertices in the graph) they run together as a single unit. This is the practical difference between running:

$ dnf install -y powertop
$ dnf install -y sl
$ dnf install -y cowsay

if not grouped, and:

$ dnf install -y powertop sl cowsay

when grouped. If you try this out you’ll see that the second scenario is much faster, and on my laptop about three times faster! This is because of fixed overhead such as cache updates, and the dnf dep solver that each process runs.

This grouping means mgmt uses this faster second scenario instead of the slower first scenario that all the current generation tools do. It’s also important to note that different resources can implement the grouping feature to optimize for different things besides performance. More on that later…

The algorithm

I’m not an algorithmist by training, so it took me some fiddling to come up with an appropriate solution. I’ve implemented it along with an extensive testing framework and a series of test cases, which it passes of course! If we ever find a graph that does not get grouped correctly, then we can iterate on the algorithm and add it as a new test case.

The algorithm turns out to be relatively simple. I first noticed that vertices which had a relationship between them must not get grouped, because that would undermine the precedence ordering of the vertices! This property is called reachability. I then attempt to group every vertex to every other vertex that has no reachability or reverse reachability to it!

The hard part turned out to be getting all the plumbing surrounding the algorithm correct, and in particular the actual vertex merging algorithm, so that “discarded edges” are reattached in the correct places. I also took a bit of extra time to implement the algorithm as a struct which satisfies an “AutoGrouper” interface. This way, if you’d like to implement a different algorithm, it’s easy to drop in your replacement. I’m fairly certain that a more optimal version of my algorithm is possible for anyone wishing to do the analysis.

A quick note on nomenclature: I’ve actually decided to call this grouping and not merging, because we actually preserve the unique data of each resource so that they can be taken apart and mixed differently when (and if) there is a change in the compiled graph. This makes graph changeovers very cheap in mgmt, because we don’t have to re-evaluate anything which remains constant between graphs. Merging would imply a permanent reduction and loss of unique identity.

Parallelism and user choice

It’s worth noting two important points:

  1. Auto grouping of resources usually decreases the parallelism of a graph.
  2. The user might not want a particular resource to get grouped!

You might remember that one of the novel properties of mgmt, is that it executes the graph in parallel whenever possible. Although the grouping of resources actually removes some of this parallelism, certain resources such as the pkg resource already have an innate constraint on sequential behaviour, namely: the package manager lock. Since these tools can’t operate in parallel, and since each execution has a fixed overhead, it’s almost always beneficial to group pkg resources together.

Grouping is also not mandatory, so while it is a sensible default, you can disable grouping per resource with a simple meta parameter.

Lastly, it’s also worth mentioning that grouping doesn’t “magically” happen without some effort. The underlying resource needs to know how to optimize, watch, check and apply multiple resources simultaneously for it to support the feature. At the moment, only the pkg resource can do any grouping, and even then, there could always be some room for improvement. It’s also not optimal (or even logical) to group certain types of resources, so those will never be able to do any grouping. We also don’t group together resources of different kinds, although mgmt could support this if a valid use case is ever found.

File grouping

As I mentioned, only the pkg resource supports grouping at this time. The file resource demonstrates a different use case for resource grouping. Suppose you want to monitor 10000 files in a particular directory, but they are specified individually. This would require far too many inotify watches than a normal system usually has, so the grouping algorithm could group them into a single resource, which then uses a recursive watcher such as fanotify to reduce the watcher count by a factor of 10000. Unfortunately neither the file resource grouping, nor the fanotify support for this exist at the moment. If you’d like to implement either of these, please let me know!

If you can think of another resource kind that you’d like to write, or in particular, if you know of one which would work well with resource grouping, please contact me!

Science!

I wouldn’t be a very good scientist (I’m actually a Physiologist by training) if I didn’t include some data and a demonstration that this all actually works, and improves performance! What follows will be a good deal of information, so skim through the parts you don’t care about.

Science <3 data

Science <3 data

I decided to test the following four scenarios:

  1. single package, package check, package already installed
  2. single package, package install, package not installed
  3. three packages, package check, packages already installed
  4. three packages, package install, packages not installed

These are the situations you’d encounter when running your tool of choice to install one or more packages, and finding them either already present, or in need of installation. I timed each test, which ends when the tool tells us that our system has converged.

Each test is performed multiple times, and the average is taken, but only after we’ve run the tool at least twice so that the caches are warm.

We chose small packages so that the fixed overhead delays due to bandwidth and latencies are minimal, and so that our data is more representative of the underlying tool.

The single package tests use the powertop package, and the three package tests use powertop, sl, and cowsay. All tests were performed on an up-to-date Fedora 23 laptop, with an SSD. If you haven’t tried sl and cowsay, do give them a go!

The four tools tested were:

  1. puppet
  2. mgmt
  3. pkcon
  4. dnf

The last two are package manager front ends so that it’s more obvious how expensive something is expected to cost, and so that you can discern what amount of overhead is expected, and what puppet or mgmt is causing you. Here are a few representative runs:

mgmt installation of powertop:

$ time sudo ./mgmt run --file examples/pkg1.yaml --converged-timeout=0
21:04:18 main.go:63: This is: mgmt, version: 0.0.3-1-g6f3ac4b
21:04:18 main.go:64: Main: Start: 1459299858287120473
21:04:18 main.go:190: Main: Running...
21:04:18 main.go:113: Etcd: Starting...
21:04:18 main.go:117: Main: Waiting...
21:04:18 etcd.go:113: Etcd: Watching...
21:04:18 etcd.go:113: Etcd: Watching...
21:04:18 configwatch.go:54: Watching: examples/pkg1.yaml
21:04:20 config.go:272: Compile: Adding AutoEdges...
21:04:20 config.go:533: Compile: Grouping: Algorithm: nonReachabilityGrouper...
21:04:20 main.go:171: Graph: Vertices(1), Edges(0)
21:04:20 main.go:174: Graphviz: No filename given!
21:04:20 pgraph.go:764: State: graphStateNil -> graphStateStarting
21:04:20 pgraph.go:825: State: graphStateStarting -> graphStateStarted
21:04:20 main.go:117: Main: Waiting...
21:04:20 pkg.go:245: Pkg[powertop]: CheckApply(true)
21:04:20 pkg.go:303: Pkg[powertop]: Apply
21:04:20 pkg.go:317: Pkg[powertop]: Set: installed...
21:04:25 packagekit.go:399: PackageKit: Woops: Signal.Path: /8442_beabdaea
21:04:25 packagekit.go:399: PackageKit: Woops: Signal.Path: /8443_acbadcbd
21:04:31 pkg.go:335: Pkg[powertop]: Set: installed success!
21:04:31 main.go:79: Converged for 0 seconds, exiting!
21:04:31 main.go:55: Interrupted by exit signal
21:04:31 pgraph.go:796: Pkg[powertop]: Exited
21:04:31 main.go:203: Goodbye!

real    0m13.320s
user    0m0.023s
sys    0m0.021s

puppet installation of powertop:

$ time sudo puppet apply pkg.pp 
Notice: Compiled catalog for computer.example.com in environment production in 0.69 seconds
Notice: /Stage[main]/Main/Package[powertop]/ensure: created
Notice: Applied catalog in 10.13 seconds

real    0m18.254s
user    0m9.211s
sys    0m2.074s

dnf installation of powertop:

$ time sudo dnf install -y powertop
Last metadata expiration check: 1:22:03 ago on Tue Mar 29 20:04:29 2016.
Dependencies resolved.
==========================================================================
 Package          Arch           Version            Repository       Size
==========================================================================
Installing:
 powertop         x86_64         2.8-1.fc23         updates         228 k

Transaction Summary
==========================================================================
Install  1 Package

Total download size: 228 k
Installed size: 576 k
Downloading Packages:
powertop-2.8-1.fc23.x86_64.rpm            212 kB/s | 228 kB     00:01    
--------------------------------------------------------------------------
Total                                     125 kB/s | 228 kB     00:01     
Running transaction check
Transaction check succeeded.
Running transaction test
Transaction test succeeded.
Running transaction
  Installing  : powertop-2.8-1.fc23.x86_64                            1/1 
  Verifying   : powertop-2.8-1.fc23.x86_64                            1/1 

Installed:
  powertop.x86_64 2.8-1.fc23                                              

Complete!

real    0m10.406s
user    0m4.954s
sys    0m0.836s

puppet installation of powertop, sl and cowsay:

$ time sudo puppet apply pkg3.pp 
Notice: Compiled catalog for computer.example.com in environment production in 0.68 seconds
Notice: /Stage[main]/Main/Package[powertop]/ensure: created
Notice: /Stage[main]/Main/Package[sl]/ensure: created
Notice: /Stage[main]/Main/Package[cowsay]/ensure: created
Notice: Applied catalog in 33.02 seconds

real    0m41.229s
user    0m19.085s
sys    0m4.046s

pkcon installation of powertop, sl and cowsay:

$ time sudo pkcon install powertop sl cowsay
Resolving                     [=========================]         
Starting                      [=========================]         
Testing changes               [=========================]         
Finished                      [=========================]         
Installing                    [=========================]         
Querying                      [=========================]         
Downloading packages          [=========================]         
Testing changes               [=========================]         
Installing packages           [=========================]         
Finished                      [=========================]         

real    0m14.755s
user    0m0.060s
sys    0m0.025s

and finally, mgmt installation of powertop, sl and cowsay with autogrouping:

$ time sudo ./mgmt run --file examples/autogroup2.yaml --converged-timeout=0
21:16:00 main.go:63: This is: mgmt, version: 0.0.3-1-g6f3ac4b
21:16:00 main.go:64: Main: Start: 1459300560994114252
21:16:00 main.go:190: Main: Running...
21:16:00 main.go:113: Etcd: Starting...
21:16:00 main.go:117: Main: Waiting...
21:16:00 etcd.go:113: Etcd: Watching...
21:16:00 etcd.go:113: Etcd: Watching...
21:16:00 configwatch.go:54: Watching: examples/autogroup2.yaml
21:16:03 config.go:272: Compile: Adding AutoEdges...
21:16:03 config.go:533: Compile: Grouping: Algorithm: nonReachabilityGrouper...
21:16:03 config.go:533: Compile: Grouping: Success for: Pkg[powertop] into Pkg[cowsay]
21:16:03 config.go:533: Compile: Grouping: Success for: Pkg[sl] into Pkg[cowsay]
21:16:03 main.go:171: Graph: Vertices(1), Edges(0)
21:16:03 main.go:174: Graphviz: No filename given!
21:16:03 pgraph.go:764: State: graphStateNil -> graphStateStarting
21:16:03 pgraph.go:825: State: graphStateStarting -> graphStateStarted
21:16:03 main.go:117: Main: Waiting...
21:16:03 pkg.go:245: Pkg[autogroup:(cowsay,powertop,sl)]: CheckApply(true)
21:16:03 pkg.go:303: Pkg[autogroup:(cowsay,powertop,sl)]: Apply
21:16:03 pkg.go:317: Pkg[autogroup:(cowsay,powertop,sl)]: Set: installed...
21:16:08 packagekit.go:399: PackageKit: Woops: Signal.Path: /8547_cbeaddda
21:16:08 packagekit.go:399: PackageKit: Woops: Signal.Path: /8548_bcaadbce
21:16:16 pkg.go:335: Pkg[autogroup:(cowsay,powertop,sl)]: Set: installed success!
21:16:16 main.go:79: Converged for 0 seconds, exiting!
21:16:16 main.go:55: Interrupted by exit signal
21:16:16 pgraph.go:796: Pkg[cowsay]: Exited
21:16:16 main.go:203: Goodbye!

real    0m15.621s
user    0m0.040s
sys    0m0.038s

Results and analysis

My hard work seems to have paid off, because we do indeed see a noticeable improvement from grouping package resources. The data shows that even in the single package comparison cases, mgmt has very little overhead, which is demonstrated by seeing that the mgmt run times are very similar to the times it takes to run the package managers manually.

In the three package scenario, performance is approximately 2.39 times faster than puppet for installation. Checking was about 12 times faster! These ratios are expected to grow with a larger number of resources.

Sweet graph...

Bigger bars is worse… Puppet is in Red, mgmt is in blue.

The four groups at the bottom along the x axis correspond to the four scenarios I tested, 1, 2 and 3 corresponding to each run of that scenario, with the average of the three listed there too.

Versions

The test wouldn’t be complete if we didn’t tell you which specific version of each tool that we used. Let’s time those as well! ;)

puppet:

$ time puppet --version 
4.2.1

real    0m0.659s
user    0m0.525s
sys    0m0.064s

mgmt:

$ time ./mgmt --version
mgmt version 0.0.3-1-g6f3ac4b

real    0m0.007s
user    0m0.006s
sys    0m0.002s

pkcon:

$ time pkcon --version
1.0.11

real    0m0.013s
user    0m0.006s
sys    0m0.005s

dnf:

$ time dnf --version
1.1.7
  Installed: dnf-0:1.1.7-2.fc23.noarch at 2016-03-17 13:37
  Built    : Fedora Project at 2016-03-09 16:45

  Installed: rpm-0:4.13.0-0.rc1.12.fc23.x86_64 at 2016-03-03 09:39
  Built    : Fedora Project at 2016-02-29 09:53

real    0m0.438s
user    0m0.379s
sys    0m0.036s

Yep, puppet even takes the longest to tell us what version it is. Now I’m just teasing…

Methodology

It might have been more useful to time the removal of packages instead so that we further reduce the variability of internet bandwidth and latency, although since most configuration management is used to install packages (rather than remove), we figured this would be more appropriate and easy to understand. You’re welcome to conduct your own study and share the results!

Additionally, for fun, I also looked at puppet runs where three individual resources were used instead of a single resource with the title being an array of all three packages, and found no significant difference in the results. Indeed puppet runs dnf three separate times in either scenario:

$ ps auxww | grep dnf
root     12118 27.0  1.4 417060 110864 ?       Ds   21:57   0:03 /usr/bin/python3 /usr/bin/dnf -d 0 -e 0 -y install powertop
$ ps auxww | grep dnf
root     12713 32.7  2.0 475204 159840 ?       Rs   21:57   0:02 /usr/bin/python3 /usr/bin/dnf -d 0 -e 0 -y install sl
$ ps auxww | grep dnf
root     13126  0.0  0.7 275324 55608 ?        Rs   21:57   0:00 /usr/bin/python3 /usr/bin/dnf -d 0 -e 0 -y install cowsay

Data

If you’d like to download the raw data as a text formatted table, and the terminal output from each type of run, I’ve posted it here.

Conclusion

I hope that you enjoyed this feature and analysis, and that you’ll help contribute to making it better. Come join our IRC channel and say hello! Thanks to those who reviewed my article and pointed out some good places for improvements!

Happy Hacking,

James

 

Introducing: git tpush

On today’s issue of “one hour hacks”, I’ll show you how you can stop your git drive-by’s to git master from breaking your CI tests… Let’s continue!

The problem:

Sometimes I’ve got a shitty one-line patch that I want to push to git master. I’m usually right, and everything tests out fine, but usually isn’t always, and then I look silly while I frantically try to fix git master on a project that I maintain. Let’s use tools to hide our human flaws!

How you’re really supposed to do it:

Good hackers know that you’re supposed to:

  • checkout a new feature branch for your patch
    • checkout -b feat/foo
  • write the patch and commit it
    • git add -p && git commit -m 'my foo'
  • push that branch up to origin
    • git push origin feat/foo
  • wait for the tests to succeed
    • hub ci-status && sleep 1m && ...
  • merge and push to git master
    • git checkout master && git merge feat/foo && git push
  • delete your local branch
    • git branch -d feat/foo
  • delete the remote branch
    • git push origin :feat/foo
  • mourn all the lost time it took to push this patch
    • cat && drink && sleep 8h

If it happens to fail, you have to remember what the safe git reset command is, and then you have to start all over!

How do we make this easy?

I write tools, so I sat down and wrote a cheap little bash script to do it for me! Sharing is caring, so please enjoy a copy!

How does it work?

It actually just does all of the above for you automatically, and it automatically cleans up all of the temporary branches when it’s done! Here are two example runs to help show you the tool in action:

A failure scenario:

Here I add a patch with some trailing whitespace, which will easily get caught by the automatic test suite. You’ll note that I actually had to force the commit locally with git commit -n, because the pre-commit hook actually caught this first. You could extend this script to run your test suite locally before you push the patch, but that defeats the idea of a drive-by.

james@computer:~/code/mgmt$ git add -p
diff --git a/README.md b/README.md
index 277aa73..2c31356 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,7 @@ Please get involved by working on one of these items or by suggesting something
 Please grab one of the straightforward [#mgmtlove](https://github.com/purpleidea/mgmt/labels/mgmtlove) issues if you're a first time contributor.
 
 ## Bugs:
+Some bugs are caused by bad drive-by patches to git master! 
 Please set the `DEBUG` constant in [main.go](https://github.com/purpleidea/mgmt/blob/master/main.go) to `true`, and post the logs when you report the [issue](https://github.com/purpleidea/mgmt/issues).
 Bonus points if you provide a [shell](https://github.com/purpleidea/mgmt/tree/master/test/shell) or [OMV](https://github.com/purpleidea/mgmt/tree/master/test/omv) reproducible test case.
 
Stage this hunk [y,n,q,a,d,/,e,?]? y
<stdin>:9: trailing whitespace.
Some bugs are caused by bad drive-by patches to git master! 
warning: 1 line adds whitespace errors.

james@computer:~/code/mgmt$ git commit -m 'XXX this is a bad patch'
README.md:37: trailing whitespace.
+Some bugs are caused by bad drive-by patches to git master! 
james@computer:~/code/mgmt$ git commit -nm 'XXX this is a bad patch'
[master 8b29fb3] XXX this is a bad patch
 1 file changed, 1 insertion(+)
james@computer:~/code/mgmt$ git tpush 
tpush branch is: tpush/test-0
Switched to a new branch 'tpush/test-0'
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 357 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To git@github.com:purpleidea/mgmt.git
 * [new branch]      tpush/test-0 -> tpush/test-0
Switched to branch 'master'
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
- CI is starting...
/ Waiting for CI...
- Waiting for CI...
\ Waiting for CI...
| Waiting for CI...
Deleted branch tpush/test-0 (was 8b29fb3).
To git@github.com:purpleidea/mgmt.git
 - [deleted]         tpush/test-0
Upstream CI failed with:
failure: https://travis-ci.org/purpleidea/mgmt/builds/109472293
Fix branch 'master' with:
git commit --amend
or:
git reset --soft HEAD^
Happy hacking!

A successful scenario:

In this example, I amended the previous patch in my local copy of git master (which never got pushed to the public git master!) and then I ran git tpush again.

james@computer:~/code/mgmt$ git commit --amend
[master 1186d63] Add a link to my article about debugging golang
 Date: Mon Feb 15 17:28:01 2016 -0500
 1 file changed, 1 insertion(+)
james@computer:~/code/mgmt$ git tpush 
tpush branch is: tpush/test-0
Switched to a new branch 'tpush/test-0'
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 397 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To git@github.com:purpleidea/mgmt.git
 * [new branch]      tpush/test-0 -> tpush/test-0
Switched to branch 'master'
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
- CI is starting...
/ Waiting for CI...
- Waiting for CI...
\ Waiting for CI...
| Waiting for CI...
Deleted branch tpush/test-0 (was 1186d63).
To git@github.com:purpleidea/mgmt.git
 - [deleted]         tpush/test-0
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 397 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To git@github.com:purpleidea/mgmt.git
   6e68d6d..1186d63  master -> master
Done!

Please take note of the “Waiting for CI…” text. This is actually a spinner which updates itself. Run this tool yourself to get the full effect! This avoids having 200 line scrollback in your terminal while you wait for your CI to complete.

Lastly, you’ll note that I run git tpush instead of git-tpush. This is because I added an alias to my ~/.gitconfig. It’s quite easy, just add:

tpush = !git-tpush

under the [alias] section, and ensure the git-tpush shell script is in your ~/bin/ folder and is executable.

Pre-requisites:

This actually depends on the hub command line tool which knows how to ask our CI about the test status. It sucks using a distributed tool like git with a centralized thing like github, but it’s still pretty easy for me to move away if they go bad. You’ll also want the tput installed if you want the fancy spinner, but that comes with most GNU/Linux distros by default.

The code:

Grab a copy here!

Happy hacking,

James

PS: No, I couldn’t think of a better name than git-tpush, if you don’t like it feel free to rename it yourself!