Declarative vs. Imperative paradigms

Recently, while operating two different remote-controlled appliances, I realized that it was high time for a discussion about declarative and imperative paradigms. Let’s start by looking at the two remotes:

declarative-imperative

Two different “remotes”. The one on the left operates a television, and the one on the right controls a central heating and cooling system.

At first glance you will notice that one of these remotes is dark, and the other is light. You might also notice that my photography skills are terrible. Neither of these facts is very important to the discussion at hand. Is there anything interesting that you can infer?

Here’s a hint: to reduce costs, preserve battery life, and keep the design simple, the manufacturers only included infrared transmitters in these two devices. That is to say, there is no infrared receiver built into the remote control units; receivers are only found on the appliances.

The difference is that the dark-coloured television remote sends commands imperatively, whilst the light-coloured climate control remote sends them declaratively. What does this mean?

Imperative:

When you press the “increase volume” button on the dark-coloured remote, it sends the equivalent of a literal “increase the volume” command to the television. It has no idea what the previous volume was, or even if the television is powered on.

It is fine to operate this way because the human will discern that an action occurred from an on-screen pop-up which informs them of the new volume. If the remote wasn’t properly aimed at the television, then the absence of that feedback will probably convince you to repeat the command.

This is not idempotent, and there is always the chance that an erroneous extra command could be sent. (Perhaps the television had a GC pause and didn’t display a notification till some seconds after you had retried the command.) It is not particularly safe, but since we’re changing channels and not launching missiles, it’s not a significant problem.

Declarative:

When you press the “increase temperature” button on the light-coloured remote, it actually sends the entire state as shown on the display, including the equivalent of “22°C please”. It is able to remember what it believes to be the previous state by storing this on-board, and it even displays it on the built-in LCD screen.

It is useful to operate in this manner, because the designers decided that it was preferable to view a display positioned in your hand, rather than one on a fan unit which might be mounted somewhere awkward and strain your neck to view.

In this case, to provide feedback that the command was actually received, each key press will trigger a “beep” from the wall unit and a brief flash of its power LED. This is idempotent, although the designers didn’t provide a button to “resend state”.

Advantages and disadvantages:

Clearly, it is quite straightforward to imperatively send an action command. This is the common paradigm which we’re usually most comfortable with, and layered beneath a declarative design there are usually some imperative operations. This often works best for low-latency operations where feedback is guaranteed.

Anyone who has operated a recent television might remember that they require a few seconds to “start-up”. In my impatience when trying to turn on a television, I’ve pressed the “power button” a second time only to see the television turn itself back off. This is because that button only knows how to send a “change power state” command, which must be both sent and received an odd number of times to actually change the (binary) power state.

In the declarative world we always request the desired result, and allow the machine to figure out the mechanism, and whether an action even needs to be performed at all. (It’s worth mentioning that I could always send a subset of the desired state if sending the full “struct” would be too onerous!) I can, however, trick the light-coloured remote into displaying an incorrect state by pressing buttons when it is out of transmitting range of the wall unit. The good news is that this is only due to a lack of bi-directional communication, and isn’t a failure of the declarative paradigm.
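
To make the contrast concrete, here’s a minimal sketch of the two message shapes in golang. The type names and fields are made up purely for illustration, and aren’t taken from either device’s actual protocol:

package main

import "fmt"

// Illustration only: these types are not from any real device protocol.

// ImperativeCommand names an action; the sender has no idea what the
// resulting state will be ("increase the volume by one step").
type ImperativeCommand struct {
    Action string // e.g. "volume-up" or "power-toggle"
}

// DeclarativeState carries the entire desired end state ("22°C please");
// the receiver decides whether any action is needed at all.
type DeclarativeState struct {
    PowerOn      bool
    TemperatureC int
    FanSpeed     int
}

func main() {
    fmt.Printf("%+v\n", ImperativeCommand{Action: "volume-up"})
    fmt.Printf("%+v\n", DeclarativeState{PowerOn: true, TemperatureC: 22, FanSpeed: 2})
}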

Bi-directional communication:

In practice after either an imperative or declarative command is sent, we can usually expect to receive a response, whether it be a 200 OK, or an error message. We’re talking about software and systems engineering now, and while TCP makes everyone’s lives easier, it’s not immune from failure. If the successful ACK message to an imperative command never got delivered (maybe the server crashed after having sent it) and the client repeats the request, you’ve got a possible problem on your hands!

With the declarative message, it’s safe to retry your command ad nauseam until you’re satisfied you were heard.

Multiple senders:

With a declarative approach it might seem that if two near-simultaneous users send differing commands, the “winner” would overwrite the other user’s command! It might also seem that this isn’t a problem in imperative systems. It turns out that both of these assumptions are wrong.

In the imperative system, two independent senders might both decide that the volume wasn’t loud enough. If they were to both send a command, they will have increased it twice as much as desired, and are now in a similar situation, except that their ears hurt.

In the declarative system it is easy to avoid overwriting another user’s request with a simple CAS (compare-and-swap, or test and set) scheme. For any state change, the user first looks up the current state and gets a unique integer (possibly incremental) that represents that returned state. When the new state is sent, it includes the integer from the previous lookup, and the endpoint only processes the state change if that integer still matches the current value. In this manner, with simultaneous requests, as long as the endpoint linearizes the requests, only one will succeed, while the others will know that their cached view of the world needs to be refreshed. This might annoy them, but it is the safer mechanism.
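
Here’s a minimal sketch of that mechanism in golang. The Thermostat type, its fields, and the version counter are inventions for illustration only; the compare-and-swap pattern is the part that matters:

package main

import (
    "errors"
    "fmt"
    "sync"
)

// ErrStale is returned when the caller's cached view of the world is out of date.
var ErrStale = errors.New("stale version: refresh and retry")

// Thermostat is an invented type for this sketch. It stores the full desired
// state, plus a version that increments on every accepted change.
type Thermostat struct {
    mu      sync.Mutex
    version uint64
    tempC   int
}

// Get returns the current state and the version token that must be echoed
// back with the next change request.
func (t *Thermostat) Get() (int, uint64) {
    t.mu.Lock()
    defer t.mu.Unlock()
    return t.tempC, t.version
}

// Set applies the new desired state only if the caller saw the latest version.
func (t *Thermostat) Set(tempC int, seen uint64) error {
    t.mu.Lock()
    defer t.mu.Unlock()
    if seen != t.version {
        return ErrStale // someone else changed the state first
    }
    t.tempC = tempC
    t.version++ // accepted: bump the version so other writers notice
    return nil
}

func main() {
    t := &Thermostat{tempC: 21}

    // Two users read the same state...
    _, v1 := t.Get()
    _, v2 := t.Get()

    // ...and both try to change it. Only the first write wins.
    fmt.Println(t.Set(22, v1)) // <nil>
    fmt.Println(t.Set(23, v2)) // stale version: refresh and retry
}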

Multiple endpoints:

If multiple endpoints form a consistent cluster where all members can serve client requests using a declarative tasking system, the client can send the request to one system, and retrieve the status from someone else. This is particularly useful if the operation might take some time, and there is a chance that the initial endpoint was replaced during this interval.

declarative-imperative-diagram

Hand drawn diagram of the scenario. As you can see the user is happy because they are using a declarative paradigm, and are more resistant to failures.

While this could be replicated to some degree with imperative systems, it’s much more complicated there to keep track of whether an operation occurred or not. In the declarative case it’s easy, because each cluster member can determine whether the current state matches what’s expected.

Conclusion:

This wasn’t an exhaustive discussion about the topic, but I couldn’t help writing about it since the two paradigms present some interesting engineering questions. As systems become more complex, will the adoption of new methodologies be widespread and expedient, or niche and sluggish?

Some of my declarative work involves my next generation configuration management project! Get involved today!

Happy Hacking,

James

Faster golang builds

I’ve been hacking in golang since before version 1.4, and the speed at which my builds finished has been mostly trending downwards. Let’s look into the reasons and some fixes. TL;DR click-bait title: “Get 4x faster golang builds with this one trick!”.

Here are the three reasons my builds got slower:

The compiler

Before version 1.5, the compiler was written in C but with that release, it moved to being pure golang. This unfortunately reduced build performance quite measurably, even though it was the right decision for the project.

There have been slight improvements with newer versions, however Google’s focus has been on improving runtime performance rather than build performance. This is understandable, since at their scale they want to save lots of electricity dollars, but it’s not as helpful for smaller shops where the developer iteration cycle is the metric to optimize.

This could still be improved if folks wanted to put in the effort. A gcc-style -O0 option could help. The sad thing about this whole story is that “instant” builds were a major marketing feature of early golang presentations.

Project size

Over time, my main project (mgmt) has gotten much bigger! It compiles in a number of libraries, including etcd, prometheus, and more! This naturally increases build time and is a mostly unavoidable consequence of building cool things and standing on giants!

This mostly can’t be helped, but it can be mitigated…

Dependency caching

When you build a project and all of its dependencies, the unchanged dependencies shouldn’t need to be rebuilt! Unfortunately golang does a great job of silently rebuilding things unnecessarily. Here’s why…

When you run a build, golang will attempt to re-use any common artefacts from previous builds. If the golang versions or library versions don’t match, these won’t be used, and the compiler will redo this work. Unfortunately, by default, those results won’t be saved, causing you to waste CPU cycles every time you test!

When the intermediate results are kept, they are found in your $GOPATH/pkg/. To save them, you need to either run go install (which makes a mess in your $GOPATH/bin/), or run go build -i. (Thanks to Dave for the tip!)

The sad part of this story is that these aren’t cached by default, and stale results aren’t discarded by default! If you’re experiencing slow builds, you should rm -rf $GOPATH/pkg/ and then go build -i. After one successful build, future builds should be much faster!

Example

james@computer:~/code/mgmt$ time go build    # before

real    0m28.152s
user    1m17.097s
sys     0m5.235s

james@computer:~/code/mgmt$ time go build -i    # after

real    0m8.129s
user    0m12.014s
sys     0m0.839s

Debugging

If you want to debug what’s going on, you can always run go build -x.

Blame!

I don’t like assigning blame, but this feels like a case of the golang tools being obtuse, and the man pages being non-existent. The golang project has a lot of maturing to do to integrate sanely with a stock GNU environment:

  • build intermediates could be saved and discarded by default
  • man go build could exist and provide useful information
  • go build --help or go help build could provide more useful information
  • POSIX style flags could be used (eg: --help)
  • build cache could be stored in $XDG_CACHE_HOME.

Hope this helped improve your golang experience! I always knew something was going on in $GOPATH/pkg/, but I think it’s pretty absurd that I only fully understood it now. My builds are about 4x faster now. :)

Happy Hacking,

James

Ten minute hacks: Process pause & resume

I’m old school and still rocking an old X220 laptop because I didn’t like the new ones. My battery life isn’t as great as I’d like it to be, but it gets worse when some “webapp” (which I’d much rather have as a native GTK+ app) causes Firefox to rev my CPU with their websocket (hi gmail!) poller.

This seems to happen most often on planes or when I’m disconnected from the internet. Since it’s difficult to know which tab is the offending one, and since I might want to keep that tab’s state anyway, I decided to write a little shell script to pause and resume misbehaving processes.

After putting the script into my ~/bin/ and running chmod u+x ~/bin/pause-continue.sh, you can now:

james@computer:~$ pause-continue.sh firefox
Stopping 'firefox'...
[press any key to continue]
Continuing 'firefox'...
james@computer:~$ echo $?        # error codes work iirc
0
james@computer:~$

The code is trivially simple, with an added curses hack to make this 13% more fun. It sends a SIGSTOP signal initially, and then when you press a key it resumes the process with SIGCONT. Here’s the code.
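
If you don’t feel like clicking through, here’s a rough sketch of the same idea written in golang instead of shell. It is not the original script: the pgrep lookup and the “press enter” prompt (instead of the curses “press any key”) are assumptions made for this sketch:

package main

// A rough sketch of the pause/resume idea; not the original pause-continue.sh script.

import (
    "bufio"
    "fmt"
    "os"
    "os/exec"
    "strconv"
    "strings"
    "syscall"
)

// signalAll sends sig to every pid whose name matches, using pgrep to find them.
func signalAll(name string, sig syscall.Signal) error {
    out, err := exec.Command("pgrep", name).Output()
    if err != nil {
        return fmt.Errorf("no process named %q found", name)
    }
    for _, field := range strings.Fields(string(out)) {
        pid, err := strconv.Atoi(field)
        if err != nil {
            continue
        }
        if err := syscall.Kill(pid, sig); err != nil {
            return err
        }
    }
    return nil
}

func main() {
    name := "firefox" // default target; override with the first argument
    if len(os.Args) > 1 {
        name = os.Args[1]
    }

    fmt.Printf("Stopping '%s'...\n", name)
    if err := signalAll(name, syscall.SIGSTOP); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }

    fmt.Println("[press enter to continue]")
    bufio.NewReader(os.Stdin).ReadString('\n') // wait for the user

    fmt.Printf("Continuing '%s'...\n", name)
    if err := signalAll(name, syscall.SIGCONT); err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
}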

You should obviously substitute in the name of the process that you’d like to pause and resume. If your process breaks because it didn’t deal well with the signals, then you get to keep both pieces!

This should help me on my upcoming travel! I’ll be presenting some of my mgmtconfig work at DevConf.cz, FOSDEM and CfgMgmtCamp.eu! CfgMgmtCamp will also have a short mgmt track (looking forward to seeing Felix present!) and we’ll be around to hack on the 8th during fringe (the day after the official camp) if you’d like help to get your patch merged! I’m looking forward to it!

Happy hacking!

James

Send/Recv in mgmt

I previously published “A revisionist history of configuration management”. I meant for that to be the intro to this article, but it ended up being long enough that it deserved a separate post. I will explain Send/Recv in this article, but first a few clarifications to the aforementioned article.

Clarifications

I mentioned that my “revisionist history” was inaccurate, but I failed to mention that it was also not exhaustive! Many things were left out either because they were proprietary, niche, not well-known, of obscure design or simply for brevity. My apologies if you were involved with Bcfg2, Bosh, Heat, Military specifications, SaltStack, SmartFrog, or something else entirely. I’d love it if someone else wrote an “exhaustive history”, but I don’t think that’s possible.

It’s also worth re-iterating that without the large variety of software and designs which came before me, I wouldn’t have learned or been able to build anything of value. Thank you giants! Discussing the problems and designs of other tools makes it easier to contrast them with, and explain, what I’m doing in mgmt.

Notifications

If you’re not familiar with the directed acyclic graph model for configuration management, you should start by reviewing that material first. It models a system of resources (workers) as the vertices in that DAG, and the edges as the dependencies. We’re going to add some additional mechanics to this model.

There is a concept in mgmt called notifications. Any time the state of a resource is successfully changed by the engine, a notification is emitted. These notifications are emitted along any graph edge (dependency) that has been asked to relay them. Edges with the Notify property will do so. These are usually called refresh notifications.

Any time a resource receives a refresh notification, it can apply a special action which is resource specific. The svc resource reloads the service, the password resource generates a new password, the timer resource resets the timer, and the noop resource prints a notification message. In general, refresh notifications should be avoided if possible, but there are a number of legitimate use cases.

In mgmt notifications are designed to be crash-proof, that is to say, undelivered notifications are re-queued when the engine restarts. While we don’t expect mgmt to crash, this is also useful when a graph is stopped by the user before it has completed.

You’ll see these notifications in action momentarily.

Send/Recv

I mentioned in the revisionist history that I felt that Chef opted for raw code as a solution to the lack of power in Puppet. Having resources in mgmt which are event-driven is one example of increasing their power. Send/Recv is another mechanism to make the resource primitive more powerful.

Simply put: Send/Recv is a mechanism where resources can transfer data along graph edges.

The status quo

Consider the following pattern (expressed as Puppet code):

# create a directory
file { '/var/foo/':
    ensure => directory,
}
# download a file into that directory
exec { 'wget http://example.com/data -O - > /var/foo/data':
    creates => '/var/foo/data',
    require => File['/var/foo/'],
}
# set some property of the file
file { '/var/foo/data':
    mode => '0644',
    require => Exec['wget http://example.com/data -O - > /var/foo/data'],
}

First, a disclaimer: Puppet now actually supports an http url as a source. Nevertheless, this was a common pattern for many years, and that solution only improves a narrow use case. Here are some of the past and current problems:

  • File can’t take output from an exec (or other) resource
  • File can’t pull from an unsupported protocol (sftp, tftp, imap, etc…)
  • File can get created with zero-length data
  • Exec won’t update if http endpoint changes the data
  • Requires knowledge of bash and shell glue
  • Potentially error-prone if a typo is made

There’s also a layering violation if you believe that network code (http downloading) shouldn’t be in a file resource. I think it adds unnecessary complexity to the file resource.

The solution

What the file resource actually needs, is to be able to accept (Recv) data of the same type as any of its input arguments. We also need resources which can produce (Send) data that is useful to consumers. This occurs along a graph (dependency) edge, since the sending resource would need to produce it before the receiver could act!

This also opens up a range of possibilities for new resource kinds that are clever about sending or receiving data. An http resource could contain all the necessary network code, and replace our use of the exec { 'wget ...': } pattern.

Diagram

In this graph, a password resource generates a random string and stores it in a file; more clever linkages are planned.

Example

As a proof of concept for the idea, I implemented a Password resource. This is a prototype resource that generates a random string of characters. To use the output, it has to be linked via Send/Recv to something that can accept a string. The file resource is one such possibility. Here’s an excerpt of some example output from a simple graph:

03:06:13 password.go:295: Password[password1]: Generating new password...
03:06:13 password.go:312: Password[password1]: Writing password token...
03:06:13 sendrecv.go:184: SendRecv: Password[password1].Password -> File[file1].Content
03:06:13 file.go:651: contentCheckApply: Invalidating sha256sum of `Content`
03:06:13 file.go:579: File[file1]: contentCheckApply(true)
03:06:13 noop.go:115: Noop[noop1]: Received a notification!

What you can see is that initially, a random password is generated. Next Send/Recv transfers the generated Password to the file’s Content. The file resource invalidates the cached Content checksum (a performance feature of the file resource), and then stores that value in the file. (This would normally be a security problem, but this is for example purposes!) Lastly, the file sends out a refresh notification to a Noop resource for demonstration purposes. It responds by printing a log message to that effect.

Libmgmt

Ultimately, mgmt will have a DSL to express the different graphs of configuration. In the meantime, you can use Puppet code, or a raw YAML file. The latter is primarily meant for testing purposes until we have the language built.

Lastly, you can also embed mgmt and use it like a library! This lets you write raw golang code to build your resource graphs. I decided to write the above example that way! Have a look at the code! This can be used to embed mgmt into your existing software! There are a few more examples available here.

Resource internals

When a resource receives new values via Send/Recv, it’s likely that the resource will have work to do. As a result, the engine will automatically mark the resource state as dirty and then poke it from the sending node. When the receiver resource runs, it can look up the list of keys that have been sent. This is useful if it wants to perform a cache invalidation, for example. In the resource, the code is quite simple:

if val, exists := obj.Recv["Content"]; exists && val.Changed {
    // the "Content" input has changed
}

Here is a good example of that mechanism in action.

Future work

This is only powerful if there are interesting resources to link together. Please contribute some ideas, and help build these resources! I’ve got a number of ideas already, but I’d love to hear yours first so that I don’t influence or stifle your creativity. Leave me a message in the comments below!

Happy Hacking,

James

Remote execution in mgmt

Bootstrapping a cluster from your laptop, or managing machines without needing to first setup a separate config management infrastructure are both very reasonable and fundamental asks. I was particularly inspired by Ansible‘s agent-less remote execution model, but never wanted to build a centralized orchestrator. I soon realized that I could have my ice cream and eat it too.

Prior knowledge

If you haven’t read the earlier articles about mgmt, then I recommend you start with those, and then come back here. The first and fourth are essential if you’re going to make sense of this article.

Limitations of existing orchestrators

Current orchestrators have a few limitations.

  1. They can be a single point of failure
  2. They can have scaling issues
  3. They can’t respond instantaneously to node state changes (they poll)
  4. They can’t usually redistribute remote node run-time data between nodes

Despite these limitations, orchestration is still very useful because of the facilities it provides. Since these facilities are essential in a next generation design, I set about integrating these features, but with a novel twist.

Implementation, Usage and Design

Mgmt is written in golang, and that decision was no accident. One benefit is that it simplifies our remote execution model.

To use this mode you run mgmt with the --remote flag. Each use of the --remote argument points to a different remote graph to execute. Eventually this will be integrated with the DSL, but this plumbing is exposed for early adopters to play around with.

Startup (part one)

Each invocation of --remote causes mgmt to remotely connect over SSH to the target hosts. This happens in parallel, and runs up to --cconns simultaneous connections.

A temporary directory is made on the remote host, and the mgmt binary and graph are copied across the wire. Since mgmt compiles down to a single statically compiled binary, it simplifies the transfer of the software. The binary is cached remotely to speed up future runs unless you pass the --no-caching option.

A TCP connection is tunnelled back over SSH to the originating host’s etcd server, which is embedded and running inside of the initiating mgmt binary.

Execution (part two)

The remote mgmt binary is now run! It wires itself up through the SSH tunnel so that its internal etcd client can connect to the etcd server on the initiating host. This is particularly powerful because remote hosts can now participate in resource exchanges as if they were part of a regular etcd backed mgmt cluster! They don’t connect directly to each other, but they can share runtime data, and only need an incoming SSH port open!

Closure (part three)

At this point mgmt can either keep running continuously or it can close the connections and shutdown.

In the former case, you can either remain attached over SSH, or you can disconnect from the child hosts and let this new cluster take on a new life and operate independently of the initiator.

In the latter case, you can either shut down at the operator’s request (via a ^C on the initiator) or when the cluster has simultaneously converged for a number of seconds.

This second possibility occurs when you run mgmt with the familiar --converged-timeout parameter. It is indeed clever enough to also work in this distributed fashion.

Diagram

I’ve used my poor libreoffice draw skills to make a diagram. Hopefully this helps out my visual readers.

remote-execution

If you can improve this diagram, please let me know!

Example

I find that using one or more vagrant virtual machines for the remote endpoints is the best way to test this out. In my case I use Oh-My-Vagrant to set up these machines, but the method you use is entirely up to you! Here’s a sample remote execution. Please note that I have omitted a number of lines for brevity, and added emphasis to the more interesting ones.

james@hostname:~/code/mgmt$ ./mgmt run --remote examples/remote2a.yaml --remote examples/remote2b.yaml --tmp-prefix 
17:58:22 main.go:76: This is: mgmt, version: 0.0.5-3-g4b8ad3a
17:58:23 remote.go:596: Remote: Connect...
17:58:23 remote.go:607: Remote: Sftp...
17:58:23 remote.go:164: Remote: Self executable is: /home/james/code/gopath/src/github.com/purpleidea/mgmt/mgmt
17:58:23 remote.go:221: Remote: Remotely created: /tmp/mgmt-412078160/remote
17:58:23 remote.go:226: Remote: Remote path is: /tmp/mgmt-412078160/remote/mgmt
17:58:23 remote.go:221: Remote: Remotely created: /tmp/mgmt-412078160/remote
17:58:23 remote.go:226: Remote: Remote path is: /tmp/mgmt-412078160/remote/mgmt
17:58:23 remote.go:235: Remote: Copying binary, please be patient...
17:58:23 remote.go:235: Remote: Copying binary, please be patient...
17:58:24 remote.go:256: Remote: Copying graph definition...
17:58:24 remote.go:618: Remote: Tunnelling...
17:58:24 remote.go:630: Remote: Exec...
17:58:24 remote.go:510: Remote: Running: /tmp/mgmt-412078160/remote/mgmt run --hostname '192.168.121.201' --no-server --seeds 'http://127.0.0.1:2379' --file '/tmp/mgmt-412078160/remote/remote2a.yaml' --depth 1
17:58:24 etcd.go:2088: Etcd: Watch: Path: /_mgmt/exported/
17:58:24 main.go:255: Main: Waiting...
17:58:24 remote.go:256: Remote: Copying graph definition...
17:58:24 remote.go:618: Remote: Tunnelling...
17:58:24 remote.go:630: Remote: Exec...
17:58:24 remote.go:510: Remote: Running: /tmp/mgmt-412078160/remote/mgmt run --hostname '192.168.121.202' --no-server --seeds 'http://127.0.0.1:2379' --file '/tmp/mgmt-412078160/remote/remote2b.yaml' --depth 1
17:58:24 etcd.go:2088: Etcd: Watch: Path: /_mgmt/exported/
17:58:24 main.go:291: Config: Parse failure
17:58:24 main.go:255: Main: Waiting...
^C17:58:48 main.go:62: Interrupted by ^C
17:58:48 main.go:397: Destroy...
17:58:48 remote.go:532: Remote: Output...
|    17:58:23 main.go:76: This is: mgmt, version: 0.0.5-3-g4b8ad3a
|    17:58:47 main.go:419: Goodbye!
17:58:48 remote.go:636: Remote: Done!
17:58:48 remote.go:532: Remote: Output...
|    17:58:24 main.go:76: This is: mgmt, version: 0.0.5-3-g4b8ad3a
|    17:58:48 main.go:419: Goodbye!
17:58:48 remote.go:636: Remote: Done!
17:58:48 main.go:419: Goodbye!

You should see that we kick off the remote executions, and how they are wired back through the tunnel. In this particular case we terminated the runs with a ^C.

The example configurations I used are available here and here. If you had a terminal open on the first remote machine, after about a second you would have seen:

[root@omv1 ~]# ls -d /tmp/file*  /tmp/mgmt*
/tmp/file1a  /tmp/file2a  /tmp/file2b  /tmp/mgmt-412078160
[root@omv1 ~]# cat /tmp/file*
i am file1a
i am file2a, exported from host a
i am file2b, exported from host b

You can see the remote execution artifacts, and that there was clearly data exchange. You can repeat this example with --converged-timeout=5 to automatically terminate after five seconds of cluster wide inactivity.

Live remote hacking

Since mgmt is event based, and graph structure configurations manifest themselves as event streams, you can actually edit the input configuration on the initiating machine, and as soon as the file is saved, it will instantly remotely propagate and apply the graph differential.

For this particular example, since we export and collect resources through the tunnelled SSH connections, editing the exported file will also cause both hosts to update that file on disk!

You’ll see this occurring with this message in the logs:

18:00:44 remote.go:973: Remote: Copied over new graph definition: examples/remote2b.yaml

While you might not necessarily want to use this functionality on a production machine, it will definitely make your interactive hacking sessions more useful, in particular because you never need to re-run parts of the graph which have already converged!

Auth

In case you’re wondering, mgmt can look in your ~/.ssh/ for keys to use for the auth, or it can prompt you interactively. It can also read a plain text password from the connection string, but this isn’t a recommended security practice.

Hierarchical remote execution

Even though we recommend running mgmt in a normal clustered mode instead of over SSH, we didn’t want to limit the number of hosts that can be configured using remote execution. For this reason it would be architecturally simple to add support for what we’ve decided to call “hierarchical remote execution”.

In this mode, the primary initiator would first connect to one or more secondary nodes, which would then stage a second series of remote execution runs, resulting in an order of depth equal to two or more. This fan-out approach can be used to distribute the number of outgoing connections across more intermediate machines, or as a method to conserve remote execution bandwidth on the primary link into your datacenter, by having the secondary machines run most of the remote execution runs.

remote-execution2

This particular extension hasn’t been built, although some of the plumbing has been laid. If you’d like to contribute this feature to the upstream project, please join us in #mgmtconfig on Freenode and let us (I’m @purpleidea) know!

Docs

There is some generated documentation for the mgmt remote package available. There is also the beginning of some additional documentation in the markdown docs. You can help contribute to either of these by sending us a patch!

Novel resources

Our event based architecture can enable some previously improbable kinds of resources. In particular, I think it would be quite beautiful if someone built a provisioning resource. The Watch method of the resource API normally serves to notify us of events, but since it is a main loop that blocks in a select call, it could also be used to run a small server that hosts a kickstart file and associated TFTP images. If you like this idea, please help us build it!
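
As a standalone sketch of that idea (this is not the actual mgmt resource API, and the directory and port below are made up), here’s a main loop that blocks in a select statement while a small HTTP file server runs in the background:

package main

// A standalone sketch; this is not the mgmt resource API.

import (
    "log"
    "net/http"
    "os"
    "os/signal"
    "time"
)

func main() {
    // Serve ./provision/ (a hypothetical directory holding a kickstart file
    // and images) in the background, just like a provisioning resource could.
    go func() {
        handler := http.FileServer(http.Dir("./provision/"))
        log.Println(http.ListenAndServe(":8080", handler))
    }()

    events := time.Tick(10 * time.Second) // a stand-in for real watch events
    interrupts := make(chan os.Signal, 1)
    signal.Notify(interrupts, os.Interrupt)

    for {
        select { // the blocking select loop of a Watch-style method
        case <-events:
            log.Println("event: something changed, notify the engine")
        case <-interrupts:
            log.Println("shutting down")
            return
        }
    }
}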

Conclusion

I hope you enjoyed this article and found this remote execution methodology as novel as we do. In particular I hope that I’ve demonstrated that configuration software doesn’t have to be constrained behind a static orchestration topology.

Happy Hacking,

James

mgmt has a logo

The mgmt config project got a logo! The full commit is here. Thanks to Sarah Jane Cox for creating it.

mgmt

Happy Hacking,

James

PS: I might have a few stickers to give out too! Ask me next time you see me if you’d like one! Alternatively, use the artwork to make your own and share with your friends!

Live dmesg following

All good sysadmins eventually learn about using tail -F to tail files. Yes, upper-case F is superior.

Around the time I wrote that article, I remember wanting to stream dmesg output too! The functionality wasn’t available without some sort of polling hack, but it turns out that kernel support for this actually landed around the same time in version 3.5.0!

Most GNU/Linux distros are probably running a new enough version by now, and you can now dmesg --follow (or dmesg -w):

$ dmesg -w
[1042958.877980] restoring control 00000000-0000-0000-0000-000000000101/10/5
[1042959.254826] usb 1-1.2: reset low-speed USB device number 3 using ehci-pci
[1042959.356847] psmouse serio1: synaptics: queried max coordinates: x [..5472], y [..4448]
[1042959.530884] PM: resume of devices complete after 976.885 msecs
[1042959.531457] PM: Finishing wakeup.
[1042959.531460] Restarting tasks ... done.
[1042959.622234] video LNXVIDEO:00: Restoring backlight state
[1042959.767952] e1000e: enp0s25 NIC Link is Down
[1042959.771333] IPv6: ADDRCONF(NETDEV_UP): enp0s25: link is not ready
[1048528.391506] All your base are belong to us.

As an added bonus, you can access this via journalctl --dmesg --follow too:

$ journalctl -kf
[snip]
Aug 28 19:58:13 hostname unknown: All your base are belong to us.
Now we have a dmesg version too! https://www.youtube.com/watch?v=8fvTxv46ano&html5=1

Since my dmesg output wasn’t very noisy when writing this article, and since I didn’t write an “all your base” kernel module, you can actually test this functionality by writing to the kernel ring buffer:

$ sudo bash -c 'echo The Technical Blog of James is awesome! > /dev/kmsg'

Happy hacking!

James

PS: Since this is a facility that provides events, we could eventually write an mgmt config “fact” or resource around it!

Ten minute hacks: Hacking airplane headphones

I was stuck on a 14 hour flight last week, and to my disappointment, only one of the two headphone speakers was working. The plane’s media centre has an audio connector that looks like this:

airplane-headphones-jack

Someone should consider probing this USB port.

The hole to the left is smaller than a 3.5mm headphone jack, and designed for a proprietary headphone connector that I didn’t have, and the two holes to the right are part of a different proprietary connector which match with the cheap airline headphones to provide the left and right audio channels.

airplane-headphones-connected

Completely reversible, and therefore completely ambiguous. Stereo is so 1880’s anyways.

By reversing the connector, I was quickly able to determine that the headphones were not faulty, because this swapped the missing audio channel to the other ear. It’s also immediately obvious that since there are no left vs. right polarity markings on either the receptacle or the headphones, there’s a 50% chance that you’ll get reverse stereo.

With the fault identified, and lots of time to kill, I decided to try to hack a workaround. I borrowed some tweezers from a nearby passenger, and slowly ripped off some of the exterior plastic to expose the signal wires. To my surprise there were actually four wires, instead of three with a shared ground.

airplane-headphones-separated

Headphone wires stripped, exposed and ready for splicing.

With a bit of care this only took about five minutes. The next step was to “patch” the working positive and ground wires from the working channel, into the speaker from the broken channel. I did this by trial and error using a bit of intuition to try to keep both speakers in phase.

airplane-headphones-spliced

After a twist splice and using paper as an insulator.

A small scrap of paper acted as an insulator to prevent short circuits between the positive and negative wires. Lastly, a figure eight on a bight was tied to isolate the weak splice from any tension, thus preventing damage and disconnects.

airplane-headphones-knotted

All wrapped up neatly and tied with a knot.

The finished product worked beautifully, despite now only providing monaural audio and being about five centimetres shorter, which is still perfectly usable since the seats hardly recline. The flight staff weren’t angry that I had cannibalized their headphones, but also didn’t understand how my contraption was able to solve the problem.

This fun little ten minute hack helped provide some distraction in economy class, and maybe it will be useful to you since I doubt they’ve repaired the media system in the seat! If you work for Emirates, let me know and I’ll give you the seat and flight number.

Happy hacking!

James

Automatic clustering in mgmt

In mgmt, deploying and managing your clustered config management infrastructure needs to be as automatic as the infrastructure you’re using mgmt to manage. With mgmt, instead of a centralized data store, we function as a distributed system, built on top of etcd and the raft protocol.

In this article, I’ll cover how this feature works.

Foreword:

Mgmt is a next generation configuration management project. If you haven’t heard of it yet, or you don’t remember why we use a distributed database, start by reading the previous articles.

Embedded etcd:

Since mgmt and etcd are both written in golang, the etcd code can be built into the same binary as mgmt. As a result, etcd can be managed directly from within mgmt. Unfortunately, there’s currently no recommended API to do this, but I’ve tried to get such a feature upstream to avoid code duplication in mgmt. If you can help out here, I’d really appreciate it! In the meantime, I’ve had to copy+paste the necessary portions into mgmt.

Clustering mechanics:

You can deploy an automatically clustered mgmt cluster by following these three steps:

1) If no mgmt servers exist you can start one up by running mgmt normally:

./mgmt run --file examples/graph0.yaml

2) To add any subsequent mgmt server, run mgmt normally, but point it at any number of existing mgmt servers with the --seeds flag:

./mgmt run --file examples/graph0.yaml --seeds <ip address:port>

3) Profit!

We internally implement a clustering algorithm which does the hard work of building and managing the etcd cluster for you, so that you don’t have to. If you’re interested, keep reading to find out how it works!

Clustering algorithm:

The clustering algorithm works as follows:

If you aren’t given any seeds, then assume you are the first etcd server (peer) and start up. If you are given a seeds argument, then connect to that peer to join the cluster as a client. If you’d like to be promoted to a server, then you can “volunteer” by setting a special key in the cluster data store.

The existing cluster of peers will decide if they want additional peers, and if so, they can “nominate” someone from the pool of volunteers. If you have been nominated, you can start up an etcd peer and peer with the rest of the cluster. Similarly, the cluster can decide to un-nominate a peer, and if you’ve been un-nominated, then you should shut down your etcd server.

All cluster decisions are made by consensus using the raft algorithm. In practice this means that the elected cluster leader looks at the state of the system, and makes the necessary nomination changes.

Lastly, if you don’t want to be a peer any more, you can revoke your volunteer message, which will be seen by the cluster, and if you were running a server, you should receive an un-nominate message in response, which will let you shutdown cleanly.
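
To give a flavour of how a host might volunteer and then watch for a nomination, here’s a sketch that talks to etcd with the v3 client directly. The key paths (“/_mgmt/volunteers/...” and “/_mgmt/nominated/...”) are assumptions made for illustration, and aren’t necessarily the keys that mgmt uses internally:

package main

// Sketch only: the key paths below are invented, not mgmt's actual schema.

import (
    "context"
    "fmt"
    "log"
    "time"

    "github.com/coreos/etcd/clientv3"
)

func main() {
    hostname := "h2" // the host doing the volunteering

    cli, err := clientv3.New(clientv3.Config{
        Endpoints:   []string{"http://127.0.0.1:2379"},
        DialTimeout: 5 * time.Second,
    })
    if err != nil {
        log.Fatal(err)
    }
    defer cli.Close()

    ctx := context.Background()

    // Volunteer: advertise that this host is willing to run an etcd server.
    if _, err := cli.Put(ctx, "/_mgmt/volunteers/"+hostname, "true"); err != nil {
        log.Fatal(err)
    }

    // Watch for a nomination (or un-nomination) decided by the elected leader.
    for resp := range cli.Watch(ctx, "/_mgmt/nominated/"+hostname) {
        for _, ev := range resp.Events {
            switch ev.Type {
            case clientv3.EventTypePut:
                fmt.Println("nominated: start up an embedded etcd server")
            case clientv3.EventTypeDelete:
                fmt.Println("un-nominated: shut the etcd server down")
            }
        }
    }
}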

Disclaimer:

It’s probably worth mentioning that the current implementation has a few issues, and at least one race. The goal is to have it polished up by the time etcd v3 is released, but it’s perfectly usable for testing and experimentation today! If you don’t want to automatically cluster, you can always use the --no-server flag, and point mgmt at a manually managed mgmt cluster using the --seeds flag.

Testing:

Testing this feature on a single machine makes development and experimentation easier, so as a result, there are a few flags which make this possible.

--hostname <hostname>
With this flag, you can force your mgmt client to pretend it is running on a host with the above mentioned name. You can use this to specify --hostname h1, or --hostname h2, and so on; one for each mgmt agent you want to run on the same machine.

--server-urls <ip:port>
With this flag you can specify which IP address and port the etcd server will listen on for peer requests. By default this will use 127.0.0.1:2380, but when running multiple mgmt agents on the same machine you’ll need to specify this manually to avoid collisions. You can specify as many IP address and port pairs as you’d like by separating them with commas or semicolons. The --peer-urls flag is an alias which does the same thing.

--client-urls <ip:port>
This flag specifies which IP address and port the etcd server will listen on for client connections. It defaults to 127.0.0.1:2379, but you’ll occasionally want to specify this manually for the same reasons as mentioned above. You can specify as many IP address and port pairs as you’d like by separating them with commas or semicolons. This is the address that will be used by the --seeds flag when joining an existing cluster.

Elastic clustering:

In the future, you’ll be able to specify a much more elaborate method to decide how many hosts should be promoted into peers, and which hosts should be nominated or un-nominated when growing or shrinking the cluster.

At the moment, we do the grow or shrink operation when the current peer count does not match the requested cluster size. This value has a default of 5, and can even be changed dynamically. To do so you can run:

ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 put /_mgmt/idealClusterSize 3

You can also set it at start-up by using the --ideal-cluster-size flag.

Example:

Here’s a real example if you want to dive in. Try running the following four commands in separate terminals:

./mgmt run --file examples/etcd1a.yaml --hostname h1 --ideal-cluster-size 3
./mgmt run --file examples/etcd1b.yaml --hostname h2 --seeds http://127.0.0.1:2379 --client-urls http://127.0.0.1:2381 --server-urls http://127.0.0.1:2382
./mgmt run --file examples/etcd1c.yaml --hostname h3 --seeds http://127.0.0.1:2379 --client-urls http://127.0.0.1:2383 --server-urls http://127.0.0.1:2384
./mgmt run --file examples/etcd1d.yaml --hostname h4 --seeds http://127.0.0.1:2379 --client-urls http://127.0.0.1:2385 --server-urls http://127.0.0.1:2386

Once you’ve done this, you should have a three host cluster! Check this by running any of these commands:

ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 member list
ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2381 member list
ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2383 member list

Note that you’ll need a v3 beta version of the etcdctl command which you can get by running ./build in the etcd git repo.

To grow your cluster, try increasing the desired cluster size to five:

ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2381 put /_mgmt/idealClusterSize 5

You should see the last host start up an etcd server. If you reduce the idealClusterSize, you’ll see servers shut down! You’re responsible if you destroy the cluster by setting it too low! You can then try growing your cluster again, but unfortunately due to a bug, hosts can’t be re-used yet, and you’ll get a “bind: address already in use” error. We hope to have this fixed shortly!

Security:

Unfortunately no authentication security or transport security has been implemented yet. We have a great design, but are busy working on other parts of the project at the moment. If you’d like to help out here, please let us know!

Future work:

There’s still a lot of work to do, to improve this feature. The biggest challenge has been getting a reasonable embedded server API upstream. It’s not clear whether this patch can be made to work or if something different will need to be written, but at least one other project looks like it could benefit from this as well.

Video:

A recording from the recent Berlin CoreOSFest 2016 has been published! I demoed these recent features, but one interesting note is that I am actually presenting an earlier version of the code which used the etcd V2 API. I’ve since ported the code to V3, but it is functionally similar. It’s probably worth mentioning, that I found the V3 API to be more difficult, but also more correct and powerful. I think it is a net improvement to the project.

Community:

I can’t end this blog post without mentioning some of the great stuff that’s been happening in the mgmt community! In particular, Felix has written some great code to run existing Puppet code on mgmt. Check out his work!

Upcoming speaking:

I’ve got some upcoming speaking in Hong Kong at HKOSCon16 and in Cape Town at DebConf16 about the project. Please ping me if you’ll be in one of these cities and would like to hack on mgmt or just chat about the project. I’m happy to give some impromptu demos if you ask!

Thanks for reading!

Happy Hacking,

James

PS: We now have a community run twitter account. Check us out!

Upcoming speaking in Hong Kong and South Africa

I’m thrilled to tell you that I’ll be speaking about mgmt in Hong Kong and South Africa. It will be my first time to both countries and my first time to Asia and Africa!

In Hong Kong I’ll be speaking at HKOSCon2016.

In South Africa I’ll be speaking at DebConf16.

I’m looking forward to meeting with many of the hard-working Debian hackers, and collaborating with them to build and promote excellent Free Software. The mgmt project considers both Fedora and Debian to be first class platforms, and parity is a primary design goal.

I’ll be presenting and demoing many of the new features in mgmt. If you haven’t heard about the project, please read some of the posts about it…

You’re also welcome to come join our IRC channel! It’s #mgmtconfig on Freenode.

Many special thanks to both conference organizers as well as Red Hat and the OSAS department for sponsoring my travel! Without it, visiting these places and interacting with new continents of hackers would be impossible.

Happy Hacking,

James

PS: We now have a community run twitter account. Check us out!