Ten minute hacks: Process pause & resume

I’m old school and still rocking an old X220 laptop because I didn’t like the new ones. My battery life isn’t as great as I’d like it to be, but it gets worse when some “webapp” (which I’d much rather have as a native GTK+ app) causes Firefox to rev my CPU with their websocket (hi gmail!) poller.

This seems to happen most often on planes or when I’m disconnected from the internet. Since it’s difficult to know which tab is the offending one, and since I might want to keep that tab’s state anyway, I decided to write a little shell script to pause and resume misbehaving processes.

After putting it into my ~/bin/ and running chmod u+x ~/bin/pause-continue.sh, you can now:

james@computer:~$ pause-continue.sh firefox
Stopping 'firefox'...
[press any key to continue]
Continuing 'firefox'...
james@computer:~$ echo $?        # error codes work iirc
0
james@computer:~$

The code is trivially simple, with an added curses hack to make this 13% more fun. It sends a SIGSTOP signal initially, and then when you press a key it resumes the process with SIGCONT. Here is the code:
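I’m not reproducing my published version verbatim here; instead here is a minimal sketch of the same idea (the process matching and the key-press handling are simplified, and I’ve left the curses fun out):

#!/bin/bash
# pause-continue.sh (sketch): SIGSTOP a process by name, wait for a key, SIGCONT it.
name="$1"
if [ -z "$name" ]; then
    echo "usage: $0 <process name>" 1>&2
    exit 1
fi
if ! pgrep -x "$name" > /dev/null; then
    echo "no process named '$name' was found" 1>&2
    exit 1
fi
echo "Stopping '$name'..."
pkill -STOP -x "$name" || exit $?
read -r -s -n 1 -p '[press any key to continue]'
echo    # move past the prompt
echo "Continuing '$name'..."
pkill -CONT -x "$name"
exit $?    # error codes pass through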

You should obviously substitute in the name of the process that you’d like to pause and resume. If your process breaks because it didn’t deal well with the signals, then you get to keep both pieces!

This should help me on my upcoming travel! I’ll be presenting some of my mgmtconfig work at DevConf.cz, FOSDEM and CfgMgmtCamp.eu! CfgMgmtCamp will also have a short mgmt track (looking forward to seeing Felix present!) and we’ll be around to hack on the 8th during fringe (the day after the official camp) if you’d like help to get your patch merged! I’m looking forward to it!

Happy hacking!

James

A revisionist history of configuration management

I’ve got a brand new core feature in mgmt called send/recv which I plan to show you shortly, but first I’d like to start with some background.

History

This is my historical perspective and interpretation about the last twenty years in configuration management. It’s likely inaccurate and slightly revisionist, but it should be correct enough to tell the design story that I want to share.

Sometime after people started to realize that writing bash scripts wasn’t a safe, scalable, or reusable way to automate systems, CFEngine burst onto the scene with the first real solution to this problem. I think it was mostly all quite sane, but it wasn’t a tool which let us build autonomous systems, so people eventually looked elsewhere.

Later on, a new tool called Puppet appeared, and advertised itself as a “CFEngine killer”. It was written in a flashy new language called Ruby, and started attracting a community. I think it had some great ideas, and in particular, the idea of a safe declarative language was a core principle of the design.

I first got into configuration management around this time. My first exposure was to Puppet version 0.24, IIRC. Two major events followed.

  1. Puppet (the company, previously named “Reductive Labs”) needed to run a business (rightly so!) and turned their GPL-licensed project into an ALv2-licensed one. This opened the door to an open-core business model, and I think it was ultimately a detriment to the Puppet community.
  2. Some felt that the Puppet DSL (language) was too restrictive, and that this was what prevented them from building autonomous systems. They eventually started a project called Chef which let you write your automation using straight Ruby code. It never did lead them to build autonomous systems.

At this point, as people began to feel that the complexity (in particular around multi-machine environments) was starting to get too high, a flashy new orchestrator called Ansible appeared. While I like to put centralized orchestrators in a different category than configuration management, it sits in the same problem space so we’ll include it here.

Ansible tried to solve the complexity and multi-machine issue by determining the plan of action in advance, and then applying those changes remotely over SSH. Instead of a brand new “language”, they ended up with a fancy YAML syntax which has been loved by many and disliked by others. You also couldn’t really exchange host-local information between hosts at runtime, but this was a more advanced requirement anyway. They never did end up building reactive, autonomous systems, but this might not have been a goal.

Sometime later, container technology had a renaissance. The popular variant that caused a stir was called Docker. The dominant form was one in which you used a bash script wrapped in some syntactic sugar (a “Dockerfile”) to build your container images. Many believed (although incorrectly) that container technology would be a replacement for configuration management. The solution was to build these blobs with shell scripts, and to mix in the mostly useless concept of image layering.

They seem to have taken the renaissance too literally, and when they revived container technology, they also brought back using the shell as a build primitive. While I am certainly a fan and user of bash, and I do appreciate the nostalgia, it isn’t the safe, scalable design that I was looking for.

Docker is definitely in a different category than configuration management; in fact, I think the two are actually complementary. Even though I prefer systemd-nspawn, we’ll mention Docker here so that I can publicly discredit the notion that it sits in or replaces this problem space.

While in some respects they got much closer to being able to build autonomous systems, you had to rewrite your software to be able to fit into this model, and even then, there are many shortcomings that still haven’t been resolved.

Analysis

On the path to autonomous systems, there is certainly a lot of trial and error. I don’t pretend to think that I have solved this problem, but I think I’ll get pretty close.

  • Where CFEngine chose the C language for portability, it lacked safety, and
  • Where Puppet chose a declarative language for safety, it lacked power, and
  • Where Chef chose raw code for power, it lacked simplicity, and
  • Where Ansible chose an orchestrator for simplicity, it lacked distribution and
  • Where Docker chose multiple instances for distribution, it lacked coordination.

I believe that the answer to all of these is still ahead of us. When discussing power, I think the main mistake was the lack of a sufficiently advanced resource primitive. The event-based engine in mgmt is intended to be the main aspect of this solution, but not the whole story. For another piece of this story, I invented something I’m calling send/recv.

Send/Recv

I’d like to go into this today, but I think I’m going to split this discussion into a separate blog post. Expect something here within a week!

If you hate the suspense, become a contributor and be involved in these discussions! We’re hanging out in #mgmtconfig on Freenode. I also hold occasional videoconferences with code contributors where we talk about the future.

Thanks

I learned a tremendous amount from all of these earlier tools and communities, and even though I am working on a next generation tool, I would never be where I am now if it wasn’t for all of those who came before me. I’m even trying to borrow ideas where it is appropriate to do so! I welcome all of those communities into the mgmt circle, and I hope that their users all continue to positively influence the design of mgmt.

Happy hacking,

James

One hour hacks: Remote LUKS over SSH

I have a GNU/Linux server which I mount a few LUKS encrypted drives on. I only ever interact with the server over SSH, and I never want to keep the LUKS credentials on the remote server. I don’t have anything especially sensitive on the drives, but I think it’s a good security practice to encrypt it all, if only to add noise into the system and for solidarity with those who harbour much more sensitive data.

This means that every time the server reboots or whenever I want to mount the drives, I have to log in and go through the series of luksOpen and mount commands before I can access the data. This turned out to be a bit laborious, so I wrote a quick script to automate it! I also made sure that it was idempotent.
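For context, the manual dance for each drive looks roughly like this (the UUID, the mapper name and the mount point are placeholders):

# on the server, as root, for every drive, every time:
cryptsetup luksOpen /dev/disk/by-uuid/<UUID> music    # prompts for the LUKS passphrase
mkdir -p /mnt/music
mount /dev/mapper/music /mnt/music
# ...and umount, luksClose, rmdir to undo it all later.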

I decided to share it because I couldn’t find anything similar, and I was annoyed that I had to write this in the first place. Hopefully it saves you some anguish. It also contains a clever little bash hack that I am proud to have in my script.

Here’s the script. You’ll need to fill in the map of mount folder names to drive UUIDs, and you’ll want to set your server hostname and FQDN to match your environment, of course. It will prompt you for your root password to mount, and for the LUKS password when needed.
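In case the link ever rots, here is a rough sketch of the shape of the script. The real one is more careful, prompts in a slightly different order, and contains the aforementioned bash hack; the UUIDs below are obviously placeholders for your own:

#!/bin/bash
# rluks.sh (sketch): mount/unmount LUKS drives on a remote server, over SSH.
SERVER='myserver'                   # short name, for display only
FQDN='server.example.com'           # host to ssh into
declare -A DRIVES=(                 # mount folder name -> LUKS partition UUID
    [music]='00000000-0000-0000-0000-000000000000'
    [files]='11111111-1111-1111-1111-111111111111'
    [photos]='22222222-2222-2222-2222-222222222222'
)

echo "Running on: $SERVER..."
read -r -p 'Mount/Unmount [m/u] ? ' action

# Build one long, idempotent command string to run under sudo on the server.
cmds=''
for name in "${!DRIVES[@]}"; do
    dev="/dev/disk/by-uuid/${DRIVES[$name]}"
    dir="/mnt/$name"
    if [ "$action" = 'm' ]; then
        cmds+="mkdir -p '$dir' && echo '$name: mkdir ✓'; "
        cmds+="test -e '/dev/mapper/$name' || cryptsetup luksOpen '$dev' '$name' && echo '$name: luksOpen ✓'; "
        cmds+="mountpoint -q '$dir' || mount '/dev/mapper/$name' '$dir' && echo '$name: mount ✓'; "
    else
        cmds+="mountpoint -q '$dir' && umount '$dir'; echo '$name: umount ✓'; "
        cmds+="test -e '/dev/mapper/$name' && cryptsetup luksClose '$name'; echo '$name: luksClose ✓'; "
        cmds+="rmdir '$dir' 2>/dev/null; echo '$name: rmdir ✓'; "
    fi
done
[ "$action" = 'm' ] && echo 'Mounting...' || echo 'Unmounting...'

# -t allocates a tty so that sudo and cryptsetup can prompt for passwords.
ssh -t "$FQDN" "sudo bash -c \"$cmds echo 'Done!'\""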

Example of mounting:

james@computer:~$ rluks.sh 
Running on: myserver...
[sudo] password for james: 
Mount/Unmount [m/u] ? m
Mounting...
music: mkdir ✓
LUKS Password: 
music: luksOpen ✓
music: mount ✓
files: mkdir ✓
files: luksOpen ✓
files: mount ✓
photos: mkdir ✓
photos: luksOpen ✓
photos: mount ✓
Done!
Connection to server.example.com closed.

Example of unmounting:

james@computer:~$ rluks.sh 
Running on: myserver...
[sudo] password for james: 
Sorry, try again.
[sudo] password for james: 
Mount/Unmount [m/u] ? u
Unmounting...
music: umount ✓
music: luksClose ✓
music: rmdir ✓
files: umount ✓
files: luksClose ✓
files: rmdir ✓
photos: umount ✓
photos: luksClose ✓
photos: rmdir ✓
Done!
Connection to server.example.com closed.
james@computer:~$

It’s worth mentioning that there are many improvements that could be made to this script. If you’ve got patches, send them my way. After all, this is only a: one hour hack.

Happy hacking,

James

PS: One day this sort of thing might be possible in mgmt. Let me know if you want to help work on it!

Introducing: git tpush

On today’s issue of “one hour hacks”, I’ll show you how you can stop your git drive-bys to git master from breaking your CI tests… Let’s continue!

The problem:

Sometimes I’ve got a shitty one-line patch that I want to push to git master. I’m usually right, and everything tests out fine, but usually isn’t always, and then I look silly while I frantically try to fix git master on a project that I maintain. Let’s use tools to hide our human flaws!

How you’re really supposed to do it:

Good hackers know that you’re supposed to:

  • checkout a new feature branch for your patch
    • git checkout -b feat/foo
  • write the patch and commit it
    • git add -p && git commit -m 'my foo'
  • push that branch up to origin
    • git push origin feat/foo
  • wait for the tests to succeed
    • hub ci-status && sleep 1m && ...
  • merge and push to git master
    • git checkout master && git merge feat/foo && git push
  • delete your local branch
    • git branch -d feat/foo
  • delete the remote branch
    • git push origin :feat/foo
  • mourn all the lost time it took to push this patch
    • cat && drink && sleep 8h

If it happens to fail, you have to remember what the safe git reset command is, and then you have to start all over!

How do we make this easy?

I write tools, so I sat down and wrote a cheap little bash script to do it for me! Sharing is caring, so please enjoy a copy!

How does it work?

It actually just does all of the above for you automatically, and it automatically cleans up all of the temporary branches when it’s done! Here are two example runs to help show you the tool in action:

A failure scenario:

Here I add a patch with some trailing whitespace, which will easily get caught by the automatic test suite. You’ll note that I actually had to force the commit locally with git commit -n, because the pre-commit hook actually caught this first. You could extend this script to run your test suite locally before you push the patch, but that defeats the idea of a drive-by.

james@computer:~/code/mgmt$ git add -p
diff --git a/README.md b/README.md
index 277aa73..2c31356 100644
--- a/README.md
+++ b/README.md
@@ -34,6 +34,7 @@ Please get involved by working on one of these items or by suggesting something
 Please grab one of the straightforward [#mgmtlove](https://github.com/purpleidea/mgmt/labels/mgmtlove) issues if you're a first time contributor.
 
 ## Bugs:
+Some bugs are caused by bad drive-by patches to git master! 
 Please set the `DEBUG` constant in [main.go](https://github.com/purpleidea/mgmt/blob/master/main.go) to `true`, and post the logs when you report the [issue](https://github.com/purpleidea/mgmt/issues).
 Bonus points if you provide a [shell](https://github.com/purpleidea/mgmt/tree/master/test/shell) or [OMV](https://github.com/purpleidea/mgmt/tree/master/test/omv) reproducible test case.
 
Stage this hunk [y,n,q,a,d,/,e,?]? y
<stdin>:9: trailing whitespace.
Some bugs are caused by bad drive-by patches to git master! 
warning: 1 line adds whitespace errors.

james@computer:~/code/mgmt$ git commit -m 'XXX this is a bad patch'
README.md:37: trailing whitespace.
+Some bugs are caused by bad drive-by patches to git master! 
james@computer:~/code/mgmt$ git commit -nm 'XXX this is a bad patch'
[master 8b29fb3] XXX this is a bad patch
 1 file changed, 1 insertion(+)
james@computer:~/code/mgmt$ git tpush 
tpush branch is: tpush/test-0
Switched to a new branch 'tpush/test-0'
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 357 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To git@github.com:purpleidea/mgmt.git
 * [new branch]      tpush/test-0 -> tpush/test-0
Switched to branch 'master'
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
- CI is starting...
/ Waiting for CI...
- Waiting for CI...
\ Waiting for CI...
| Waiting for CI...
Deleted branch tpush/test-0 (was 8b29fb3).
To git@github.com:purpleidea/mgmt.git
 - [deleted]         tpush/test-0
Upstream CI failed with:
failure: https://travis-ci.org/purpleidea/mgmt/builds/109472293
Fix branch 'master' with:
git commit --amend
or:
git reset --soft HEAD^
Happy hacking!

A successful scenario:

In this example, I amended the previous patch in my local copy of git master (which never got pushed to the public git master!) and then I ran git tpush again.

james@computer:~/code/mgmt$ git commit --amend
[master 1186d63] Add a link to my article about debugging golang
 Date: Mon Feb 15 17:28:01 2016 -0500
 1 file changed, 1 insertion(+)
james@computer:~/code/mgmt$ git tpush 
tpush branch is: tpush/test-0
Switched to a new branch 'tpush/test-0'
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 397 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To git@github.com:purpleidea/mgmt.git
 * [new branch]      tpush/test-0 -> tpush/test-0
Switched to branch 'master'
Your branch is ahead of 'origin/master' by 1 commit.
  (use "git push" to publish your local commits)
- CI is starting...
/ Waiting for CI...
- Waiting for CI...
\ Waiting for CI...
| Waiting for CI...
Deleted branch tpush/test-0 (was 1186d63).
To git@github.com:purpleidea/mgmt.git
 - [deleted]         tpush/test-0
Counting objects: 3, done.
Delta compression using up to 4 threads.
Compressing objects: 100% (3/3), done.
Writing objects: 100% (3/3), 397 bytes | 0 bytes/s, done.
Total 3 (delta 2), reused 0 (delta 0)
To git@github.com:purpleidea/mgmt.git
   6e68d6d..1186d63  master -> master
Done!

Please take note of the “Waiting for CI…” text. This is actually a spinner which updates itself. Run this tool yourself to get the full effect! This avoids having 200 lines of scrollback in your terminal while you wait for your CI to complete.
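If you’re wondering how the spinner works, it’s just careful use of carriage returns. Here is a self-contained little demo of the same trick (this is not lifted from git-tpush, it’s only there to show the idea):

#!/bin/bash
# Redraw a single status line in place instead of scrolling the terminal.
chars='|/-\'
for n in $(seq 1 20); do            # stand-in for "poll the CI until it's done"
    printf '\r%s Waiting for CI...' "${chars:$((n % 4)):1}"
    sleep 0.5
done
printf '\r'; tput el                # wipe the line; this is the tput dependency
echo 'Done!'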

Lastly, you’ll note that I run git tpush instead of git-tpush. This is because I added an alias to my ~/.gitconfig. It’s quite easy, just add:

tpush = !git-tpush

under the [alias] section, and ensure the git-tpush shell script is in your ~/bin/ folder and is executable.

Pre-requisites:

This actually depends on the hub command line tool which knows how to ask our CI about the test status. It sucks using a distributed tool like git with a centralized thing like github, but it’s still pretty easy for me to move away if they go bad. You’ll also want tput installed if you want the fancy spinner, but that comes with most GNU/Linux distros by default.

The code:

Grab a copy here!
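If the gist ever disappears, here’s a condensed sketch of the flow it automates. It assumes hub is installed, that origin points at your repository, and that you’re pushing master; the handling of the hub ci-status output is simplified compared to the real script, and the temporary branch naming differs too:

#!/bin/bash
# git-tpush (sketch): push HEAD to a temporary branch, wait for CI, then push master.
branch="tpush/test-$$"              # the real script numbers these differently
sha="$(git rev-parse HEAD)"

echo "tpush branch is: $branch"
git checkout -b "$branch"
git push origin "$branch"
git checkout master

# hub ci-status prints a status such as "pending", "success" or "failure".
while true; do
    status="$(hub ci-status "$sha")"
    if [ "$status" != 'pending' ] && [ "$status" != 'no status' ]; then
        break
    fi
    echo 'Waiting for CI...'        # the real script draws the spinner here
    sleep 10
done

# The temporary branch is no longer needed, whatever the result was.
git branch -D "$branch"
git push origin ":$branch"

if [ "$status" = 'success' ]; then
    git push origin master
    echo 'Done!'
    exit 0
fi
echo "Upstream CI failed with: $status"
echo "Fix branch 'master' with:"
echo 'git commit --amend'
echo 'or:'
echo 'git reset --soft HEAD^'
echo 'Happy hacking!'
exit 1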

Happy hacking,

James

PS: No, I couldn’t think of a better name than git-tpush, if you don’t like it feel free to rename it yourself!

Next generation configuration mgmt

It’s no secret to the readers of this blog that I’ve been active in the configuration management space for some time. I owe most of my knowledge to what I’ve learned while working with Puppet and from other hackers working in and around various other communities.

I’ve published a number of articles in an attempt to push the field forwards, and to share the knowledge that I’ve learned with others. I’ve spent many nights thinking about these problems, but it is not without some chagrin that I realized that the current state-of-the-art in configuration management cannot easily (or elegantly) solve all the problems for which I wish to write solutions.

To that end, I’d like to formally present my idea (and code) for a next generation configuration management prototype. I’m calling my tool mgmt.

Design triad

Mgmt has three unique design elements which differentiate it from other tools. I’ll try to cover these three points, and explain why they’re important. The summary:

  1. Parallel execution, to run all the resources concurrently (where possible)
  2. Event driven, to monitor and react dynamically only to changes (when needed)
  3. Distributed topology, so that scale and centralization problems are replaced with a robust distributed system

The code is available, but you may prefer to read on as I dive deeper into each of these elements first.

1) Parallel execution

Fundamentally, all configuration management systems represent the dependency relationships between their resources in a graph, typically one that is directed and acyclic.


Directed acyclic graph g1, showing the dependency relationships with black arrows, and the linearized dependency sort order (a topological sort) with red arrows.

Unfortunately, the execution of this graph typically has a single worker that runs through a linearized, topologically sorted version of it. There is no reason that a graph with a number of disconnected parts cannot run each separate section in parallel with each other.


Graph g2 with the red arrows again showing the execution order of the graph. Please note that this graph is composed of two disconnected parts: one diamond on the left and one triplet on the right, both of which can run in parallel. Additionally, nodes 2a and 2b can run in parallel only after 1a has run, and node 3a requires the entire left diamond to have succeeded before it can execute.

Typically, some nodes will share a common dependency, and once it is met, all of its children can execute simultaneously.

This is the first major design improvement that the mgmt tool implements. It has obvious performance benefits, in that a long-running process in one part of the graph (eg: a package installation) will cause no delay on a separate, disconnected part of the graph which is in the process of converging unrelated code. It also has other benefits which we will discuss below.
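To make that intuition concrete outside of mgmt, here’s a throwaway shell comparison (plain bash, nothing to do with the mgmt engine): a three-step chain plus one independent resource, run concurrently instead of linearized:

#!/bin/bash
chain()  { sleep 10; sleep 10; sleep 10; }   # 1 -> 2 -> 3: must run in order
lonely() { sleep 10; }                       # no dependencies at all
time {
    chain &     # the chain still takes ~30s internally...
    lonely &    # ...but the independent resource overlaps with it
    wait        # total wall time: ~30s instead of the linearized ~40s
}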

In practice this is particularly powerful since most servers under configuration management typically combine different modules, many of which have no inter-dependencies.

An example is the best way to show off this feature. Here we have a set of four long-running (10 second) processes or exec resources. Three of them form a linear dependency chain, while the fourth has no dependencies or prerequisites. I’ve asked the system to exit once it has been converged for five seconds. As you can see in the example, it finishes five seconds after the limiting portion of the graph completes, which is the longest-running chain in the whole process. That limiting chain took 30 seconds, which we can see in the log as three 10 second executions. The expected logical total of about 35 seconds is shown at the end:

$ time ./mgmt run --file graph8.yaml --converged-timeout=5 --graphviz=example1.dot
22:55:04 This is: mgmt, version: 0.0.1-29-gebc1c60
22:55:04 Main: Start: 1452398104100455639
22:55:04 Main: Running...
22:55:04 Graph: Vertices(4), Edges(2)
22:55:04 Graphviz: Successfully generated graph!
22:55:04 State: graphStarting
22:55:04 State: graphStarted
22:55:04 Exec[exec4]: Apply //exec4 start
22:55:04 Exec[exec1]: Apply //exec1 start
22:55:14 Exec[exec4]: Command output is empty! //exec4 end
22:55:14 Exec[exec1]: Command output is empty! //exec1 end
22:55:14 Exec[exec2]: Apply //exec2 start
22:55:24 Exec[exec2]: Command output is empty! //exec2 end
22:55:24 Exec[exec3]: Apply //exec3 start
22:55:34 Exec[exec3]: Command output is empty! //exec3 end
22:55:39 Converged for 5 seconds, exiting! //converged for 5s
22:55:39 Interrupted by exit signal
22:55:39 Exec[exec4]: Exited
22:55:39 Exec[exec1]: Exited
22:55:39 Exec[exec2]: Exited
22:55:39 Exec[exec3]: Exited
22:55:39 Goodbye!

real    0m35.009s
user    0m0.008s
sys     0m0.008s
$

Note that I’ve edited the example slightly to remove some unnecessary log entries for readability’s sake, and I have also added some comments and emphasis, but aside from that, this is actual output! The tool also generated graphviz output which may help you better understand the problem:


example1.dot

This example is obviously contrived, but is designed to illustrate the capability of the mgmt tool.

Hopefully you’ll be able to come up with more practical examples.

2) Event driven

All configuration management systems have some notion of idempotence. Put simply, an idempotent operation is one which can be applied multiple times without causing the result to diverge from the desired state. In practice, each individual resource will typically check the state of the element, and if different than what was requested, it will then apply the necessary transformation so that the element is brought to the desired state.
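As a trivial illustration of that check-then-apply pattern (in shell, not in mgmt’s engine), an idempotent “file contents” operation looks something like this; the path and contents are made up:

#!/bin/bash
# Only touch the file if it isn't already in the desired state.
path='/tmp/motd'
want='welcome to the machine'
if [ "$(cat "$path" 2>/dev/null)" != "$want" ]; then
    echo "$want" > "$path"          # transform towards the desired state
    echo 'changed'
else
    echo 'already converged'        # running it again is a safe no-op
fi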

The current generation of configuration management tools typically checks the state of each element once every 30 minutes. Some do it more or less often, and some do it only when manually requested. In all cases, this can be an expensive operation due to the size of the graph, and the cost of each check operation. This problem is compounded by the fact that the graph doesn’t run in parallel.


In this time / state sequence diagram g3, time progresses from left to right. Each of the three elements (from top to bottom) wants to converge on states a, b and c respectively. Initially the first two are in states x and y, whereas the third is already converged. At t1 the system runs and converges the graph, which entails a state change of the first and second elements. At some time t2, the elements are changed by some external force, and the system is no longer converged. We won’t notice till later! At time t3, when we run for the second time, we notice that the second and third elements are no longer converged and we apply the necessary operations to fix this. An unknown amount of time passed where our cluster was in a diverged or unhealthy state. Traditionally this is on the order of 30 minutes.

More importantly, if something diverges from the requested state you might wait 30 minutes before it is noticed and repaired by the system!

The mgmt system is unique, because I realized that an event based system could fulfill the same desired behaviour, and in fact it offers a more general and powerful solution. This is the second major design improvement that the mgmt tool implements.

These events that we’re talking about are inotify events for file changes, systemd events (from dbus) for service changes, packagekit events (from dbus again) for package change events, and events from exec calls, timers, network operations and more! In the inotify example, on first run of the mgmt system, an inotify watch is taken on the file we want to manage, the state is checked and it is converged if need be. We don’t ever need to check the state again unless inotify tells us that something happened!
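You can approximate the file example from the command line with inotifywait (from the inotify-tools package); this only shows the shape of the event loop, it’s not how mgmt implements it internally:

#!/bin/bash
# Event driven convergence for a single file: no polling, only react to events.
path='/tmp/mgmt/f1'
want='i am f1'
converge() {
    if [ "$(cat "$path" 2>/dev/null)" != "$want" ]; then
        mkdir -p "$(dirname "$path")"
        echo "$want" > "$path"
        echo "converged: $path"
    fi
}
converge    # check the state once at startup...
# ...then block until inotify reports a change, and only then check again.
while inotifywait -qq -e modify -e delete_self -e move_self "$path"; do
    converge
done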


In this time / state sequence diagram g4, time progresses from left to right. After the initial run, since all the elements are being continuously monitored, the instant something changes, mgmt reacts and fixes it almost instantly.

Astute config mgmt hackers might end up realizing three interesting consequences:

  1. If we don’t want the mgmt program to continuously monitor events, it can be told to exit after the graph converges, and run again 30 minutes later. This can be done with my system by running it from cron with the --converged-timeout=1 flag (see the sketch after this list). This effectively offers the same behaviour that current generation systems do, for the administrators who do not want to experiment with a newer model. Thus, the current systems are a special, simplified model of mgmt!
  2. It is possible that some resources don’t offer an event watching mechanism. In those instances, a fallback to polling is possible for that specific resource, although there aren’t any missing event APIs that your narrator knows about at this time.
  3. A monitoring system (read: nagios and friends) could probably be built with this architecture, thus demonstrating that my world view of configuration management is actually a generalized version of system monitoring! They’re the same discipline!
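For example, point 1 above might look something like this in a crontab (the binary and graph file paths are illustrative; only the flag comes from mgmt itself):

# m   h  dom mon dow  command
*/30  *  *   *   *    /usr/local/bin/mgmt run --file /etc/mgmt/graph.yaml --converged-timeout=1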

Here is a small real-world example to demonstrate this feature. I have started the agent, and I have told it to create three files (f1, f2, f3) with the contents seen below, and also, to ensure that file f4 is not present. As you can see the mgmt system responds quite quickly:

james@computer:/tmp/mgmt$ ls
f1  f2  f3
james@computer:/tmp/mgmt$ cat *
i am f1
i am f2
i am f3
james@computer:/tmp/mgmt$ rm -f f2 && cat f2
i am f2
james@computer:/tmp/mgmt$ echo blah blah > f2 && cat f2
i am f2
james@computer:/tmp/mgmt$ touch f4 && file f4
f4: cannot open `f4' (No such file or directory)
james@computer:/tmp/mgmt$ ls
f1  f2  f3
james@computer:/tmp/mgmt$

That’s fast!

3) Distributed topology

All software typically runs with some sort of topology. Puppet and Chef normally run in a client server topology, where you typically have one server with many clients, each running an agent. They also both offer a standalone mode, but in general this is not more interesting than running a fancy bash script. In this context, I define interesting as “relating to clustered, multiple machine architectures”.


Here in graph g5 you can see one server which has three clients initiating requests to it.

This traditional model of computing is well-known, and fairly easy to reason about. You typically put all of your code in one place (on the server) and the clients or agents need very little personalized configuration to get working. However, it can suffer from performance and scalability issues, and it can also be a single point of failure for your infrastructure. Make no mistake: if you manage your infrastructure properly, then when your configuration management infrastructure is down, you will be unable to bring up new machines or modify existing ones! This can be a disastrous type of failure, and is one which is seldom planned for in disaster recovery scenarios!

Other systems such as Ansible are actually orchestrators, and are not technically configuration management in my opinion. That doesn’t mean they don’t share much of the same problem space, and in fact they are usually idempotent and share many of the same properties of traditional configuration management systems. They are useful and important tools!

The key difference about an orchestrator is that it typically operates with a push model, where the server (or the sysadmin laptop) initiates a connection to the machines that it wants to manage. One advantage is that this is sometimes very easy to reason about for multi-machine architectures, however it shares the usual downsides around having a single point of failure. Additionally there are some very real performance considerations when running large clusters of machines. In practice these clusters are typically segmented or divided in some logical manner so as to lessen the impact of this, which unfortunately detracts from the aforementioned simplicity of the solution.

Unfortunately with either of these two topologies, we can’t detect when an issue has occurred and respond immediately without sufficiently advanced third party monitoring. By itself, a machine that is being managed by orchestration cannot detect an issue and communicate back to its operator, or tell the cluster of servers it peers with to react accordingly.

The good news about current and future generation topologies is that algorithms such as the Paxos family and Raft are now gaining wider acceptance and good implementations now exist as Free Software. Mgmt depends on these algorithms to create a mesh of agents. There are no clients and servers, only peers! Each peer can choose to both export and collect data from a distributed data store which lives as part of the cluster of peers. The software that currently implements this data store is a marvellous piece of engineering called etcd.


In graph g7, you can see what a fully interconnected graph topology might look like. It should be clear that the number of connections (or edges) is quite large. Try and work out the number of edges required for a fully connected graph with 128 nodes. Hint: it grows as n(n-1)/2, so it’s large!

In practice the number of connections required for each peer to connect to each other peer would be too great, so instead the cluster first achieves distributed consensus, and then the elected leader picks a certain number of machines to run etcd masters. All other agents then connect through one of these masters. The distributed data store can easily handle failures, and agents can reconnect seamlessly to a different temporary master should they need to. If there is a lack or an abundance of transient masters, then the cluster promotes or demotes an agent automatically by asking it to start or stop an etcd process on its host.


In graph g8, you can see a tightly interconnected centre of nodes running both their configuration management tasks and etcd masters. Each additional peer picks any of them to connect to. As the number of nodes grows, it is far easier to scale such a cluster. Future algorithm designs and optimizations should help this system scale to unprecedented host counts. It should go without saying that it would be wise to ensure that the nodes running etcd masters are in different failure domains.

By allowing hosts to export and collect data from the distributed store, we actually end up with a mechanism that is quite similar to what Puppet calls exported resources. In my opinion, the mechanism and data interchange is actually a brilliant idea, but with some obvious shortcomings in its implementation. This is because for a cluster of N nodes, each wishing to exchange data with one another, puppet must run N times (once on each node) and then N-1 times for the entire cluster to see all of the exchanged data. Each of these runs requires an entire sequential run through every resource, and an expensive check of each resource, each time.

In contrast, with mgmt, the graph is redrawn only when an etcd event notifies us of a change in the data store, and when the new graph is applied, only members who are affected either by a change in definition or dependency need to be re-run. In practice this means that a small cluster where the resources themselves have a negligible apply time, can converge a complete connected exchange of data in less than one second.

An example demonstrates this best.

  • I have three nodes in the system: A, B, C.
  • Each creates four files, two of which it will export.
  • On host A, the two files are: /tmp/mgmtA/f1a and /tmp/mgmtA/f2a.
  • On host A, it exports: /tmp/mgmtA/f3a and /tmp/mgmtA/f4a.
  • On host A, it collects all available (exported) files into: /tmp/mgmtA/
  • It is done similarly with B and C, except with the letters B and C substituted into the emphasized locations above.
  • For demonstration purposes, I startup the mgmt engine first on A, then B, and finally C, all the while running various terminal commands to keep you up-to-date.

As before, I’ve trimmed the logs and annotated the output for clarity:

james@computer:/tmp$ rm -rf /tmp/mgmt* # clean out everything
james@computer:/tmp$ mkdir /tmp/mgmt{A..C} # make the example dirs
james@computer:/tmp$ tree /tmp/mgmt* # they're indeed empty
/tmp/mgmtA
/tmp/mgmtB
/tmp/mgmtC

0 directories, 0 files
james@computer:/tmp$ # run node A, it converges almost instantly
james@computer:/tmp$ tree /tmp/mgmt*
/tmp/mgmtA
├── f1a
├── f2a
├── f3a
└── f4a
/tmp/mgmtB
/tmp/mgmtC

0 directories, 4 files
james@computer:/tmp$ # run node B, it converges almost instantly
james@computer:/tmp$ tree /tmp/mgmt*
/tmp/mgmtA
├── f1a
├── f2a
├── f3a
├── f3b
├── f4a
└── f4b
/tmp/mgmtB
├── f1b
├── f2b
├── f3a
├── f3b
├── f4a
└── f4b
/tmp/mgmtC

0 directories, 12 files
james@computer:/tmp$ # run node C, exit 5 sec after converged, output:
james@computer:/tmp$ time ./mgmt run --file examples/graph3c.yaml --hostname c --converged-timeout=5
01:52:33 main.go:65: This is: mgmt, version: 0.0.1-29-gebc1c60
01:52:33 main.go:66: Main: Start: 1452408753004161269
01:52:33 main.go:203: Main: Running...
01:52:33 main.go:103: Etcd: Starting...
01:52:33 config.go:175: Collect: file; Pattern: /tmp/mgmtC/
01:52:33 main.go:148: Graph: Vertices(8), Edges(0)
01:52:38 main.go:192: Converged for 5 seconds, exiting!
01:52:38 main.go:56: Interrupted by exit signal
01:52:38 main.go:219: Goodbye!

real    0m5.084s
user    0m0.034s
sys    0m0.031s
james@computer:/tmp$ tree /tmp/mgmt*
/tmp/mgmtA
├── f1a
├── f2a
├── f3a
├── f3b
├── f3c
├── f4a
├── f4b
└── f4c
/tmp/mgmtB
├── f1b
├── f2b
├── f3a
├── f3b
├── f3c
├── f4a
├── f4b
└── f4c
/tmp/mgmtC
├── f1c
├── f2c
├── f3a
├── f3b
├── f3c
├── f4a
├── f4b
└── f4c

0 directories, 24 files
james@computer:/tmp$

Amazingly, the cluster converges in less than one second. Admittedly it didn’t have large amounts of IO to do, but since those are fixed constants, it still shows how fast this approach should be. Feel free to do your own tests to verify.

Code

The code is publicly available and has been for some time. I wanted to release it early, but I didn’t want to blog about it until I felt I had the initial design triad completed. It is written entirely in golang, which I felt was a good match for the design requirements that I had. It is my first major public golang project, so I’m certain there are many things I could be doing better. As a result, I welcome your criticism and patches, just please try and keep them constructive and respectful! The project is entirely Free Software, and I plan to keep it that way. As long as Red Hat is involved, I’m pretty sure you won’t have to deal with any open core compromises!

Community

There’s an IRC channel going. It’s #mgmtconfig on Freenode. Please come hangout! If we get bigger, we’ll add a mailing list.

Caveats

There are a few caveats that I’d like to mention. Please try to keep these in mind.

  • This is still an early prototype, and as a result isn’t ready for production use, or as a replacement for existing config management software. If you like the design, please contribute so that together we can turn this into a mature system faster.
  • There are some portions of the code which are notably absent. In particular, there is no lexer or parser, nor is there a design for what the graph building language (DSL) would look like. This is because I am not a specialist in these areas, and as a result, while I have some ideas for the design, I don’t have any useful code yet. For testing the engine, there is a (quickly designed) YAML graph definition parser available in the code today.
  • The etcd master election/promotion portion of the code is not yet available. Please stay tuned!
  • There is no design document, roadmap or useful documentation currently available. I’ll be working to remedy this, but I first wanted to announce the project, gauge interest and get some initial feedback. Hopefully others can contribute to the docs, and I’ll try to write about my future design ideas as soon as possible.
  • The name mgmt was the best that I could come up with. If you can propose a better alternative, I’m open to the possibility.
  • I work for Red Hat, and at first it might seem confusing to announce this work alongside our existing well-known Puppet and Ansible teams. To clarify, this is a prototype of some work and designs that I started before I was employed at Red Hat. I’m grateful that they’ve been letting me work on interesting projects, and I’m very pleased that my managers have had the vision to invest in future technologies and projects that (I hope) might one day become the de facto standard.

Presentations

It is with great honour that my first public talk about this project will be at Config Management Camp 2016. I am thrilled to be speaking at such an excellent conference, where I am sure the subject matter will be a great fit for all the respected domain experts who will be present. Find me on the schedule, and please come to my session.

I’m also fortunate enough to be speaking about the same topic, just four days later in Brno, at DevConf.CZ. It’s a free conference, in an excellent city, and you’ll be sure to find many excellent technical sessions and hackers!

I hope to see you at one of these events or at a future conference. If you’d like to have me speak at your conference or event, please contact me!

Conclusion

Thank you for reading this far! I hope you appreciate my work, and I promise to tell you more about some of the novel designs and properties that I have planned for the future of mgmt. Please leave me your comments, even if they’re just +1’s.

Happy hacking!

James


Post scriptum

There seems to be a new trend about calling certain types of management systems or designs “choreography”. Since this term is sufficiently overloaded and without a clear definition, I choose to avoid it, however it’s worth mentioning that some of the ideas from some of the definitions of this word as pertaining to the configuration management field match what I’m trying to do with this design. Instead of using the term “choreography”, I prefer to refer to what I’m doing as “configuration management”.

Some early peer reviews suggested this might be a “puppet-killer”. In fact, I actually see it as an opportunity to engage with the puppet community and to share my design and engine, which I hope some will see as a more novel solution. Existing puppet code could be fed through a cross compiler to output a graph that actually runs on my engine. While I plan to offer an easier to use and more powerful DSL, 80% of existing puppet code isn’t more than plumbing, package installation, and simple templating, so a gradual migration would be possible, where the multi-system configuration management parts are implemented using my new patterns instead of with slowly converging puppet. The same things could probably also be done with other languages like Chef. Anyone from Puppet Labs, Chef Software Inc., or the greater hacker community is welcome to contact me personally if they’d like to work on this.

Lastly, but most importantly, thanks to everyone who has discussed these ideas with me, reviewed this article, and contributed in so many ways. You’re awesome!

Trying out Ceph with Oh-My-Vagrant

Daniel P. Berrangé wrote about trying out a single node ceph cluster. I decided to take his article and turn it into an Oh-My-Vagrant omv.yaml file. It took me about two minutes to do so, and two hours to debug a problem caused by something I had broken on my laptop.

If you’d like to replicate his article in less than 5 minutes, pull down the omv.yaml file that I’ve just published and run omv up. Here’s the full terminal output of my session:

james@computer:~/code/oh-my-vagrant/examples$ git pull
Already up-to-date.
james@computer:~/code/oh-my-vagrant/examples$ cdtmpmkdir 
james@computer:/tmp/tmp.DhD$ cp $OLDPWD/ceph-deploy.yaml omv.yaml
james@computer:/tmp/tmp.DhD$ time omv up
Bringing machine 'omv1' up with 'libvirt' provider...
==> omv1: Creating image (snapshot of base box volume).
==> omv1: Creating domain with the following settings...
==> omv1:  -- Name:              omv_omv1
==> omv1:  -- Domain type:       kvm
==> omv1:  -- Cpus:              1
==> omv1:  -- Memory:            512M
==> omv1:  -- Base box:          fedora-23
==> omv1:  -- Storage pool:      default
==> omv1:  -- Image:             /var/lib/libvirt/images/omv_omv1.img
==> omv1:  -- Volume Cache:      default
==> omv1:  -- Kernel:            
==> omv1:  -- Initrd:            
==> omv1:  -- Graphics Type:     spice
==> omv1:  -- Graphics Port:     5900
==> omv1:  -- Graphics IP:       127.0.0.1
==> omv1:  -- Graphics Password: Not defined
==> omv1:  -- Video Type:        qxl
==> omv1:  -- Video VRAM:        9216
==> omv1:  -- Keymap:            en-us
==> omv1:  -- Command line : 
==> omv1: Creating shared folders metadata...
==> omv1: Starting domain.
==> omv1: Waiting for domain to get an IP address...
==> omv1: Waiting for SSH to become available...
==> omv1: Setting hostname...
==> omv1: Configuring and enabling network interfaces...
==> omv1: Rsyncing folder: /tmp/tmp.DhD/ => /vagrant
==> omv1: Updating /etc/hosts file on active guest machines...
==> omv1: Running provisioner: shell...
    omv1: Running: inline script
==> omv1: Changing password for user root.
==> omv1: passwd: all authentication tokens updated successfully.
==> omv1: Running provisioner: shell...
    omv1: Running: inline script
==> omv1: TARGET SOURCE    FSTYPE OPTIONS
==> omv1: /      /dev/vda3 xfs    rw,relatime,seclabel,attr2,inode64,noquota
==> omv1: TARGET                                SOURCE     FSTYPE     OPTIONS
==> omv1: /                                     /dev/vda3  xfs        rw,relatime,seclabel,attr2,inode64,noquota
==> omv1: ├─/sys                                sysfs      sysfs      rw,nosuid,nodev,noexec,relatime,seclabel
==> omv1: │ ├─/sys/kernel/security              securityfs securityfs rw,nosuid,nodev,noexec,relatime
==> omv1: │ ├─/sys/fs/cgroup                    tmpfs      tmpfs      ro,nosuid,nodev,noexec,seclabel,mode=755
==> omv1: │ │ ├─/sys/fs/cgroup/systemd          cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd
==> omv1: │ │ ├─/sys/fs/cgroup/devices          cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,devices
==> omv1: │ │ ├─/sys/fs/cgroup/cpu,cpuacct      cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,cpu,cpuacct
==> omv1: │ │ ├─/sys/fs/cgroup/memory           cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,memory
==> omv1: │ │ ├─/sys/fs/cgroup/perf_event       cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,perf_event
==> omv1: │ │ ├─/sys/fs/cgroup/blkio            cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,blkio
==> omv1: │ │ ├─/sys/fs/cgroup/hugetlb          cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,hugetlb
==> omv1: │ │ ├─/sys/fs/cgroup/freezer          cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,freezer
==> omv1: │ │ ├─/sys/fs/cgroup/net_cls,net_prio cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,net_cls,net_prio
==> omv1: │ │ └─/sys/fs/cgroup/cpuset           cgroup     cgroup     rw,nosuid,nodev,noexec,relatime,cpuset
==> omv1: │ ├─/sys/fs/pstore                    pstore     pstore     rw,nosuid,nodev,noexec,relatime,seclabel
==> omv1: │ ├─/sys/fs/selinux                   selinuxfs  selinuxfs  rw,relatime
==> omv1: │ ├─/sys/kernel/debug                 debugfs    debugfs    rw,relatime,seclabel
==> omv1: │ └─/sys/kernel/config                configfs   configfs   rw,relatime
==> omv1: ├─/proc                               proc       proc       rw,nosuid,nodev,noexec,relatime
==> omv1: │ ├─/proc/sys/fs/binfmt_misc          systemd-1  autofs     rw,relatime,fd=27,pgrp=1,timeout=0,minproto=5,maxproto=5,direct
==> omv1: │ └─/proc/fs/nfsd                     nfsd       nfsd       rw,relatime
==> omv1: ├─/dev                                devtmpfs   devtmpfs   rw,nosuid,seclabel,size=242328k,nr_inodes=60582,mode=755
==> omv1: │ ├─/dev/shm                          tmpfs      tmpfs      rw,nosuid,nodev,seclabel
==> omv1: │ ├─/dev/pts                          devpts     devpts     rw,nosuid,noexec,relatime,seclabel,gid=5,mode=620,ptmxmode=000
==> omv1: │ ├─/dev/hugepages                    hugetlbfs  hugetlbfs  rw,relatime,seclabel
==> omv1: │ └─/dev/mqueue                       mqueue     mqueue     rw,relatime,seclabel
==> omv1: ├─/run                                tmpfs      tmpfs      rw,nosuid,nodev,seclabel,mode=755
==> omv1: │ └─/run/user/1000                    tmpfs      tmpfs      rw,nosuid,nodev,relatime,seclabel,size=50112k,mode=700,uid=1000,gid=1000
==> omv1: ├─/tmp                                tmpfs      tmpfs      rw,seclabel
==> omv1: ├─/boot                               /dev/vda1  ext4       rw,relatime,seclabel,data=ordered
==> omv1: └─/var/lib/nfs/rpc_pipefs             sunrpc     rpc_pipefs rw,relatime
==> omv1: meta-data=/dev/vda3              isize=512    agcount=4, agsize=321792 blks
==> omv1:          =                       sectsz=512   attr=2, projid32bit=1
==> omv1:          =                       crc=1        finobt=1
==> omv1: data     =                       bsize=4096   blocks=1287168, imaxpct=25
==> omv1:          =                       sunit=0      swidth=0 blks
==> omv1: naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
==> omv1: log      =internal               bsize=4096   blocks=2560, version=2
==> omv1:          =                       sectsz=512   sunit=0 blks, lazy-count=1
==> omv1: realtime =none                   extsz=4096   blocks=0, rtextents=0
==> omv1: data blocks changed from 1287168 to 10199744
==> omv1: Running provisioner: shell...
    omv1: Running: inline script
==> omv1: this is ceph-deploy on fedora-23
==> omv1: Last metadata expiration check performed 0:00:12 ago on Mon Dec 28 14:34:32 2015.
==> omv1: Dependencies resolved.
==> omv1: ================================================================================
==> omv1:  Package               Arch          Version                Repository     Size
==> omv1: ================================================================================
==> omv1: Installing:
==> omv1:  ceph-deploy           noarch        1.5.25-2.fc23          fedora        160 k
==> omv1:  python-execnet        noarch        1.3.0-3.fc23           fedora        275 k
==> omv1:  python-remoto         noarch        0.0.25-2.fc23          fedora         26 k
==> omv1: 
==> omv1: Transaction Summary
==> omv1: ================================================================================
==> omv1: Install  3 Packages
==> omv1: Total download size: 461 k
==> omv1: Installed size: 1.5 M
==> omv1: Downloading Packages:
==> omv1: --------------------------------------------------------------------------------
==> omv1: Total                                           214 kB/s | 461 kB     00:02     
==> omv1: Running transaction check
==> omv1: Transaction check succeeded.
==> omv1: Running transaction test
==> omv1: Transaction test succeeded.
==> omv1: Running transaction
==> omv1:   Installing  : python-execnet-1.3.0-3.fc23.noarch                          1/3
==> omv1:  
==> omv1:   Installing  : python-remoto-0.0.25-2.fc23.noarch                          2/3
==> omv1:  
==> omv1:   Installing  : ceph-deploy-1.5.25-2.fc23.noarch                            3/3
==> omv1:  
==> omv1:   Verifying   : ceph-deploy-1.5.25-2.fc23.noarch                            1/3
==> omv1:  
==> omv1:   Verifying   : python-remoto-0.0.25-2.fc23.noarch                          2/3
==> omv1:  
==> omv1:   Verifying   : python-execnet-1.3.0-3.fc23.noarch                          3/3
==> omv1:  
==> omv1: 
==> omv1: Installed:
==> omv1:   ceph-deploy.noarch 1.5.25-2.fc23       python-execnet.noarch 1.3.0-3.fc23    
==> omv1:   python-remoto.noarch 0.0.25-2.fc23    
==> omv1: Complete!
==> omv1: [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
==> omv1: [ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy new omv1.example.com
==> omv1: [ceph_deploy.new][DEBUG ] Creating new cluster named ceph
==> omv1: [ceph_deploy.new][INFO  ] making sure passwordless SSH succeeds
==> omv1: [omv1.example.com][DEBUG ] connected to host: omv1.example.com 
==> omv1: [omv1.example.com][DEBUG ] detect platform information from remote host
==> omv1: [omv1.example.com][DEBUG ] detect machine type
==> omv1: [omv1.example.com][DEBUG ] find the location of an executable
==> omv1: [omv1.example.com][INFO  ] Running command: /usr/sbin/ip link show
==> omv1: [omv1.example.com][INFO  ] Running command: /usr/sbin/ip addr show
==> omv1: [omv1.example.com][DEBUG ] IP addresses found: ['192.168.123.100', '172.17.0.1', '192.168.121.48']
==> omv1: [ceph_deploy.new][DEBUG ] Resolving host omv1.example.com
==> omv1: [ceph_deploy.new][DEBUG ] Monitor omv1 at 192.168.123.100
==> omv1: [ceph_deploy.new][DEBUG ] Monitor initial members are ['omv1']
==> omv1: [ceph_deploy.new][DEBUG ] Monitor addrs are ['192.168.123.100']
==> omv1: [ceph_deploy.new][DEBUG ] Creating a random mon key...
==> omv1: [ceph_deploy.new][DEBUG ] Writing monitor keyring to ceph.mon.keyring...
==> omv1: [ceph_deploy.new][DEBUG ] Writing initial config to ceph.conf...
==> omv1: [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
==> omv1: [ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy install --no-adjust-repos omv1.example.com
==> omv1: [ceph_deploy.install][DEBUG ] Installing stable version hammer on cluster ceph hosts omv1.example.com
==> omv1: [ceph_deploy.install][DEBUG ] Detecting platform for host omv1.example.com ...
==> omv1: [omv1.example.com][DEBUG ] connected to host: omv1.example.com 
==> omv1: [omv1.example.com][DEBUG ] detect platform information from remote host
==> omv1: [omv1.example.com][DEBUG ] detect machine type
==> omv1: [ceph_deploy.install][INFO  ] Distro info: Fedora 23 Twenty Three
==> omv1: [omv1.example.com][INFO  ] installing ceph on omv1.example.com
==> omv1: [omv1.example.com][INFO  ] Running command: yum -y -q install ceph ceph-radosgw
==> omv1: [omv1.example.com][WARNING] Yum command has been deprecated, redirecting to '/usr/bin/dnf -y -q install ceph ceph-radosgw'.
==> omv1: [omv1.example.com][WARNING] See 'man dnf' and 'man yum2dnf' for more information.
==> omv1: [omv1.example.com][WARNING] To transfer transaction metadata from yum to DNF, run:
==> omv1: [omv1.example.com][WARNING] 'dnf install python-dnf-plugins-extras-migrate && dnf-2 migrate'
==> omv1: [omv1.example.com][WARNING] 
==> omv1: [omv1.example.com][INFO  ] Running command: ceph --version
==> omv1: [omv1.example.com][DEBUG ] ceph version 0.94.5 (9764da52395923e0b32908d83a9f7304401fee43)
==> omv1: [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
==> omv1: [ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy mon create-initial
==> omv1: [ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts omv1
==> omv1: [ceph_deploy.mon][DEBUG ] detecting platform for host omv1 ...
==> omv1: [omv1][DEBUG ] connected to host: omv1 
==> omv1: [omv1][DEBUG ] detect platform information from remote host
==> omv1: [omv1][DEBUG ] detect machine type
==> omv1: [ceph_deploy.mon][INFO  ] distro info: Fedora 23 Twenty Three
==> omv1: [omv1][DEBUG ] determining if provided host has same hostname in remote
==> omv1: [omv1][DEBUG ] get remote short hostname
==> omv1: [omv1][DEBUG ] deploying mon to omv1
==> omv1: [omv1][DEBUG ] get remote short hostname
==> omv1: [omv1][DEBUG ] remote hostname: omv1
==> omv1: [omv1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
==> omv1: [omv1][DEBUG ] create the mon path if it does not exist
==> omv1: [omv1][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-omv1/done
==> omv1: [omv1][DEBUG ] done path does not exist: /var/lib/ceph/mon/ceph-omv1/done
==> omv1: [omv1][INFO  ] creating keyring file: /var/lib/ceph/tmp/ceph-omv1.mon.keyring
==> omv1: [omv1][DEBUG ] create the monitor keyring file
==> omv1: [omv1][INFO  ] Running command: ceph-mon --cluster ceph --mkfs -i omv1 --keyring /var/lib/ceph/tmp/ceph-omv1.mon.keyring
==> omv1: [omv1][DEBUG ] ceph-mon: mon.noname-a 192.168.123.100:6789/0 is local, renaming to mon.omv1
==> omv1: [omv1][DEBUG ] ceph-mon: set fsid to 94276740-7431-49ab-80da-a37325d2d020
==> omv1: [omv1][DEBUG ] ceph-mon: created monfs at /var/lib/ceph/mon/ceph-omv1 for mon.omv1
==> omv1: [omv1][INFO  ] unlinking keyring file /var/lib/ceph/tmp/ceph-omv1.mon.keyring
==> omv1: [omv1][DEBUG ] create a done file to avoid re-doing the mon deployment
==> omv1: [omv1][DEBUG ] create the init path if it does not exist
==> omv1: [omv1][DEBUG ] locating the `service` executable...
==> omv1: [omv1][INFO  ] Running command: /usr/sbin/service ceph -c /etc/ceph/ceph.conf start mon.omv1
==> omv1: [omv1][DEBUG ] === mon.omv1 === 
==> omv1: [omv1][DEBUG ] Starting Ceph mon.omv1 on omv1...
==> omv1: [omv1][WARNING] Running as unit run-3493.service.
==> omv1: [omv1][DEBUG ] Starting ceph-create-keys on omv1...
==> omv1: [omv1][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.omv1.asok mon_status
==> omv1: [omv1][DEBUG ] ********************************************************************************
==> omv1: [omv1][DEBUG ] status for monitor: mon.omv1
==> omv1: [omv1][DEBUG ] {
==> omv1: [omv1][DEBUG ]   "election_epoch": 2, 
==> omv1: [omv1][DEBUG ]   "extra_probe_peers": [], 
==> omv1: [omv1][DEBUG ]   "monmap": {
==> omv1: [omv1][DEBUG ]     "created": "0.000000", 
==> omv1: [omv1][DEBUG ]     "epoch": 1, 
==> omv1: [omv1][DEBUG ]     "fsid": "94276740-7431-49ab-80da-a37325d2d020", 
==> omv1: [omv1][DEBUG ]     "modified": "0.000000", 
==> omv1: [omv1][DEBUG ]     "mons": [
==> omv1: [omv1][DEBUG ]       {
==> omv1: [omv1][DEBUG ]         "addr": "192.168.123.100:6789/0", 
==> omv1: [omv1][DEBUG ]         "name": "omv1", 
==> omv1: [omv1][DEBUG ]         "rank": 0
==> omv1: [omv1][DEBUG ]       }
==> omv1: [omv1][DEBUG ]     ]
==> omv1: [omv1][DEBUG ]   }, 
==> omv1: [omv1][DEBUG ]   "name": "omv1", 
==> omv1: [omv1][DEBUG ]   "outside_quorum": [], 
==> omv1: [omv1][DEBUG ]   "quorum": [
==> omv1: [omv1][DEBUG ]     0
==> omv1: [omv1][DEBUG ]   ], 
==> omv1: [omv1][DEBUG ]   "rank": 0, 
==> omv1: [omv1][DEBUG ]   "state": "leader", 
==> omv1: [omv1][DEBUG ]   "sync_provider": []
==> omv1: [omv1][DEBUG ] }
==> omv1: [omv1][DEBUG ] ********************************************************************************
==> omv1: [omv1][INFO  ] monitor: mon.omv1 is running
==> omv1: [omv1][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.omv1.asok mon_status
==> omv1: [ceph_deploy.mon][INFO  ] processing monitor mon.omv1
==> omv1: [omv1][DEBUG ] connected to host: omv1 
==> omv1: [omv1][INFO  ] Running command: ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.omv1.asok mon_status
==> omv1: [ceph_deploy.mon][INFO  ] mon.omv1 monitor has reached quorum!
==> omv1: [ceph_deploy.mon][INFO  ] all initial monitors are running and have formed quorum
==> omv1: [ceph_deploy.mon][INFO  ] Running gatherkeys...
==> omv1: [ceph_deploy.gatherkeys][DEBUG ] Checking omv1 for /etc/ceph/ceph.client.admin.keyring
==> omv1: [omv1][DEBUG ] connected to host: omv1 
==> omv1: [omv1][DEBUG ] detect platform information from remote host
==> omv1: [omv1][DEBUG ] detect machine type
==> omv1: [omv1][DEBUG ] fetch remote file
==> omv1: [ceph_deploy.gatherkeys][DEBUG ] Got ceph.client.admin.keyring key from omv1.
==> omv1: [ceph_deploy.gatherkeys][DEBUG ] Have ceph.mon.keyring
==> omv1: [ceph_deploy.gatherkeys][DEBUG ] Checking omv1 for /var/lib/ceph/bootstrap-osd/ceph.keyring
==> omv1: [omv1][DEBUG ] connected to host: omv1 
==> omv1: [omv1][DEBUG ] detect platform information from remote host
==> omv1: [omv1][DEBUG ] detect machine type
==> omv1: [omv1][DEBUG ] fetch remote file
==> omv1: [ceph_deploy.gatherkeys][DEBUG ] Got ceph.bootstrap-osd.keyring key from omv1.
==> omv1: [ceph_deploy.gatherkeys][DEBUG ] Checking omv1 for /var/lib/ceph/bootstrap-mds/ceph.keyring
==> omv1: [omv1][DEBUG ] connected to host: omv1 
==> omv1: [omv1][DEBUG ] detect platform information from remote host
==> omv1: [omv1][DEBUG ] detect machine type
==> omv1: [omv1][DEBUG ] fetch remote file
==> omv1: [ceph_deploy.gatherkeys][DEBUG ] Got ceph.bootstrap-mds.keyring key from omv1.
==> omv1: [ceph_deploy.gatherkeys][DEBUG ] Checking omv1 for /var/lib/ceph/bootstrap-rgw/ceph.keyring
==> omv1: [omv1][DEBUG ] connected to host: omv1 
==> omv1: [omv1][DEBUG ] detect platform information from remote host
==> omv1: [omv1][DEBUG ] detect machine type
==> omv1: [omv1][DEBUG ] fetch remote file
==> omv1: [ceph_deploy.gatherkeys][DEBUG ] Got ceph.bootstrap-rgw.keyring key from omv1.
==> omv1: [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
==> omv1: [ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy osd prepare omv1.example.com:/srv/ceph/osd
==> omv1: [ceph_deploy.osd][DEBUG ] Preparing cluster ceph disks omv1.example.com:/srv/ceph/osd:
==> omv1: [omv1.example.com][DEBUG ] connected to host: omv1.example.com 
==> omv1: [omv1.example.com][DEBUG ] detect platform information from remote host
==> omv1: [omv1.example.com][DEBUG ] detect machine type
==> omv1: [ceph_deploy.osd][INFO  ] Distro info: Fedora 23 Twenty Three
==> omv1: [ceph_deploy.osd][DEBUG ] Deploying osd to omv1.example.com
==> omv1: [omv1.example.com][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
==> omv1: [omv1.example.com][INFO  ] Running command: udevadm trigger --subsystem-match=block --action=add
==> omv1: [ceph_deploy.osd][DEBUG ] Preparing host omv1.example.com disk /srv/ceph/osd journal None activate False
==> omv1: [omv1.example.com][INFO  ] Running command: ceph-disk -v prepare --fs-type xfs --cluster ceph -- /srv/ceph/osd
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mkfs_options_xfs
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mkfs_options_xfs
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_mount_options_xfs
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_fs_mount_options_xfs
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=osd_journal_size
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_cryptsetup_parameters
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_key_size
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-conf --cluster=ceph --name=osd. --lookup osd_dmcrypt_type
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:Preparing osd data dir /srv/ceph/osd
==> omv1: [omv1.example.com][INFO  ] checking OSD status...
==> omv1: [omv1.example.com][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
==> omv1: [ceph_deploy.osd][DEBUG ] Host omv1.example.com is now ready for osd use.
==> omv1: [ceph_deploy.conf][DEBUG ] found configuration file at: /root/.cephdeploy.conf
==> omv1: [ceph_deploy.cli][INFO  ] Invoked (1.5.25): /bin/ceph-deploy osd activate omv1.example.com:/srv/ceph/osd
==> omv1: [ceph_deploy.osd][DEBUG ] Activating cluster ceph disks omv1.example.com:/srv/ceph/osd:
==> omv1: [omv1.example.com][DEBUG ] connected to host: omv1.example.com 
==> omv1: [omv1.example.com][DEBUG ] detect platform information from remote host
==> omv1: [omv1.example.com][DEBUG ] detect machine type
==> omv1: [ceph_deploy.osd][INFO  ] Distro info: Fedora 23 Twenty Three
==> omv1: [ceph_deploy.osd][DEBUG ] activating host omv1.example.com disk /srv/ceph/osd
==> omv1: [ceph_deploy.osd][DEBUG ] will use init type: sysvinit
==> omv1: [omv1.example.com][INFO  ] Running command: ceph-disk -v activate --mark-init sysvinit --mount /srv/ceph/osd
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:Cluster uuid is 94276740-7431-49ab-80da-a37325d2d020
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster=ceph --show-config-value=fsid
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:Cluster name is ceph
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:OSD uuid is 43a20af4-7a45-4f70-a0dd-baf906d58026
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:Allocating OSD id...
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring osd create --concise 43a20af4-7a45-4f70-a0dd-baf906d58026
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:OSD id is 0
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:Initializing OSD...
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o /srv/ceph/osd/activate.monmap
==> omv1: [omv1.example.com][WARNING] got monmap epoch 1
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph-osd --cluster ceph --mkfs --mkkey -i 0 --monmap /srv/ceph/osd/activate.monmap --osd-data /srv/ceph/osd --osd-journal /srv/ceph/osd/journal --osd-uuid 43a20af4-7a45-4f70-a0dd-baf906d58026 --keyring /srv/ceph/osd/keyring
==> omv1: [omv1.example.com][WARNING] 2015-12-28 14:36:47.525213 7f66f9f0c880 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
==> omv1: [omv1.example.com][WARNING] 2015-12-28 14:36:49.412257 7f66f9f0c880 -1 journal FileJournal::_open: disabling aio for non-block journal.  Use journal_force_aio to force use of aio anyway
==> omv1: [omv1.example.com][WARNING] 2015-12-28 14:36:49.413018 7f66f9f0c880 -1 filestore(/srv/ceph/osd) could not find 23c2fcde/osd_superblock/0//-1 in index: (2) No such file or directory
==> omv1: [omv1.example.com][WARNING] 2015-12-28 14:36:49.813515 7f66f9f0c880 -1 created object store /srv/ceph/osd journal /srv/ceph/osd/journal for osd.0 fsid 94276740-7431-49ab-80da-a37325d2d020
==> omv1: [omv1.example.com][WARNING] 2015-12-28 14:36:49.813566 7f66f9f0c880 -1 auth: error reading file: /srv/ceph/osd/keyring: can't open /srv/ceph/osd/keyring: (2) No such file or directory
==> omv1: [omv1.example.com][WARNING] 2015-12-28 14:36:49.813683 7f66f9f0c880 -1 created new key in keyring /srv/ceph/osd/keyring
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:Marking with init system sysvinit
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:Authorizing OSD key...
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring auth add osd.0 -i /srv/ceph/osd/keyring osd allow * mon allow profile osd
==> omv1: [omv1.example.com][WARNING] added key for osd.0
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:ceph osd.0 data dir is ready at /srv/ceph/osd
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:Creating symlink /var/lib/ceph/osd/ceph-0 -> /srv/ceph/osd
==> omv1: [omv1.example.com][WARNING] DEBUG:ceph-disk:Starting ceph osd.0...
==> omv1: [omv1.example.com][WARNING] INFO:ceph-disk:Running command: /usr/sbin/service ceph --cluster ceph start osd.0
==> omv1: [omv1.example.com][DEBUG ] === osd.0 === 
==> omv1: [omv1.example.com][WARNING] create-or-move updating item name 'osd.0' weight 0.04 at location {host=omv1,root=default} to crush map
==> omv1: [omv1.example.com][DEBUG ] Starting Ceph osd.0 on omv1...
==> omv1: [omv1.example.com][WARNING] Running as unit run-4116.service.
==> omv1: [omv1.example.com][INFO  ] checking OSD status...
==> omv1: [omv1.example.com][INFO  ] Running command: ceph --cluster=ceph osd stat --format=json
==> omv1: [omv1.example.com][INFO  ] Running command: systemctl enable ceph
==> omv1: [omv1.example.com][WARNING] ceph.service is not a native service, redirecting to systemd-sysv-install
==> omv1: [omv1.example.com][WARNING] Executing /usr/lib/systemd/systemd-sysv-install enable ceph
==> omv1:     cluster 94276740-7431-49ab-80da-a37325d2d020
==> omv1:      health HEALTH_WARN
==> omv1:             64 pgs stuck inactive
==> omv1:             64 pgs stuck unclean
==> omv1:      monmap e1: 1 mons at {omv1=192.168.123.100:6789/0}
==> omv1:             election epoch 2, quorum 0 omv1
==> omv1:      osdmap e5: 1 osds: 1 up, 1 in
==> omv1:       pgmap v6: 64 pgs, 1 pools, 0 bytes data, 0 objects
==> omv1:             0 kB used, 0 kB / 0 kB avail
==> omv1:                   64 creating
==> omv1: you can now use ceph with the rbd command

real    4m14.815s
user    0m2.239s
sys    0m0.288s
james@computer:/tmp/tmp.DhD$ vscreen root@omv1
[root@omv1 ~]# ceph status
    cluster 94276740-7431-49ab-80da-a37325d2d020
     health HEALTH_OK
     monmap e1: 1 mons at {omv1=192.168.123.100:6789/0}
            election epoch 2, quorum 0 omv1
     osdmap e5: 1 osds: 1 up, 1 in
      pgmap v9: 64 pgs, 1 pools, 0 bytes data, 0 objects
            6642 MB used, 33190 MB / 39832 MB avail
                  64 active+clean
[root@omv1 ~]#

You’ll notice that the ceph status command initially returned a HEALTH_WARN when it was run by OMV, but if I log in to the machine and run it a second time, everything is happy. I imagine this is because it takes a moment for the cluster to converge on a healthy state.
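
If you’re scripting against a fresh cluster and would rather wait for that convergence than eyeball it, a minimal sketch could look like this (it assumes the ceph CLI is already configured on the host, and the five second poll interval is arbitrary):

until ceph health | grep -q HEALTH_OK; do
    echo 'waiting for ceph to converge...'
    sleep 5
done
ceph status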

I’ve also just released a new Fedora-23 vagrant box, which is what this example uses. Naturally, if you haven’t already downloaded the box, this example will take a bit longer to run the first time. The download will cost you about 865MB and will happen automatically if you’re missing the box.

Happy hacking!

James

PS: If you’re curious what that cdtmpmkdir is, it’s here:

$ type cdtmpmkdir 
cdtmpmkdir is a function
cdtmpmkdir () 
{ 
    cd `mktemp --tmpdir -d tmp.XXX`
}
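
In case it isn’t obvious, it just drops you into a freshly made temporary directory, something like this (the random suffix will differ each run):

james@computer:~$ cdtmpmkdir
james@computer:/tmp/tmp.DhD$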

Matching arbitrary URLs to custom Firefox profiles

We’re constantly clicking on all sorts of different URLs throughout the day. These clickable links appear in webpages (including in “web apps” like gmail), in mail clients like Evolution, in terminals such as GNOME-terminal, and in any other GTK+ app on your GNU/Linux desktop. I wanted to perform custom actions when arbitrary URLs are clicked, including opening certain links in separate Firefox profiles. There are a number of different steps to get this working, but it should be easy to follow along. I’m doing all of this on Fedora 23, but it should work in other GNU/Linux environments.

Firefox profiles:

Firefox supports multiple profiles in the same user session so that different users can share a session, or so that a single user can separate tasks into different environments. I’m interested in the latter use case. To add a new profile it’s recommended to close firefox completely, but I didn’t find this to be necessary. When I do close firefox, I like to surprise it with a:

killall -9 firefox

which will also cause any unsaved data in your browser to be lost. To create a new profile, now run firefox with -P:

firefox -P

This will open up a friendly dialog where you can add a new profile. After you’ve done this, my dialog now looks like:
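
If you’d rather skip the dialog, firefox can (as far as I know) also create a profile non-interactively; here I’m assuming the same “ghttps” profile name that’s used throughout this post:

firefox -CreateProfile ghttps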

A view of my firefox profiles as shown by running: firefox -P

To test that it is working, run firefox from the command line:

$ firefox https://ttboj.wordpress.com/
$ firefox -P ghttps https://github.com/purpleidea/
$ firefox https://twitter.com/#!/purpleidea
$ firefox -P ghttps https://www.gnu.org/philosophy/free-sw.html

You should get two separate sessions, where the commands with -P ghttps should be in your new “ghttps” session (or whatever you named it). Internet searches seem to report that some users can’t run two sessions at the same time without including the --no-remote option, but I didn’t seem to need it. YMMV.

Firefox launcher:

When you run firefox, it usually runs /usr/bin/firefox. I want a more clever launcher, so I’ve created a new bash script named ~/bin/firefox which is part of my $PATH. The contents of this file are:

#!/bin/bash
# run firefox from a terminal, without being attached to it; similar to nohup
# thanks to @purpleidea from https://ttboj.wordpress.com/
# TODO: a better argv parser and more flexible url matching semantics
# NOTE: first close firefox and make a new profile with `firefox -P`, then set:
protocol='ghttps' # name of fake protocol
profile='ghttps' # name of your new firefox profile
prefix='https://example.com/'
argv=("$@")
argc=$(($# - 1))
url=''
if [ $argc -ge 0 ]; then
    url=${argv[$argc]}
    # avoid recursion!
    if [[ "$url" == "$protocol"* ]]; then
        url="https${url:${#protocol}}" # s/ghttps/https/
        argv[$argc]=$url # store it
    fi
fi

#echo $url
#echo "`date` ${argv[*]}" >> /tmp/firefox.log

# use a separate profile for special links
if [ -n "$url" ] && [[ "$url" == "$prefix"* ]]; then
    # firefox with the special profile (backgrounded)
    /usr/bin/firefox -P "$profile" "${argv[@]}" &> /dev/null &
else
    # normal firefox (backgrounded)
    /usr/bin/firefox "${argv[@]}" &> /dev/null &
fi

Make sure the file is executable with chmod u+x ~/bin/firefox and that ~/bin is in your $PATH. Now whenever you run the firefox command, it will automatically launch firefox with the profile that corresponds to the URL pattern you matched. Feel free to improve this script with a more comprehensive pattern-to-profile correspondence mechanism.
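
A quick way to check that the wrapper is doing the right thing, assuming you kept the https://example.com/ prefix from the script above (the path in the first URL is just an arbitrary example):

$ hash -r                                 # make bash forget the cached /usr/bin/firefox path
$ firefox https://example.com/some/page   # should open in the "ghttps" profile
$ firefox https://ttboj.wordpress.com/    # should open in your normal profile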

Default applications:

Whenever any URL is clicked within GNOME, there is a central “Default Applications” setting which decides what application to run. My settings dialog for this control now looks like:

What the GNOME Settings->Details->Default Applications dialog looks like after I made one small change.

I had to change the “Web” handler from the previous default of “Firefox” to my new “MyFirefox”. Those applications are listed in .desktop files which exist on your system. The system-wide firefox desktop file is located at /usr/share/applications/firefox.desktop, and although the path to the executable in this file does not set a directory prefix, it unfortunately does not seem to obey my shell $PATH, which includes ~/bin/. If you know how to make .desktop files search ~/bin/ as well, then I’d really appreciate it if you left me a comment!

To work around the $PATH issue, I copied the above file into ~/.local/share/applications/firefox.desktop and edited it so that the three Exec= commands include a path prefix of /home/james/bin/. I also renamed the Name= entry so that it was visually obvious that a different .desktop file was in use. This will replace the firefox launcher throughout your desktop, as well as in the “Default Applications” menu.
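
For reference, the workaround amounts to something like this (substitute your own home directory and preferred editor):

$ mkdir -p ~/.local/share/applications
$ cp /usr/share/applications/firefox.desktop ~/.local/share/applications/firefox.desktop
$ $EDITOR ~/.local/share/applications/firefox.desktop   # change Name= and prefix the three Exec= lines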

An excerpt of my file showing only the changed sections now looks like:

[Desktop Entry]
Name=MyFirefox
Comment=Browse the Web better
Exec=/home/james/bin/firefox %u
Actions=new-window;new-private-window;

[Desktop Action new-window]
Name=Open a New Window
Exec=/home/james/bin/firefox %u

[Desktop Action new-private-window]
Name=Open a New Private Window
Exec=/home/james/bin/firefox --private-window %u

Changing the name is optional, but it might be instructional for you.

It’s important that you not rename the file, because only files which are listed in one of the GNOME mime list files will show up in the “Default Applications” chooser. Once you’ve created the file, you can check in these settings to ensure it’s set as the default.

I forget if you need to close firefox and log out and back in to your GNOME session for this to work, so if things aren’t working perfectly by now, make sure you’ve done that once. You can test this by clicking on a link in your terminal and checking that it opens the correct firefox.
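
If you’d rather test from the terminal directly, something like the following should exercise the same default-application machinery as clicking a link. The first two commands come from the desktop-file-utils package and may not be strictly necessary, but they don’t hurt:

$ desktop-file-validate ~/.local/share/applications/firefox.desktop
$ update-desktop-database ~/.local/share/applications/
$ xdg-open https://ttboj.wordpress.com/   # should launch via ~/bin/firefox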

Redirecting internal firefox links:

Everything should now be working perfectly, until you click on a link within firefox which doesn’t redirect to your shell firefox wrapper. We want this to be seamless, so we’ll have to hook into firefox itself for that. Thankfully there’s a plugin which already does this for us, so we can use it and avoid getting our hands too dirty! It’s called “Redirector”. Install it.

Once installed, there is a settings dialog which can add some pattern matching for us. I set up a basic pattern that corresponds to what I wrote in my ~/bin/firefox shell script. Here’s a screenshot:

A screenshot from the firefox Redirector plugin.

You can conveniently import and export your redirects from the plugin, so I’ve included the corresponding .json equivalent as well.
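
For illustration, a rule matching the $prefix from the shell script might look roughly like this in the plugin’s settings (the field names are approximate, and the exact export format depends on your Redirector version):

Include pattern:  https://example.com/*
Redirect to:      ghttps://example.com/$1
Pattern type:     Wildcard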

Does everything look correct? Take a second to have a closer look. You might think that I made a typo in the “Redirect to:” field. There’s no such protocol as ghttps, you say? That’s good news, because its use was intentional.

Custom protocol handlers:

Running an external command in response to certain links is what allows browsers to open external programs such as mail clients, PDF viewers, and image viewers. While some of these functions have been pulled into the browser, the need is still there, and this is what we’ll use to trigger our firefox shell script. It’s actually important that we make an external system call, because otherwise there would be no way for a link clicked in the default browser profile to open in browser profile number two. Running any such command is only possible with a custom or unique protocol. You might be used to seeing https:// in URLs, but since those are captured by the browser as native links, we need something different. This is what the ghttps:// that we mentioned above is for.

To add a custom protocol, you’ll need to dive into your browser’s internal settings. You can do this by typing about:config in the URL bar. You’ll then need to right-click and add four new settings. These are the settings I added:

(string)  network.protocol-handler.app.ghttps; /home/james/bin/firefox
(boolean) network.protocol-handler.expose.ghttps; false
(boolean) network.protocol-handler.external.ghttps; true
(boolean) network.protocol-handler.warn-external.ghttps; true

Please note that the leading values (in brackets) are the types that you’ll need to use. Omit the semicolons; they just separate each key from the corresponding value you should give it. You’ll naturally want to use the correct path to your firefox script.
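
If you’d rather not click through about:config by hand, the same settings can (as far as I know) be dropped into a user.js file in your profile directory, which firefox reads at startup. The path below is mine, so substitute your own:

user_pref("network.protocol-handler.app.ghttps", "/home/james/bin/firefox");
user_pref("network.protocol-handler.expose.ghttps", false);
user_pref("network.protocol-handler.external.ghttps", true);
user_pref("network.protocol-handler.warn-external.ghttps", true);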

For reasons unknown to me, setting these variables is required, but the protocol handler still needs you to manually confirm it once. To do this, I have provided a sample link to my blog using the fake ghttps protocol:

ghttps://ttboj.wordpress.com/

When you click on it for the first time, you should be prompted with a confirmation dialog that asks you to reconfirm that you’re okay running this protocol, and for the path to the executable. Browse to ~/bin/firefox and click "Remember my choice for ghttps links". The dialog looked like this:

You should only need to deal with this dialog once!

If you’re using a different protocol, you can make a simple HTML file and open it up in your browser:

<html>
<a href="ghttps://ttboj.wordpress.com/">ghttps://ttboj.wordpress.com/</a>
</html>

At this point you may need to restart firefox. Your new protocol handler is now installed! Enjoy automatically handling special URLs.

Bugs:

There is one small usability bug which you might experience. If the link that pattern matches out to the protocol has a target=_blank (open in new window) attribute, then once you’ve activated the link, there will be a leftover blank firefox window to close. This is a known issue in firefox which occurs with other handlers as well. If anyone can work on this issue and/or find me a link to the ticket number, I’d appreciate it.

Curiosity:

The curious might wonder what my use-case is. I’ve been forced to use the most unpleasant online google document system. I decided that I didn’t want to share my regular browser profile with this software, but I wanted URL integration to feel seamless, since people like to send the unique document URLs around by email and chat. The document URLs usually follow a pattern of:

https://docs.google.com/a/example.com/some-garbage-goes-here…

where example.com is the domain your organization uses. By setting the above string as the $prefix variable in the bash firefox script, and with a similar pattern in the Redirector plugin, you can ensure that you’ll always get documents opening up in browser sessions connected to the correct google account! This is useful if you have multiple google accounts which you wish to automatically segregate, to avoid having to constantly switch between them!
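
Concretely, that just means changing one line near the top of the ~/bin/firefox script (using your own domain, of course):

prefix='https://docs.google.com/a/example.com/'   # URLs starting with this open in the ghttps profile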

Future work:

It would be great to consolidate the patterns as expressed in the Redirector database and the firefox bash script. It would probably make sense to generate a json file that both tools can use. Additional work to extend my bash script would be necessary. Patches welcome!
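
As a very rough sketch of what the bash side of that could look like, assuming a hypothetical shared patterns.json and that jq is installed:

prefix=$(jq -r '.prefix' ~/.config/firefox-wrapper/patterns.json)   # hypothetical shared config file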

It would be convenient if there were an easy setup script to automate the myriad of steps that I took you through to get this all working. If someone can provide a simple bash equivalent, I would love to have it.

Conclusion:

I hope you enjoyed this article and this set of techniques! Hopefully you can appreciate how stringing five different techniques together can produce something useful. A big thank you goes out to SlashLife from the #firefox IRC channel. This user pointed me to the Redirector plugin, which was critical for intercepting arbitrary URLs in firefox.

Happy Hacking,

James

PS: I’d like to apologize for not posting anything in the last three months! I’ve been busy hacking on something big, which I hope to announce soon. Stay tuned, and thanks for reading this far!