rsync | The Technical Blog of James

As I previously wrote, I’ve been busy with Vagrant on Fedora with libvirt, and have even been submitting, patches and issues! (This “closed” issue needs solving!) Here are some of the tricks that I’ve used while hacking away.

Default provider:

I should have mentioned this in my earlier article but I forgot: If you’re always using the same provider, you might want to set it as the default. In my case I’m using vagrant-libvirt. To do so, add the following to your .bashrc:

export VAGRANT_DEFAULT_PROVIDER=libvirt

This helps you avoid having to add --provider=libvirt to all of your vagrant commands.

Aliases:

All good hackers use shell (in my case bash) aliases to make their lives easier. I normally keep all my aliases in .bash_aliases, but I’ve grouped them together with some other vagrant related items in my .bashrc:

alias vp='vagrant provision'
alias vup='vagrant up'
alias vssh='vagrant ssh'
alias vdestroy='vagrant destroy'

Bash functions:

There are some things aliases just can’t do. For those situations, a bash function might be most appropriate. These go inside your .bashrc file. I’d like to share two with you:

Vagrant logging:

Sometimes it’s useful to get more information from the vagrant command that is running. To do this, you can set the VAGRANT_LOG environment variable to info, or debug.

function vlog {
	VAGRANT_LOG=info vagrant "$@" 2> vagrant.log
}

This is usually helpful for debugging a vagrant issue. When you use the vlog command, it works exactly as the normal vagrant command, but it spits out a vagrant.log logfile into the working directory.

Vagrant sftp:

Vagrant provides an ssh command, but it doesn’t offer an sftp command. The sftp tool is fantastically useful when you want to quickly push or pull a file over ssh. First I’ll show you the code so that you can practice reading bash:

# vagrant sftp
function vsftp {
	[ "$1" = '' ] || [ "$2" != '' ] && echo "Usage: vsftp  - vagrant sftp" 1>&2 && return 1
	wd=`pwd`		# save wd, then find the Vagrant project
	while [ "`pwd`" != '/' ] && [ ! -e "`pwd`/Vagrantfile" ] && [ ! -d "`pwd`/.vagrant/" ]; do
		#echo "pwd is `pwd`"
		cd ..
	done
	pwd=`pwd`
	cd $wd
	if [ ! -e "$pwd/Vagrantfile" ] || [ ! -d "$pwd/.vagrant/" ]; then
		echo 'Vagrant project not found!' 1>&2 && return 2
	fi

	d="$pwd/.ssh"
	f="$d/$1.config"

	# if mtime of $f is > than 5 minutes (5 * 60 seconds), re-generate...
	if [ `date -d "now - $(stat -c '%Y' "$f" 2> /dev/null) seconds" +%s` -gt 300 ]; then
		mkdir -p "$d"
		# we cache the lookup because this command is slow...
		vagrant ssh-config "$1" > "$f" || rm "$f"
	fi
	[ -e "$f" ] && sftp -F "$f" "$1"
}

This vsftp command will work from anywhere in the vagrant project root directory or deeper. It does this by first searching upwards until it gets to that directory. When it finds it, it creates a .ssh/ directory there, and stores the proper ssh_config stubs needed for sftp to find the machines. Then the sftp -F command runs, connecting you to your guest machine.

You’ll notice that the first time you connect takes longer than the second time. This is because the vsftp function caches the ssh_config file store operation. You might want to set a different timeout, or disable it altogether. I should also mention that this script searches for a Vagrantfile and .vagrant/ directory to identify the project root. If there is a more “correct” method, please let me know.

I’m particularly proud of this because it was a fun (and useful) hack. This function, and all the above .bashrc material is available here as an easy download.

NFS folder syncing/mounting:

The vagrant-libvirt provider supports one way folder “synchronization” with rsync, and NFS mounting if you want two-way synchronization. I would love if libvirt/vagrant-libvirt had support for real folder sharing via QEMU guest agent and/or 9p. Getting it to work wasn’t trivial. Here’s what I had to do:

Vagrantfile:

Define your “synced_folder” so that you add the :mount_options array:

config.vm.synced_folder "nfs/", "/tmp/nfs", :nfs => true, :mount_options => ['rw', 'vers=3', 'tcp']

When you specify anything in the :mount_options array, it erases all the defaults. The defaults are: vers=3,udp,rw. Obviously, choose your own directory paths! I’d rather see this use NFSv4, but it would have taken slightly more fighting. Please become a warrior today!

Firewall:

Firewalling is not automatic. To work around this, you’ll need to run the following commands before you use your NFS mount:

# systemctl restart nfs-server # this as root, the rest will prompt
$ firewall-cmd --permanent --zone public --add-service mountd
$ firewall-cmd --permanent --zone public --add-service rpc-bind
$ firewall-cmd --permanent --zone public --add-service nfs
$ firewall-cmd --reload

You should only have to do this once, but possibly each reboot. If you like, you can patch those commands to run at the top of your Vagrantfile. If you can help fix this issue permanently, the bug is here.

Sudo:

The use of sudo to automatically edit /etc/exports is a neat trick, but it can cause some problems:

You get internet karma from me if you can help permanently solve any of those three problems.

Package caching:

As part of my deployments, Puppet installs a lot of packages. Repeatedly hitting a public mirror isn’t a very friendly thing to do, it wastes bandwidth, and it’s slower than having the packages locally! Setting up a proper mirror is a nice solution, but it comes with a management overhead. There is really only one piece of software that manages repositories properly, but using pulp has its own overhead, and is beyond the scope of this article. Because we’re not using our own local mirror, this won’t allow you to work entirely offline, as the metadata still needs to get downloaded from the net.

Installation:

As an interim solution, we can use vagrant-cachier. This plugin adds synced_folder‘s to the apt/yum cache directories. Sneaky! Since this requires two-way sync, you’ll need to get NFS folder syncing/mounting working first. Luckily, I’ve already taught you that!

To pass the right options through to vagrant-cachier, you’ll need this patch. I’d recommend first installing the plugin with:

$ vagrant plugin install vagrant-cachier

and then patching your vagrant-cachier manually. On my machine, the file to patch is found at:

~/.vagrant.d/gems/gems/vagrant-cachier-0.5.0/lib/vagrant-cachier/provision_ext.rb

Vagrantfile:

Here’s what I put in my Vagrantfile:

# NOTE: this doesn't cache metadata, full offline operation not possible
config.cache.auto_detect = true
config.cache.enable :yum		# choose :yum or :apt
if not ARGV.include?('--no-parallel')	# when running in parallel,
	config.cache.scope = :machine	# use the per machine cache
end
config.cache.enable_nfs = true	# sets nfs => true on the synced_folder
# the nolock option is required, otherwise the NFSv3 client will try to
# access the NLM sideband protocol to lock files needed for /var/cache/
# all of this can be avoided by using NFSv4 everywhere. die NFSv3, die!
config.cache.mount_options = ['rw', 'vers=3', 'tcp', 'nolock']

Since multiple package managers operating on the same cache can cause locking issues, this plugin lets you decide if you want a shared cache for all machines, or if you want separate caches. I know that when I use the --no-parallel option it’s safe (enough) to use a common cache because Puppet does all the package installations, and they all happen during each of their first provisioning runs which are themselves sequential.

This isn’t perfect, it’s a hack, but hacks are fun, and in this case, very useful. If things really go wrong, it’s easy to rebuild everything! I still think a few sneaky bugs remain in vagrant-cachier with libvirt, so please find, report, and patch them if you can, although it might be NFS that’s responsible.

Puppet integration:

Integration with Puppet makes this all worthwhile. There are two scenarios:

You’re using a puppetmaster, and you care about things like exported resources and multi-machine systems.
You don’t have a puppetmaster, and you’re running standalone “single machine” code.

The correct answer is #1. Standalone Puppet code is bad for a few reasons:

It neglects the multi-machine nature of servers, and software.
You’re not able to use useful features like exported resources.
It’s easy to (unknowingly) write code that won’t work with a puppetmaster.

If you’d like to see some code which was intentionally written to only work without a puppetmaster (and the puppetmaster friendly equivalent) have a look at my Pushing Puppet material.

There is one good reason for standalone Puppet code, and that is: bootstrapping. In particular, bootstrapping the Puppet server.

DNS:

You need some sort of working DNS setup for this to function properly. Whether that involves hacking up your /etc/hosts files, using one of the vagrant DNS plugins, or running dnsmasq/bind is up to you. I haven’t found an elegant solution yet, hopefully not telling you what I’ve done will push you to come up with something awesome!

The :puppet_server provisioner:

This is the vagrant (puppet agent) provisioner that lets you talk to a puppet server. Here’s my config:

vm.vm.provision :puppet_server do |puppet|
	puppet.puppet_server = 'puppet'
	puppet.options = '--test'    # see the output
end

I’ve added the --test option so that I can see the output go by in my console. I’ve also disabled the puppet service on boot in my base machine image, so that it doesn’t run before I get the chance to watch it. If your base image already has the puppet service running on boot, you can disable it with a shell provisioner. That’s it!

The :puppet provisioner:

This is the standalone vagrant (puppet apply) provisioner. It seems to get misused quite a lot! As I mentioned, I’m only using it to bootstrap my puppet server. That’s why it’s a useful provisioner. Here’s my config:

vm.vm.provision :puppet do |puppet|
	puppet.module_path = 'puppet/modules'
	puppet.manifests_path = 'puppet/manifests'
	puppet.manifest_file = 'site.pp'
	# custom fact
	puppet.facter = {
		'vagrant' => '1',
	}
end

The puppet/{modules,manifests} directories should exist in your project root. I keep the puppet/modules/ directory fresh with a make script that rsync’s code in from my git directories, but that’s purely a personal choice. I added the custom fact as an example. All that’s left is for this code to build a puppetmaster!

Building a puppetmaster:

Here’s my site.pp:

node puppet {	# puppetmaster

	class { '::puppet::server':
		pluginsync => true,	# do we want to enable pluginsync?
		storeconfigs => true,	# do we want to enable storeconfigs?
		autosign => ['*'],	# NOTE: this is a temporary solution
		#repo => true,		# automatic repos (DIY for now)
		start => true,
	}

	class { '::puppet::deploy':
		path => '/vagrant/puppet/',	# puppet folder is put here...
		backup => false,		# don't use puppet to backup...
	}
}

As you can see, I include two classes. The first one installs and configures the puppetmaster, puppetdb, and so on. The second class runs an rsync from the vagrant deployed /vagrant/puppet/ directory to the proper directories inside of /etc/puppet/ and wherever else is appropriate. Each time I want to push new puppet code, I run vp puppet and then vp <host> of whichever host I want to test. You can even combine the two commands into a one-liner with && if you’d like.

The puppet-puppet module:

This is the hairy part. I’ve always had an issue with this module. I recently ripped out support for a lot of the old puppet 0.24 era features, and I’ve only since tested it on the recent 3.x series. A lot has changed from version to version, so it has been hard to follow this and keep it sane. There are serious improvements that could be made to the code. In fact, I wouldn’t recommend it unless you’re okay hacking on Puppet code:

https://github.com/purpleidea/puppet-puppet

Most importantly:

This was a long article! I could have split it up into multiple posts, gotten more internet traffic karma, and would have seemed like a much more prolific blogger as a result, but I figured you’d prefer it all in one big lot, and without any suspense. Prove me right by leaving me a tip, and giving me your feedback.

Happy hacking!

James

PS: You’ll want to be able to follow along with everything in this article if you want to use the upcoming Puppet-Gluster+Vagrant infrastructure that I’ll be releasing.

PPS: Heh heh heh…

PPPS: Get it? Vagrant -> Oscar the Grouch -> Muppets -> Puppet

Theron Conrey writes about using:

BitTorrent Sync as Geo-Replication for Storage

We got a chance to talk about this idea at Linuxcon. I’m not entirely convinced there aren’t some problem edge cases with this solution, but I think it will be hard to tell as long as the BitTorrent sync library is proprietary. I did come up with a special case of Theron’s idea that I believe could work well.

The special case uses the optimization that the synchronization (or file transferring) is unidirectional. This avoids any coherency complications involved if both sides were to write to the same file. Combined with the BitTorrent protocol, this does what normal torrent usage does, except with BitTorrent sync, we’re looking at a folder full of files.

What kind of synchronization would benefit from this model? Repository mirroring! This is exactly a folder full of files, but going in only one direction. Instead of yum or deb mirrors each running rsync, they could use BitTorrent sync, and because of the large amount of available upload bandwidth usually available on these mirrors, “seeding”, wouldn’t be a problem, and the worldwide pool would synchronize faster.

Can we apply this to user mirroring, net installers, and machine updating? Absolutely. I believe someone has already looked into the updates scenario, but it didn’t progress for some reason. The more convincing case is still the server geo-replication of course.

Obviously, using glusterfs with puppet-gluster to host the mirrors could be a good fit. You might not even need to use any gluster replication when you have built-in geo-replication via other mirrors.

If someone works up the open source BitTorrent parts, I’m happy to hack together the puppet parts to turn this into a turn-key solution for mirror hosts.

Hope you liked this idea.

Happy hacking,

James

The Technical Blog of James

Technical articles, writeups and discussion!

Tag Archives: rsync

Vagrant vsftp and other tricks

Bittorent sync for repository mirroring

sparse rsync magic