Bittorent sync for repository mirroring

Theron Conrey writes about using:

BitTorrent Sync as Geo-Replication for Storage

We got a chance to talk about this idea at Linuxcon. I’m not entirely convinced there aren’t some problem edge cases with this solution, but I think it will be hard to tell as long as the BitTorrent sync library is proprietary. I did come up with a special case of Theron’s idea that I believe could work well.

The special case uses the optimization that the synchronization (or file transferring) is unidirectional. This avoids any coherency complications involved if both sides were to write to the same file. Combined with the BitTorrent protocol, this does what normal torrent usage does, except with BitTorrent sync, we’re looking at a folder full of files.

What kind of synchronization would benefit from this model? Repository mirroring! This is exactly a folder full of files, but going in only one direction. Instead of yum or deb mirrors each running rsync, they could use BitTorrent sync, and because of the large amount of available upload bandwidth usually available on these mirrors, “seeding”, wouldn’t be a problem, and the worldwide pool would synchronize faster.

Can we apply this to user mirroring, net installers, and machine updating? Absolutely. I believe someone has already looked into the updates scenario, but it didn’t progress for some reason. The more convincing case is still the server geo-replication of course.

Obviously, using glusterfs with puppet-gluster to host the mirrors could be a good fit. You might not even need to use any gluster replication when you have built-in geo-replication via other mirrors.

If someone works up the open source BitTorrent parts, I’m happy to hack together the puppet parts to turn this into a turn-key solution for mirror hosts.

Hope you liked this idea.

Happy hacking,

James

Gluster Community Day, Thursday

I’m here in New Orleans hacking up a storm and getting to meet fellow gluster users IRL. John Mark Walker started off with a great “State of the GlusterFS union” style talk.

Today Louis (semiosis) gave a great talk about running glusterfs on amazon. It was highly pragmatic and he explained how he chose the number of bricks per host. The talk will be posted online shortly.

Marco Ceppi from Canonical gave a talk about juju and gluster. I haven’t had much time to look at juju, so it was good exposure. Marco’s gluster charm suffers from a lack of high availability peering, but I’m sure that is easily solved, and it isn’t a big issue. I had the same issue when working on puppet-gluster. I’ve written an article about how I solved this problem. I think it’s the most elegant solution, but if anyone has a better idea, please let me know. The solutions I used for puppet, can be applied to juju too. Marco and I talked about porting puppet-gluster to ubuntu. We also talked about using puppet inside of juju, with a puppetmaster, but we’re not sure how useful that would be beyond pure hack value.

Joe Julian gave a talk on running a MySQL (MariaDB) on glusterfs and getting mostly decent performance. That man knows his gluster internals.

I presented my talk about puppet-gluster. I had a successful live demo, which ran over ssh+screen across the conference centre internet to my home cluster Montreal. With interspersed talking, the full deploy took about eight minutes. Hope you enjoyed it. Let me know if you have any trouble with your setup and what features you’re missing. The video will be posted shortly.

Thanks again to John Mark Walker, RedHat and gluster.org for sponsoring my trip.

Happy hacking,

James

Linuxcon day three, Wednesday

After hacking away on Monday and Tuesday and meeting fellow nerds IRL, I’ve landed even more changes to puppet-gluster. My git master branch now sits at 47 commits.

$ git clone https://github.com/purpleidea/puppet-gluster.git
Cloning into 'puppet-gluster'...
remote: Counting objects: 317, done.
remote: Compressing objects: 100% (144/144), done.
remote: Total 317 (delta 187), reused 275 (delta 148)
Receiving objects: 100% (317/317), 82.17 KiB | 12.00 KiB/s, done.
Resolving deltas: 100% (187/187), done.
$ cd puppet-gluster/
$ git log | grep '^commit' | wc -l
47
$ git log | head
commit fa3fd2eb4bab499031274e0918a40e7a99fe0086
Author: James Shubin <hidden>
Date:   Wed Sep 18 17:53:13 2013 -0400

    Added fancy volume creation.
    
    This moves the command into a separate file. This also adds temporary
    saving of stdout and stderr to /tmp for easy debugging of command
    output.

As you can see above, volume creation is now “fancier” and more robust. In case things go wrong, it’s easy to get fast access to gluster command line output (saved in /tmp/), and the volume creation commands are individually stored in your puppet-gluster working directory. Usually this is /var/lib/puppet/tmp/gluster/, and each volume creation command is in the volume subdirectory.

I also met gluster expert Joe Julian. He’s been recently hired at Rackspace. Congratulations Joe. We talked about puppet and gluster, and is very knowledgeable about gluster internals and PRN source diving.

I was interviewed by Aaron Delp and Brian Gracely, on The Cloudcast. These two gentlemen are a pleasure to sit and chat with. Check out their podcast. We talked about puppet, gluster, puppet-gluster and how to dive in. Feel free to comment or email me if you have any questions about something that we didn’t cover in the interview.

All week, I’ve been hacking along side Jayneil Dalal in the speaker room. He was kind enough to give me a Beagle Bone black! Where will its hack potential take you? Two features which are particularly useful are on-board Ethernet, and 2GB of flash storage. He’s at the conference showing off some Minnow boards. They’ve got an Intel atom chip on board if you need something a little beefier.

I’m giving my puppet-gluster talk tomorrow (Thursday) here at Linuxcon! I hope you can make it. I’ll even have a live demo. Until then,

Happy hacking,

James

Linuxcon day two, Tuesday

Continuing on from yesterday, I’ve met even more interesting people. I chatted with Dianne Mueller about some interesting ideas for gluster+openshift. More to come on that front soon. Hung out with Jono Bacon and talked a bit about puppet-gluster on Ubuntu. If there is interest in the community for this, please let me know. Thanks to John Mark Walker and RedHat for sponsoring me and introducing me to many of these folks. Hello to all the others that I didn’t mention.

On the hacking side of things, I added proper xml parsing, and a lot of work on fancier firewalling to puppet-gluster. At the moment, here’s how the firewall support works:

  1. Initially, each host doesn’t know about the other nodes.
  2. Puppet runs and each host exports host information to each other node. This opens up the firewall for glusterd so that the hosts can peer.
  3. Now that we know which hosts are in a common pool, we can open up the firewall for each volume’s bricks. Since the volume has not yet been started (or even created) we can’t know which ports are needed, so all incoming ports are permitted from other gluster nodes.
  4. Once the volume is created, and started, the TCP port information will be available, and can be consumed as facts. These facts then refine the previously defined firewall rules, to only allow the needed ports.
  5. Your white-listed firewall setup is now complete.
  6. For users who wish to avoid using this module to configure your firewall, you can set shorewall => false in your gluster::server class. If you want to specify the allowed ip access control manually, that is possible too.

I hope you find this useful. I know I do. Let me know, and

Happy Hacking,

James

Linuxcon day one, Monday

I’m here in New Orleans at Linux Con, hacking on puppet-gluster and talking to lots of interesting folks. I’ve met gluster hacker Theron Conrey, and my host John Mark Walker, Fedora and Raspberry Pi experts Spot and Ruth Suehle, and many others too.

The hotel is very nice. The bathroom sink has two taps of course, but both of them are hot. The New Orleans heat is probably the cause of this.

I’m hacking at full speed to get some new features and testing in before my talk on Thursday. I’ve been reworking the simple firewall support in my puppet module. For those that want automated and correct firewall configuration, expect some improvements soon.

I also pushed some work on property management for gluster volumes. This commit adds a list of all available gluster properties. The patch is still missing type information for many of them, because I haven’t yet tested each one, but if there are some you’d like to use, please let me know. This is easy to patch.

More code is landing soon. Don’t be afraid to contact me if you’re not sure how to get started with puppet gluster, or if there’s a use case that I am not currently solving.

Happy Hacking,

James

PS: Sorry I published this late!

New puppet-gluster features before Linuxcon

Hey there,

I’ve done a bit of puppet-gluster hacking lately to try to squeeze some extra features and testing in before Linuxcon. Here’s a short list:

If there are features or bugs that you’d like to see added or removed (respectively) please let me know ASAP so that I can try to get something ready for you before my Linuxcon talk. I also don’t have any hardware RAID or physical hardware to test external storage partitions (bricks) on. If you have any that you can donate or let me hack on for a while, there are some features I’d like to test. Contact me!

I’ve got a few more things queued up too, but you’ll have to wait and see. Until then,

Happy hacking,

James

Puppet-Gluster and me at Linuxcon

John Mark Walker, (from Redhat) has been kind enough to invite me to speak at the Linuxcon Gluster Workshop in New Orleans. I’ll be speaking about puppet-gluster, giving demos, and hopefully showing off some new features. I’m also looking forward to meeting up with gluster expert Joe Julian.

If there are features that puppet-gluster is missing, or you have a use case that I haven’t covered, please let me know, and I’ll try to work on it for you ahead of the conference. If you want to meet up for some puppet-gluster help, or to hack on code, I’ll be around from the 16th to the 20th of September. My talk is on the 19th.

Special thanks to John Mark and Redhat for sponsoring this trip. Without them, none of this would be possible.

Happy hacking,

James