I've recently started an effort to migrate some of our infrastructure - primarily VDI and Development - over to a hyper-converged architecture. I've looked at several commercial options - some bolt-ons for existing hypervisors that aggregate storage to help create hyper-convergence (e.g. VMware VSAN), along with some of the products designed to do hyper-convergence from the ground up (e.g. Nutanix).
Some of these products are pretty compelling. Nutanix has some fantastic features, and provides their community edition software to get you hooked. But I love a good challenge, so I decided to see if I could build my own hyper-converged infrastructure with the following goals: reuse existing hardware, and pay nothing for the software.
I should take a moment and make sure that everyone understands that, while I'm a huge fan of Free and Open Source Software, I'm not opposed to paying for things when it makes sense. I manage an IT department, and there are plenty of times when I make the choice to go commercial with products. But sometimes I'm just itching for a challenge, and this was one of those times.
From a commercial products standpoint, reusing hardware is hit-or-miss. There are a few commercial software-only options out there - either installs for existing systems, or virtual appliances that use raw LUN mappings to aggregate local disk space on separate servers into a single pool available to a group of servers. VMware's VSAN product is probably the best-known item of this ilk, but many of the other hyper-converged "solutions" are really just getting you the last piece of that: storage aggregation. Some suppliers, like Nutanix, aren't willing to provide you a software-only version to run on existing hardware - if you're going to move beyond the limited Community Edition, you're going to buy either their hardware or someone else's "approved" appliance. I understand the reasoning, and I may still end up with a turn-key hardware solution, but I still felt the need to find something that would work with existing hardware for no cost.
Interestingly, the product I found doesn't even bill itself as a solution for hyper-converged architecture. We'll cover those reasons in a minute, but a web search for "open source hyper converged" doesn't show any mentions of this product on the first page, and even page two is more of a tangential reference than it is a direct hit.
The product I settled on is oVirt. oVirt is an open source project backed by Red Hat (and the basis for RHEV), and is billed as a virtualization management application. The home page says they "manage virtual machines, storage, and virtualized networks." I certainly understand that a tool that manages those three things does not automatically equate to a hyper-converged platform, but hear me out. In version 3.5, I believe, they added integrated support for GlusterFS into oVirt, and that's where I think we turn the corner toward an open source product that really does hyper-convergence.
If you're not familiar with GlusterFS, it's another open source project (also backed by Red Hat), for a distributed network filesystem. It gives you a wide variety of options for how to distribute data - striping, replication, distribution, and dispersion, and several combinations thereof - to take advantage of direct-attached storage on individual compute nodes. It then gives you a single namespace where you can get access to that data, and provides both native GlusterFS access and NFS (either integrated or via Ganesha).
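To make the trade-off between those layouts a bit more concrete, here is a rough back-of-the-envelope sketch in Python. It is not anything Gluster or oVirt ships: the function name and parameters are made up for illustration, and it assumes equal-size bricks and ignores filesystem overhead.

```python
def usable_capacity(brick_size_gb, bricks, layout, replica=1, redundancy=0):
    """Approximate usable capacity in GB for a volume built from equal-size bricks.

    Hypothetical helper for illustration only; ignores filesystem overhead.
    """
    raw = brick_size_gb * bricks
    if layout in ("distributed", "striped"):
        # Every brick holds unique data, so all raw space is usable.
        return raw
    if layout in ("replicated", "distributed-replicated"):
        if replica < 1 or bricks % replica != 0:
            raise ValueError("brick count must be a multiple of the replica count")
        # Each file is stored `replica` times.
        return raw // replica
    if layout == "dispersed":
        if not 0 < redundancy < bricks:
            raise ValueError("redundancy must be between 1 and bricks - 1")
        # `redundancy` bricks' worth of space goes to erasure-coding data.
        return brick_size_gb * (bricks - redundancy)
    raise ValueError(f"unknown layout: {layout}")


if __name__ == "__main__":
    # Six nodes with one 500 GB brick each (sizes are an assumption).
    print(usable_capacity(500, 6, "distributed"))                        # 3000
    print(usable_capacity(500, 6, "distributed-replicated", replica=2))  # 1500
    print(usable_capacity(500, 6, "distributed-replicated", replica=3))  # 1000
    print(usable_capacity(500, 6, "dispersed", redundancy=2))            # 2000
```

The point is simply that redundancy costs space: the more copies (or parity) you ask for, the less of your direct-attached disk is left for VMs.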
With the integration of Gluster into oVirt, oVirt truly becomes a viable hyper-converged platform. KVM is used as the hypervisor, networking is done via Open vSwitch, and Gluster is the distributed storage platform. A very user-friendly web GUI ties most of this together (there are some things you have to drop down to the command line to do, but not many), and you can use existing hardware, with integrated storage, and eliminate the need for a SAN.
Deploying oVirt with Gluster was very easy. I chose CentOS 7 as the base platform and installed it on 7 hardware nodes. The first node I designated as the oVirt management node (I believe support for doing this in a VM is coming soon), and installed the oVirt packages necessary to get the management console up and running. After the base O/S install was done on the rest of the nodes, the oVirt management console used SSH to connect to those nodes and deploy all of the necessary software packages, including the host agent. Once the hosts were all connected to oVirt, I was able to use the Storage Domain creation wizard in the web management console to create a GlusterFS volume, and it was automatically attached to all of the nodes. You do need some familiarity with Gluster, particularly what type of volume you want, stripe width, etc., but other than that it is pretty straightforward (Gluster itself is pretty easy to set up).
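For anyone curious what the Gluster side of that looks like outside the wizard, here is a small Python sketch that just assembles the `gluster volume create` command line for a distributed replicated volume. This is my approximation of a manual equivalent, not what oVirt runs internally; the volume name, host names, brick path, and replica count are all placeholders, and the sketch only builds and prints the command rather than executing it.

```python
def gluster_create_command(volume, hosts, brick_path, replica):
    """Build the argument list for `gluster volume create` (does not run it).

    Consecutive bricks form a replica set, so the order of `hosts` decides
    which nodes mirror each other.
    """
    if replica < 1 or len(hosts) % replica != 0:
        raise ValueError("number of hosts must be a multiple of the replica count")
    bricks = [f"{host}:{brick_path}" for host in hosts]
    return ["gluster", "volume", "create", volume,
            "replica", str(replica), "transport", "tcp", *bricks]


if __name__ == "__main__":
    # Hypothetical names: six storage nodes, one brick directory on each.
    hosts = [f"node{i}" for i in range(1, 7)]
    cmd = gluster_create_command("vmstore", hosts, "/bricks/vmstore", replica=3)
    print(" ".join(cmd))
    # A newly created volume also has to be started before clients can mount it.
    print("gluster volume start vmstore")
```

With six hosts and a replica count of 3, that gives two replica sets of three bricks each, with files distributed across the two sets.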
My test bed for this consisted of some very, very old Dell PE1955 blades. I also have a C5000 chassis around with 8 "sleds" that I will probably use. I'm in the process of procuring some SSDs for those systems - the CPUs and RAM are decent for the job of running VMs for various tasks, but the I/O on a single HDD - even spread across 6 nodes with Gluster - just doesn't cut it. Eventually I may move into the Dell FX2 chassis for hardware and continue to use oVirt to manage that platform. Just thinking about that...
Storage is really the key differentiator in many of the hyper-converged platforms. Most either use common hypervisors (KVM is very popular) or support bringing your own (usually with KVM or ESXi). Most feature the same rough set of networking and virtual switching components. But how they handle storage is really where they get their features. Nutanix boasts the concept of highly-available storage, but couples that with "data locality" - that is, the disk for a VM is replicated across multiple nodes, but at least one of those nodes is the host on which the VM is running, and the access to the data is done completely locally, eliminating any latency due to network traffic. Furthermore, Nutanix doesn't split up the VM disk object among multiple nodes - it replicates it across multiple nodes, but the entire VM disk is present on the system where the VM is running, and can migrate/move to other nodes in the cluster. Many other products do not feature that notion of "data locality" - and often it is because the storage is being split below the object level, at the block level, and dispersed among available nodes to overcome I/O bottlenecks.
Gluster falls somewhere in between the sophistication of Nutanix and just striping data blindly - or, it can, depending on how you configure it. Using a Distributed Replicated volume in Gluster you can get very close to the way Nutanix works - the entire file is written to one node and then replicated to another node (or however many other nodes are needed to meet your replication configuration). According to some anecdotes - no official documentation - in this type of configuration Gluster *should* create the disk image on the system where the I/O originates, which could be where the VM is started. However, it comes up short in two places - first, there's no guarantee that will happen, and, based on the current I/O load of that system at the time, it could be initially created somewhere else. Second, Gluster doesn't have a mechanism to trigger movement of a file or set of data from one node to another. So, in Nutanix, if you shut down your VM and start it on another node, or you live migrate your VM from one node to another, Nutanix will transparently move the disk to the storage closest to that node. Gluster (currently) will not.
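That second shortcoming is easy to picture with a toy model. The Python sketch below is not Gluster code: it assumes placement is decided once, from a hash of the file name, and uses `zlib.crc32` purely as a stand-in for whatever Gluster really does (it does not model the anecdotal "create where the I/O originates" behaviour). Host and file names are invented.

```python
import zlib


def replica_sets(bricks, replica):
    """Group bricks into replica sets; consecutive bricks mirror each other."""
    return [bricks[i:i + replica] for i in range(0, len(bricks), replica)]


def place(filename, sets):
    """Pick the replica set for a file from its name alone (toy hash, an assumption)."""
    return sets[zlib.crc32(filename.encode()) % len(sets)]


def is_local(filename, vm_host, sets):
    """True if the host running the VM holds one of the copies of its disk."""
    return vm_host in place(filename, sets)


if __name__ == "__main__":
    hosts = [f"node{i}" for i in range(1, 7)]   # hypothetical six-node cluster
    sets = replica_sets(hosts, replica=2)       # three mirrored pairs
    disk = "vm-disk-01.img"
    print("disk lives on:", place(disk, sets))
    # "Migrate" the VM around the cluster: the disk never moves, so whether
    # reads are local depends only on where the VM happens to land.
    for host in hosts:
        print(host, "local" if is_local(disk, host, sets) else "remote")
```

In this model the VM gets local reads on only as many hosts as there are replicas, and nothing ever moves the data toward the VM after a migration, which is the behaviour Nutanix adds on top.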
Anyway, there's my initial experience with oVirt as a platform for hyper-convergence. I think it works at a base level, and I think the potential is there for it to become a true hyper-converged platform. I just hope that the folks steering the project will realize the potential and drive it in that direction.
As a final note, oVirt has some other incredibly powerful, incredibly disruptive features that could also be game-changers in giving the incumbent virtualization giants a run for their money. First, it features the capability to do Sysprep for Microsoft Windows, and similar functions in other operating systems, that let you deploy pools of VMs in an automated fashion. Think VDI. Think replacement for Citrix XenDesktop and XenApp, and VMware View and Horizon. I've been playing with that over the past week or so, and it works - really, really well. Second, it features the SPICE protocol, a competitor to PCoIP (what VMware licenses for their accelerated display support) and HDX (Citrix). I don't think SPICE is as mature as those other protocols, and it doesn't seem to support server-side GPU acceleration the way they do, but give it a few months and that may change. Finally, it has the LDAP/AD integration, multiple domains, and multi-tenancy that make it a very powerful and highly configurable management platform for multi-tenant environments.
All for now - at least on that topic.