In Q2 of the KB, the question is asked “Which VMware Products are currently NOT interoperable with Virtual Volumes (VVols)?” The response includes “VMware Site Recovery Manager (SRM) 5.x to 6.0.x”.
In Q4 of the KB, the question is asked “Which VMware vSphere 6.0.x features are currently NOT interoperable with Virtual Volumes (VVols)?” The response includes “Array-based replication”.
So where does that leave us from a replication/DR standpoint with VVols?
From this, we can assume that because vSphere Replication works at the VM level, it really doesn’t care what the underlying storage is (NFS, VMFS, vsanDatastore or VVols). That would be a correct assumption. However, there are some limitations; for example, vSphere Replication 6.0 does not support VSS quiescing on Virtual Volumes. Check out the vSphere Replication 6.0 Release Notes for further information. But overall, through all of our testing, this is the only major caveat that we are currently aware of with vSphere Replication and VVols.
Therefore, vSphere Replication could be used to replicate VMs built on VVols. This allows for various use cases in the event of failures, when there is a need to recover VMs. However, due to the lack of SRM support, there is no orchestration of failover. In other words, all of the registration, powering on, and re-IP’ing of virtual machines to bring up VMs at a destination DR site would have to be scripted. Sure, it could be done, but it’s a headache which SRM resolves for you. Not forgetting SRM’s other abilities, such as running test failovers and creating run-books. Lots of nice features.
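To give a feel for what that scripting involves, here is a minimal Python sketch of a per-VM recovery sequence. It is only an illustration: the step functions (register_vm, power_on_vm, reconfigure_ip) are hypothetical placeholders that just print what they would do, and a real script would replace them with calls into the vSphere API or your array vendor's tooling. The only thing it demonstrates is the ordering, and stopping at the first failed step so that a VM is never powered on or re-IP'd if it could not be registered.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class RecoveryStep:
    name: str
    action: Callable[[str], None]  # receives the VM name, raises on failure


@dataclass
class RecoveryResult:
    vm: str
    completed: List[str] = field(default_factory=list)
    failed_step: Optional[str] = None
    error: Optional[str] = None

    @property
    def ok(self) -> bool:
        return self.failed_step is None


def recover_vm(vm: str, steps: List[RecoveryStep]) -> RecoveryResult:
    """Run the steps in order for one VM and stop at the first failure."""
    result = RecoveryResult(vm=vm)
    for step in steps:
        try:
            step.action(vm)
        except Exception as exc:
            # Record where we stopped so an operator can pick it up manually.
            result.failed_step = step.name
            result.error = str(exc)
            break
        result.completed.append(step.name)
    return result


# Hypothetical placeholders: these only print. They are NOT real vSphere calls.
def register_vm(vm: str) -> None:
    print(f"[{vm}] register replica with the DR site inventory")


def power_on_vm(vm: str) -> None:
    print(f"[{vm}] power on")


def reconfigure_ip(vm: str) -> None:
    print(f"[{vm}] apply DR site network settings")


DR_STEPS = [
    RecoveryStep("register", register_vm),
    RecoveryStep("power on", power_on_vm),
    RecoveryStep("re-IP", reconfigure_ip),
]

if __name__ == "__main__":
    for vm_name in ["app01", "db01"]:  # example VM names, purely illustrative
        outcome = recover_vm(vm_name, DR_STEPS)
        status = "recovered" if outcome.ok else f"stopped at '{outcome.failed_step}'"
        print(f"{vm_name}: {status}")
```

Even in this toy form you can see the pieces that SRM gives you for free: ordering, error handling, and a record of what happened per VM.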
This then brings us on to the lack of support for “array-based replication”. In a recent blog post on VVols by Eric Siebert over at HP, the “No support for storage array replication yet” is also called out. Eric goes on to state:
This is a big one especially for larger enterprises that rely on storage array replication for BC/DR. Storage replication support is not part of the current VASA 2.0 specification so SRM and vMSC are not supported with VVols, for many that’s a show stopper right there. However despite VMware not supporting storage array replication yet some vendor implementations will or do support it now so that could be a workaround but you still can’t use it with SRM or vMSC until VMware builds it into the VASA spec.
VASA is the vSphere APIs for Storage Awareness, an integral part of VVols. You can read more on the basics of VVols here. vMSC is basically stretched clustering.
I reached out to a few of our other partners to get their take on this, as I was pretty sure replication wouldn’t be an issue for these partners, but rather the orchestration would be the problem.
First I spoke to David Glynn, the Dell EqualLogic front-man for all things VVols. The Dell EqualLogic arrays include a host-side software component called the Virtual Storage Manager (VSM) that plugs into VMware’s vCenter Server and enables almost all of the management tasks on the array to be done from within vSphere. David pointed out that through their VSM plugin, they can do things such as promote a replicated VMFS volume from secondary to primary and register VMs on that volume. However, other tasks such as powering on VMs and re-IP’ing of VMs would need to be orchestrated elsewhere, or scripted.
During their investigations with VVols, they discovered a few nuances if they wanted to implement a disaster-recovery (DR) solution for VVol based VMs. One such nuance is the fact that the storage container needs to have the same identifiers (name, id, etc.) on both the primary and DR arrays if there was a desire to bring up VMs on the DR site. They continue to research whether this can gracefully be achieved, but they don’t have an integrated DR story with the current VVols release.
And this brings up a key point. Our storage array partners should have the final say on what is supported and what isn’t with respect to array-based replication. I have stated our stance here on SRM, but once again to be clear, it is not supported with VVols. However, this does not mean that DR cannot be achieved through our storage partners. The point I’m making is that you need to ask them whether it can be done or not.
David went on to say that if replication of VVols were configurable on the EQL array, and if something happened to the source VVols, replication could be reversed and VVols could be restored to the original container using such a mechanism, the same as can be achieved today with traditional volumes. I guess in some ways this could be considered like a remote snapshot technology. Recovering from snapshots, whether local or remote, is a very valid use case for replicating VVols. This could be done manually as well as via scripting. All one would need to do is reverse the data flow in the replication. Again however, vendors would need to supply specific details on how to achieve this, not VMware.
I also had a conversation with Joel Kaufman who heads up the tech marketing team over at NetApp. The conversation took a very similar pattern. NetApp can use their SnapMirror technology for replicating VVols. SnapMirror works at the storage container level today. A storage container maps to one or more FlexVols on the NetApp side – for more about storage containers, check this link out. However, Joel stated that replication will be more granular (to the VVol level) in the future. Again, Joel highlighted that in the case of a localized failure, they can do a local recovery using snapshots. If it is a remote recovery, a lot of additional nuances begin to appear. Just as David at Dell EQL told us, a lot of the original information is needed to pull everything back – this is non-trivial.
What should be noted from all of this is that the array vendors are much closer to this than VMware is. Although we state in the above KB article that “array based replication” is not supported, it’s really about what our partners can do. While it might be useful as a recovery step in the case of a local disk failure on the original system (replicating VVol data back to the original array from the remote array), it is not really ready to be used in any sort of DR situation just yet. Our storage array/VVol partners are better armed to respond to these sorts of queries for the most part.
So, given the above, is replication without the ability to do various bits of orchestration (promote replica volume, register VMs, power on VMs, re-IP VMs) any better than using a snapshot?
This is how Joel responded when I put that question to him. Snapshots and replication have different functions and are based on the locality of the ‘disaster’ and the recovery event. There are plenty of situations where replication for DR with a manual recovery (manual mount of storage, register VMs, boot, etc.) works and is good enough to meet the recovery time objective (RTO) of the business. The key is making sure that the RTO conforms to the expected SLA (Service Level Agreement) of doing it. That being said, automation/orchestration makes it a very repeatable and typically faster process all around. The key is to understand the use cases for using each, and where it applies.
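One way to sanity-check whether a manual recovery is "good enough" is simply to add up the steps and compare the total against the RTO. The Python sketch below does that arithmetic. Every duration in it is a made-up illustrative figure, not a measurement or a vendor number, and it assumes the per-VM steps are performed one VM at a time; plug in your own timings from a real recovery rehearsal.

```python
from datetime import timedelta
from typing import Iterable


def estimate_recovery_time(
    one_off_steps: Iterable[timedelta],
    per_vm_steps: Iterable[timedelta],
    vm_count: int,
) -> timedelta:
    """Total time for a serial, manual recovery: one-off work plus per-VM work."""
    if vm_count < 0:
        raise ValueError("vm_count must not be negative")
    one_off = sum(one_off_steps, timedelta())
    per_vm = sum(per_vm_steps, timedelta())
    return one_off + per_vm * vm_count


def meets_rto(estimate: timedelta, rto: timedelta) -> bool:
    """True if the estimated recovery time fits within the RTO."""
    return estimate <= rto


if __name__ == "__main__":
    # Illustrative figures only (assumptions, not measurements).
    one_off = [timedelta(minutes=15)]   # e.g. promote the replica and mount storage
    per_vm = [
        timedelta(minutes=2),           # register the VM
        timedelta(minutes=3),           # boot
        timedelta(minutes=5),           # re-IP and verify
    ]
    rto = timedelta(hours=4)

    for count in (20, 30):
        estimate = estimate_recovery_time(one_off, per_vm, count)
        verdict = "within" if meets_rto(estimate, rto) else "exceeds"
        print(f"{count} VMs: estimated {estimate}, {verdict} the RTO of {rto}")
```

The per-VM term is what grows with the size of the environment, which is exactly where automation/orchestration tends to pay for itself.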
I can see where Joel is coming from. Imagine a total datastore corruption. The customer has been replicating and snapping at the remote end. Now they want to just get their data back, and have it be logically usable. Yes, there is going to be some heavy lifting to get it back, but at the end of the day, as long as it’s recoverable, it might be just what the customer wants.
Hopefully that explains the reasoning behind the replication statements in the above KB. I hope the distinction can be drawn between what we at VMware support and what the storage partners may or may not support. My advice for now is to get the specific details on replication use cases from the storage vendors.
Once we have SRM for orchestration fully supported with VVols, we will have a good replication/DR story also. Thanks to Eric, David and Joel for their valuable contributions.
If you are a VVol partner, and wish to share your replication/DR story, please leave a comment.