Virtual Machine


Virtual Machine (VM) hosting provides VMs so that your organization's IT staff can run dedicated and customized Linux or Windows systems. This allows your IT staff to focus on your computing needs without the worry of purchasing and maintaining hardware resources.

Your VM will live in UFIT's secure private cloud, which leverages:

  • Multiple enterprise-class data centers
  • Secure enterprise-class network - public or UF private IP space available
  • Highly available synchronously replicated Storage Area Network (SAN)
  • Highly available VMware Enterprise vSphere 6.5 environment
  • Enterprise-class backup solution to provide disaster recovery

UFIT Provides

Everything up to the hypervisor (virtualization layer). This includes all physical resources such as computing hardware, networking, storage, data center resources, and hypervisor software. UFIT also provides access so that you can connect to your VM for management purposes such as console access, CD media configuration, power on/off, and VM Tools installation.

UFIT provides 7 days of daily backups to be used to recover VMs in the event of a disaster.

Customer Provides

You provide IT staff to install, configure, and maintain all software on your VM, both OS and application software. This includes maintaining proper licensing for any software installed on your VM. Your staff will also be required to handle all monitoring and backups of your VM. While UFIT monitors the health of the underlying hypervisor systems, we do not provide guest OS or application monitoring for your hosted VM. In addition, your staff will be responsible for working with UFIT Network Services to maintain network ACLs pertaining to your VM's IP address(es), and for responding to and working with UFIT's office of Information Security and Compliance.

Data Center Architecture

The UFIT VM Hosting infrastructure provides a highly available, redundant architecture that can automatically shift and recover VM workloads across physical data centers.

Definitions

Physical Data Center

  • SSRB – Space Sciences Research Building, located on main campus.
  • UFDC – University of Florida Data Center, located on east campus.

Availability Zone

An availability zone (AZ) is a logical construct composed of independent physical hardware that provides an independent fault domain.

  • AZ1 – Availability Zone 1
  • AZ2 – Availability Zone 2

Legacy Network Fabric

The legacy network fabric consists of four redundantly connected routers, two in each data center, that provide all networks to both AZs.

UFIT is in the process of deploying our next generation network fabric that will consist of redundant dedicated hardware for each AZ.

Storage Cluster

A highly available single physical storage array.

Storage MetroCluster

Two physically separate, synchronously replicated storage clusters in an active-passive pair that provides automatic failover between data centers.

Compute Node

A single physical server-class computer.

Pod

A physical enclosure that contains one or more compute nodes.

Compute Cluster

A logical grouping of compute nodes with hypervisor software installed to form a highly available compute cluster.

VM Availability Deployment Recommendations

For applications with a single-box architecture, place your VM in either AZ and do NOT pin the VM to a data center.

For applications with a highly available architecture, place at least one VM in each AZ and, for additional redundancy, “pin” each VM to one of the data centers within its AZ (e.g., VM1 = AZ1 SSRB, VM2 = AZ2 UFDC).
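The two recommendations above can be sketched as a small helper. This is only an illustration: the names `Placement` and `recommend_placements` are hypothetical and are not part of any UFIT tooling, and the choice of AZ1 for a single-box VM is arbitrary, since either AZ is acceptable. The alternating AZ and data center assignment mirrors the example pairing given above.

```python
from dataclasses import dataclass
from typing import List, Optional

# Names taken from the definitions in this document.
AVAILABILITY_ZONES = ("AZ1", "AZ2")
DATA_CENTERS = ("SSRB", "UFDC")


@dataclass(frozen=True)
class Placement:
    """Hypothetical record describing where a VM should be requested."""
    vm_name: str
    availability_zone: str
    pinned_data_center: Optional[str]  # None means the VM is not pinned


def recommend_placements(vm_names: List[str], highly_available: bool) -> List[Placement]:
    """Return a placement per VM following the recommendations above."""
    if not highly_available:
        # Single-box architecture: either AZ works (AZ1 is an arbitrary
        # default here) and the VM is NOT pinned to a data center.
        return [Placement(name, AVAILABILITY_ZONES[0], None) for name in vm_names]

    if len(vm_names) < 2:
        raise ValueError("a highly available architecture needs at least two VMs")

    placements = []
    for index, name in enumerate(vm_names):
        slot = index % 2  # alternate so that each AZ receives at least one VM
        placements.append(
            Placement(name, AVAILABILITY_ZONES[slot], DATA_CENTERS[slot])
        )
    return placements


for placement in recommend_placements(["VM1", "VM2"], highly_available=True):
    print(placement)
```

With two VM names and `highly_available=True`, the helper reproduces the pairing from the example: VM1 in AZ1 pinned to SSRB, and VM2 in AZ2 pinned to UFDC.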

Infrastructure Failure Scenarios

The data center infrastructure can fail in a number of ways. Here we describe the most common occurrences we have seen and detail the recovery model for each one. The highly available service architecture allows some failures to be recovered automatically. In the event a failure cannot be recovered automatically, UFIT will work manually to resolve the issue.

Legacy Network Outage

The entirety of the legacy network is unavailable.

All VMs across all AZs will become unavailable via network interfaces. The VMs will continue to run and no automated infrastructure recovery actions will be taken.

Legacy Network Individual Network(s) Outage

A network or networks within the legacy network become unavailable.

All resources using the network(s) will become unavailable via network interfaces. The VMs will continue to run and no automated infrastructure recovery actions will be taken. Resources in other AZs will remain unaffected.

Single Storage Cluster Outage

If the active cluster fails, all resources using the storage cluster will become 'stunned' (frozen) for approximately 30 seconds while the storage paths fail over to the standby storage array of the MetroCluster located in the other data center. Most VMs will remain powered on and experience no 'outage'. Resources in other AZs will remain unaffected.

If the passive cluster fails, VMs will experience no issues and no automated infrastructure recovery actions will be taken.

Storage MetroCluster Outage

All VMs using the storage MetroCluster will power off. No automated infrastructure recovery actions will be taken. Resources in other AZs will remain unaffected.
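The storage behavior described in the two sections above can be modeled as a small state machine. This is a rough sketch under stated assumptions, not a description of the actual storage software: the class `MetroCluster` and the cluster names `cluster-a` and `cluster-b` are hypothetical, and the sketch treats the loss of both clusters, one after the other, as equivalent to a Storage MetroCluster outage.

```python
class MetroCluster:
    """Hypothetical model of an active-passive storage pair."""

    def __init__(self, active, passive):
        self.active = active    # cluster currently serving storage, or None
        self.passive = passive  # standby replica, or None

    def fail(self, cluster):
        """Record the failure of one cluster and return the effect on VMs."""
        if cluster is None or cluster not in (self.active, self.passive):
            raise ValueError(f"unknown or already failed cluster: {cluster!r}")

        if cluster == self.passive:
            # Losing the standby does not interrupt storage for the VMs.
            self.passive = None
            return "no VM impact; no automated recovery action"

        if self.passive is None:
            # The active cluster failed and no standby is left (assumed to
            # match the Storage MetroCluster outage scenario).
            self.active = None
            return "VMs power off; no automated recovery action"

        # The active cluster failed while a standby exists: storage paths
        # fail over and the former standby becomes the active cluster.
        self.active, self.passive = self.passive, None
        return "VMs stunned for about 30 seconds during failover"


storage = MetroCluster(active="cluster-a", passive="cluster-b")
print(storage.fail("cluster-a"))
print(storage.fail("cluster-b"))
```

The first call models a single storage cluster outage on the active side; the second models what happens when the remaining cluster is lost as well.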

Single AZ Corrupt Data

The corrupted data will be replicated across the mirrored storage, so both copies are affected. All VMs whose data was corrupted will have to be rebuilt or restored from backup. Resources in other AZs will remain unaffected.

Compute Node(s) Outage

All VMs using the compute node will power off. The VMs will be automatically started on unaffected infrastructure hardware within 5 minutes.

Single Compute Pod Outage

All VMs using the compute pod, in both AZs, will power off (a pod accounts for 1/4 of the compute capacity in both AZs). The VMs will be automatically started on unaffected infrastructure hardware within 5 minutes.

Compute Cluster Outage

All VMs using the compute cluster in the AZ will power off. No automated infrastructure recovery actions will be taken. Resources in other AZs will remain unaffected.

Data Center Outage

All VMs running in the affected data center will be powered off in both AZs. The VMs will be automatically started on unaffected infrastructure hardware in the other data center within 15 minutes. Resources in the other data center will remain unaffected.

Multi Data Center Outage

All VMs will be unavailable. No automated infrastructure recovery actions will be taken.
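For planning purposes, the failure scenarios above can be collected into a simple lookup table. This is only a sketch: the names `Recovery`, `SCENARIOS`, and `is_recovered_automatically` are hypothetical, the scenario keys are shortened labels chosen for this example, and the time values are the figures stated in the text above, converted to seconds.

```python
from typing import NamedTuple, Optional


class Recovery(NamedTuple):
    vm_effect: str                   # what happens to affected VMs
    automatic: bool                  # does the infrastructure recover by itself?
    recovery_seconds: Optional[int]  # time stated in the text, if any


SCENARIOS = {
    "legacy network outage": Recovery("running, unreachable over the network", False, None),
    "individual legacy network(s) outage": Recovery("running, unreachable over the network", False, None),
    "storage cluster outage (active)": Recovery("stunned during failover", True, 30),
    "storage cluster outage (passive)": Recovery("no impact", False, None),
    "storage metrocluster outage": Recovery("powered off", False, None),
    "single az corrupt data": Recovery("rebuild or restore from backup", False, None),
    "compute node outage": Recovery("powered off", True, 5 * 60),
    "single compute pod outage": Recovery("powered off", True, 5 * 60),
    "compute cluster outage": Recovery("powered off", False, None),
    "data center outage": Recovery("powered off", True, 15 * 60),
    "multi data center outage": Recovery("unavailable", False, None),
}


def is_recovered_automatically(scenario: str) -> bool:
    """Return True if the text describes an automated recovery for the scenario."""
    return SCENARIOS[scenario.lower()].automatic


# List the scenarios that recover without manual work by UFIT.
for name, recovery in SCENARIOS.items():
    if recovery.automatic:
        print(f"{name}: about {recovery.recovery_seconds} seconds")
```

A table like this makes it easy to see which failures your application must tolerate without automated help, which is the main reason to place VMs in both AZs for highly available applications.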