Tuesday, April 9, 2013

Key Links for Virtualizing Oracle


Key Links for Best Practices
Many of the best practices for virtualizing Oracle apply to hypervisors in general, whether it's VMware or Oracle VM.  Obviously, there are also best practices tied to features that are specific to one product or the other.  For example, we need to create separate interfaces on the VM host (ESXi host or Oracle VM Server) to segment off management-related network traffic (i.e. the traffic that maintains the network heartbeat or performs live migrations (vMotion in VMware)).  At a minimum, each physical host needs to have 4 physical network interface cards; 6 network interface cards are highly recommended.  We will create bonded network interfaces for the following network workloads:
1.      2 NICs bonded for the public network for all Oracle database related traffic
2.      2 NICs bonded for the Oracle private network between the RAC cluster nodes
3.      2 NICs bonded for communication between the ESXi or Oracle VM Server host machines
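
The bonding layout above can be written down as a small plan and checked against the number of physical NICs in a host.  The sketch below is only an illustration: the bond names (bond0, bond1, bond2), the NIC names and the functions check_nic_count and plan_bonds are hypothetical, and how a four-NIC host should share its bonds is left open because the list does not say.

```python
from typing import Dict, List

# Bond roles taken from the list above; the bond names themselves are hypothetical.
BOND_ROLES = [
    ("bond0", "public network (Oracle database traffic)"),
    ("bond1", "Oracle private network (RAC interconnect)"),
    ("bond2", "host-to-host traffic (heartbeat, live migration)"),
]
NICS_PER_BOND = 2
MINIMUM_NICS = 4       # stated minimum per physical host
RECOMMENDED_NICS = 6   # highly recommended count


def check_nic_count(nic_count: int) -> str:
    """Classify a host by its number of physical NICs."""
    if nic_count < MINIMUM_NICS:
        return "below minimum"
    if nic_count < RECOMMENDED_NICS:
        return "minimum met"
    return "recommended"


def plan_bonds(nics: List[str]) -> Dict[str, List[str]]:
    """Assign NICs pairwise to the three bonds; needs at least six NICs."""
    needed = NICS_PER_BOND * len(BOND_ROLES)
    if len(nics) < needed:
        raise ValueError(f"need {needed} NICs for three bonded pairs, got {len(nics)}")
    plan = {}
    for i, (bond, _role) in enumerate(BOND_ROLES):
        start = i * NICS_PER_BOND
        plan[bond] = nics[start:start + NICS_PER_BOND]
    return plan
```

Calling plan_bonds with six NIC names returns one pair per bond.  In a real build you would also decide which physical cards each pair spans, which this sketch does not model.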

All the best practices that are applicable at the VM Guest level apply to both VMware and Oracle VM.  For example, we want to enable jumbo frames on the Guest VM.  We also want to set up HugePages and disable NUMA at the Guest VM level.
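
As a rough illustration of the HugePages step, the sketch below computes how many huge pages are needed to back an SGA of a given size.  The function names are hypothetical, the 2048 kB default is the common huge page size on x86-64 Linux and is an assumption here, and the result is only a floor: real sizing has to cover every instance on the guest and usually leaves a small margin.

```python
def hugepages_needed(sga_bytes: int, hugepage_kb: int = 2048) -> int:
    """Smallest number of huge pages that can hold sga_bytes."""
    if sga_bytes < 0 or hugepage_kb <= 0:
        raise ValueError("sizes must be positive")
    page_bytes = hugepage_kb * 1024
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return -(-sga_bytes // page_bytes)


def read_hugepage_kb(meminfo_text: str) -> int:
    """Extract the Hugepagesize value (in kB) from the text of /proc/meminfo."""
    for line in meminfo_text.splitlines():
        if line.startswith("Hugepagesize:"):
            return int(line.split()[1])
    raise ValueError("Hugepagesize not found")
```

On a Linux guest, the text passed to read_hugepage_kb would come from reading /proc/meminfo, and the computed count is the kind of value that goes into the vm.nr_hugepages kernel parameter.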

In general, we also do not want to over-commit memory or CPUs for production environments.  For databases that are good candidates for consolidation, we can consider over-committing memory or CPUs.
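
To make "over-commit" concrete, the sketch below compares what has been allocated to the VMs on a host with what the host physically has.  The function names and the sample numbers are hypothetical; the same arithmetic applies to vCPUs against physical cores and to VM memory against host memory.

```python
from typing import Iterable


def overcommit_ratio(allocated: float, physical: float) -> float:
    """Ratio of resources allocated to VMs versus physically available."""
    if physical <= 0:
        raise ValueError("physical capacity must be positive")
    return allocated / physical


def is_overcommitted(vm_allocations: Iterable[float], physical: float) -> bool:
    """True when the VMs together are allocated more than the host has."""
    return sum(vm_allocations) > physical
```

For example, three VMs with 8 vCPUs each on a 16-core host give a ratio of 1.5, which is over-committed; VMs with 4, 4 and 8 vCPUs on the same host are not.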

For additional information on best practices for VMware, please read the following articles.

Four key documents for virtualizing Oracle
DBA Best Practices



High Availability Guide


vCloud Suite and vCloud Networking and Security
vCloud Editions

 vCloud Networking and Security


vCenter Operations


VMware Tech Resource Center (Videos, Whitepapers, Docs)

Miscellaneous
A high level whitepaper on virtualizing Business Critical Apps on VMware

Deployment Guide, Reference Architecture, Customer case studies and white papers



VMware Network I/O Control: Architecture, Performance and Best Practices http://www.vmware.com/files/pdf/techpaper/VMW_Netioc_BestPractices.pdf


Esxtop and vscsiStats

Memory Management vSphere 5

Resource Mgmt vSphere 5 
  
Achieving a Million IOPS in a single VM with vSphere5

VMXNET3 was designed with performance in mind. See VMware KB 1001805: http://kb.vmware.com/selfservice/documentLinkInt.do?micrositeID=null&externalID=1001805


Performance Evaluation of VMXNET3 Virtual Network Device can be found at: http://www.vmware.com/pdf/vsp_4_vmxnet3_perf.pdf



Preferred BIOS settings (always double-check with your hardware vendor): http://www.vmware.com/pdf/Perf_Best_Practices_vSphere4.1.pdf




SCSI Queue Depth - Controlling LUN queue depth throttling in VMware ESX/ESXi


Monitor disk latency at three distinct layers: the device or HBA, the kernel or ESX hypervisor, and the guest or virtual machine.


PVSCSI Storage Performance 


Snapshot limitations and best practices to minimize problems http://kb.vmware.com/kb/1025279

Jumbo frames VMXNET3 



The vSphere 4 CPU scheduler

   
Some excellent storage links from Chad Sakac (EMC) and Vaughn Stewart (NetApp)
VNX and vSphere Techbook

VMAX and vSphere Techbook

Isilon and vSphere Best Practices Guide

Sunday, April 7, 2013

Additional Tidbits on Virtualization Best Practices

With the latest versions of hypervisors, if the right hardware is purchased (using hardware compatibility lists), the infrastructure is properly tuned (best practices are followed), and the virtual infrastructure is designed properly (reference architectures), most environments will run with 6% overhead or less.  Most database servers do not run at 90% utilization of CPU and memory, so virtualization should be able to run 80 - 90% of the database servers in the world.

DBAs who make a blanket statement that virtualization has too much overhead can easily be proven wrong, so don't say that.  Not with today's hypervisors.  At the same time, I'm not saying every database, business application or Hadoop environment can be virtualized.  What I can say for a fact is that DBAs who are properly aligned with the business, and who can significantly reduce CapEx/OpEx and increase operational efficiency through virtualization and leveraging the cloud, are worth their weight in gold.  :)

There will be a slight increase in CPU usage due to the hypervisor allocating resources.  Disk latency is minimal if best practices are followed.  Be aware that you should always test the settings below to make sure they increase performance and reduce overhead for your specific environment.

  • VMware uses binary translation and paravirtualization.  Using the right network drivers is very important.
  • Consolidation through virtualization lets you get more out of your CapEx and OpEx, but never over-commit production or SLA-sensitive environments.  Get the most out of the hardware you have; just be careful with over-commitment.
  • Leverage hardware assist for CPU virtualization (VT) and memory management.
  • Set power management to Static High Performance.
  • Consider disabling C-states.
  • The latest versions of the OS, network drivers, virtual tools, CPUs and memory all improve performance and reduce overhead in virtualization.
  • Try to stay away from CPU affinity.
  • Stay away from Direct Path I/O unless you absolutely need the few additional percentage points of performance it gives you.
  • Consider interrupt coalescing.  Adaptive coalescing is usually something to avoid with latency-sensitive environments.
  • Try to keep all NUMA access within one socket.  Disable Node Interleaving in the BIOS.
  • Horizontal scaling can be effective with virtual infrastructures and Hadoop.
  • Hadoop master servers can significantly leverage virtualization.  However, test thoroughly and make sure you have the skill set to run Hadoop in a virtualized environment.
  • If virtualizing Hadoop data servers, make sure you have thoroughly tested the HVE patches and that they are working correctly in your Hadoop distribution.
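
The NUMA guideline in the list above comes down to a simple sizing check: a VM stays within one socket only if both its vCPUs and its memory fit inside a single NUMA node.  The sketch below shows that check; the function name and the example host dimensions are hypothetical.

```python
def fits_in_one_numa_node(vcpus: int, memory_gb: float,
                          cores_per_node: int, memory_gb_per_node: float) -> bool:
    """True when a VM's vCPUs and memory both fit inside a single NUMA node."""
    return vcpus <= cores_per_node and memory_gb <= memory_gb_per_node
```

On a hypothetical host with 10 cores and 128 GB per node, an 8 vCPU / 64 GB VM fits in one node, while a 12 vCPU VM or a 192 GB VM would have to span nodes.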


Virtualizing Oracle and Hadoop


Charles Kim, Nitin Vengurlekar and I are doing a number of presentations on virtualizing Oracle and Hadoop at the Collaborate 2013 conference in Denver.  I thought I'd share a few of our insights and things we will be talking about around the virtualization of Tier One platforms, which include Oracle, Hadoop and Business Critical Applications.

Virtualizing Oracle Databases, Business Applications and Hadoop
Hypervisors today are getting faster with less overhead.   When you look at benchmarks being published today on running Tier One databases and applications, the overhead can get down to 6% or less.  This low level of overhead means you can run 80-90% of all database servers in a Virtual Machine.  Critical systems that require tens of thousands of IOPS, high I/O throughput, and very high CPU usage may generate more than 6% overhead.  If your application suffers from performance issues today, your best bet is to stay on the physical servers until your performance issues are isolated and resolved.  The rest of the typical business critical applications and database servers can run successfully in a virtual environment.
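
A back-of-the-envelope way to read the 6% figure is to inflate a server's measured physical utilization by the overhead and see whether headroom remains.  The sketch below does that under a deliberately simple multiplicative model; the model, the function names and the 90% ceiling are assumptions for illustration, not a benchmark method.

```python
HYPERVISOR_OVERHEAD = 0.06  # the 6% figure quoted above


def virtualized_utilization(physical_utilization: float,
                            overhead: float = HYPERVISOR_OVERHEAD) -> float:
    """Estimate utilization once virtualized, assuming overhead scales with load."""
    return physical_utilization * (1 + overhead)


def fits_with_headroom(physical_utilization: float,
                       ceiling: float = 0.90,
                       overhead: float = HYPERVISOR_OVERHEAD) -> bool:
    """True when the estimated virtualized utilization stays at or below the ceiling."""
    return virtualized_utilization(physical_utilization, overhead) <= ceiling
```

Under this model a server that peaks at 70% on physical hardware lands at roughly 74% when virtualized and still has room, while one that already peaks at 88% does not.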

Rapidly Provision Oracle on Virtualized Infrastructure
Imagine a world where your system administrators can provision a fully functional Linux server that is patched with all the up-to-date kernel parameters, updated device drivers, and updated configurations in 1 hour.  From the time you make the request to the time you get access to a fully configured Red Hat 5/6 or Oracle Linux 5/6 server, no more than one hour passes.   On top of all this, the build is perfect every time.  This should be a reality for most companies today.

Imagine providing a fully patched Oracle database on 11.2.0.3 with PSU 5 (January 2013 PSU) applied to your customers in 1 hour.  There is no reason why this cannot be accomplished today with the infrastructure that is provided by VMware and Oracle.

Lastly, imagine provisioning RAC clusters in a matter of hours.  Imagine being able to provide a fully patched 2-node, 3-node or 4-node Grid Infrastructure with ASM and a fully patched database within one day.  This presentation does not go into the secret sauce of being able to do this but will lead you in the right direction.   Oracle and VMware provide the means to provision even the most complicated RAC infrastructure in one day.  We no longer spend weeks or even months setting up our RAC environments.

Create Golden Image Templates
The concept of creating a golden image applies at all levels of the stack.  In the end, we need to create a golden image virtual machine template.  Before we can create a golden image VM template, we need to create a golden image OS.   This does not come overnight but can easily be established.  There has to be a lot of collaboration between the system administrators on standards and policies.  Furthermore, someone has to be the “owner” of the templates to make sure every part of the standard build is applied to the golden image template.  As we mature within the organization, we can build automation to simplify the build process and the parts of the builds that require manual intervention.  The level of automation will dictate how long it takes to provision the Linux VM.  Obviously, the more you automate, the less time it will take.

As DBAs, we will want to focus on creating golden images of the database ecosystem.  We need to create a golden image Grid Infrastructure stack.  We also need to create a golden image database software stack.  Finally, we need to create a golden image database to deploy to all the environments.  We can automate all of the above components to simplify provisioning Oracle databases and reduce the time it takes.

Building An Enterprise Virtual Platform
Setting up a VirtualBox, VMware Fusion or VMware Workstation VM is pretty simple.  However, there is a big difference in the skill set required to set up a bare metal hypervisor for running a POC and/or benchmarks.   It's then another thing to design, configure and implement an enterprise virtual platform for running Tier One platforms.   The key to building an enterprise virtual platform is to follow best practices and reference architectures.  Five levels of best practices that have to be followed include:
  • Validate virtualization and software configurations with vendor hardware compatibility lists.
  • Follow recommended reference architectures.
  • Follow virtualization vendor's best practices, deployment guides and workload characterizations.
  • Review storage vendor recommendations.
  • Validate internal best practices for configuring and managing VMs.

As we build out the enterprise virtual platform, standards will need to be created and tightly controlled.  Processes and procedures for virtual machine deployments will also play a big part in how successful your virtualization journey becomes.

Why Virtualize Oracle and Hadoop Environments
  • Virtual Servers offer significant advantages over Physical Servers.
  • Enabling Oracle or Hadoop as a service in a public or private cloud.
  • Cloud providers are making it easy to deploy platforms for POCs, dev and test environments.
  • Running a Consistent, Highly Reliable Hardware Environment.
  • Standardizing on a Single Common Hardware Platform (software stack).
  • Virtualization is a natural step towards the cloud.
  • Cloud and virtualization vendors are offering elastic solutions.

Virtualization Features
These virtualization features offer a lot of additional functionality to Oracle database servers, applications and Hadoop environments.
Faster provisioning
Live Cloning
Elasticity
Live migrations
Templates
Multi-tenancy
Live storage migrations
Distributed Resource Scheduling
Storage I/O Control
High Availability
Hot CPU and Memory add
Network I/O Control
VM Replication
16Gb FC Support
Network isolation using VXLANs
Multi-VM trust zones
iSCSI Jumbo Frame Support
VM Backups
Distributed Power Management
Node Affinity/Anti-Node Affinity

We cannot go over all the virtualization features in this paper.  We do plan on reviewing all of the terms and features in our presentation.  At the presentation we will also cover the subtle differences between the two vendors.

vMotion / Live Migration
vMotion / Live Migration is, by far, one of the biggest benefits of a virtualization infrastructure.   With this feature enabled, we can migrate an active VM to another host machine without any downtime or disruptions while maintaining application services to users.  Granted, the application may experience a slight degradation in performance, but there will be no data loss during the few minutes needed to move a VM to another host machine.  The live migration (vMotion) will be completely transparent to the applications.

Imagine that you lost the network card on one of the host machines and need to take the server down for maintenance.  In a non-virtualized world, without Oracle VM or VMware, you would experience a complete outage.  If you happen to be on a RAC environment, you would run your databases at reduced capacity.  In the virtual world, however, we would simply move your database server VM to another host machine, perform our maintenance, reboot the host machine and let the database server VM migrate back.  You would never know that any of this happened.

High Availability (HA)
With virtualization, we automatically gain what is known as HA in the virtualization world.  If the host machine crashes for any reason, we can fail over the VM to a surviving host machine in the cluster.  This happens automatically unless we set CPU and/or memory limits beyond what the host machine is capable of handling.

With HA, some companies may be able to forgo RAC licenses if they are leveraging RAC strictly for high availability.  If your customers can withstand a 10-15 minute outage while the VM restarts on a surviving host machine, you may be able to eliminate your RAC licenses.
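
Whether a 10-15 minute restart is acceptable depends on the availability target.  The sketch below turns a target into a yearly downtime budget and counts how many HA restarts fit inside it.  The function names and the 99.9% and 99.99% targets are assumptions for illustration; only the restart window comes from the paragraph above.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def downtime_budget_minutes(availability_pct: float) -> float:
    """Minutes of downtime per year allowed by an availability target."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)


def restarts_within_budget(availability_pct: float, restart_minutes: float) -> int:
    """Whole number of HA restarts of the given length that fit in the yearly budget."""
    if restart_minutes <= 0:
        raise ValueError("restart time must be positive")
    return int(downtime_budget_minutes(availability_pct) // restart_minutes)
```

A 99.9% target allows about 525.6 minutes of downtime a year, which is room for 35 restarts of 15 minutes.  A 99.99% target allows about 52.6 minutes, which is room for only 3.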

Distributed Resource Scheduler (DRS)
Leveraging the vMotion / Live Migration infrastructure, we can evenly balance the workload across every host machine in the virtualization cluster.  If one host machine becomes overloaded, we can move a VM or VMs to a less loaded host machine.  This happens automatically without the users experiencing any perceived outages.  We can also establish affinity and anti-affinity rules to keep VMs together with, or apart from, other VMs.  Again, we should not be afraid to fully leverage this technology.
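
To show the idea of load balancing with anti-affinity rules, here is a toy greedy sketch.  It is not VMware's DRS algorithm or Oracle VM's equivalent: the function names, the single "load" number per VM and the rule format are all hypothetical, and it proposes only one migration at a time.

```python
from typing import Dict, FrozenSet, Optional, Tuple

Hosts = Dict[str, Dict[str, float]]  # host name -> {VM name -> load}


def host_load(vms: Dict[str, float]) -> float:
    """Total load of the VMs on one host."""
    return sum(vms.values())


def propose_migration(
    hosts: Hosts,
    anti_affinity: FrozenSet[FrozenSet[str]] = frozenset(),
) -> Optional[Tuple[str, str, str]]:
    """Return (vm, source, target) for one move that narrows the gap between
    the busiest and the least busy host, or None if no move helps."""
    source = max(hosts, key=lambda h: host_load(hosts[h]))
    target = min(hosts, key=lambda h: host_load(hosts[h]))
    gap = host_load(hosts[source]) - host_load(hosts[target])
    best = None
    for vm, load in hosts[source].items():
        # Moving a VM of load L changes the gap to |gap - 2L|,
        # which is an improvement only when 0 < L < gap.
        if load <= 0 or load >= gap:
            continue
        # Skip the move if an anti-affinity rule keeps this VM away
        # from any VM already running on the target host.
        if any(frozenset((vm, other)) in anti_affinity for other in hosts[target]):
            continue
        new_gap = abs(gap - 2 * load)
        if best is None or new_gap < best[0]:
            best = (new_gap, vm)
    if best is None:
        return None
    return (best[1], source, target)
```

With host1 running db1 (load 60) and db2 (load 30) and host2 running app1 (load 20), the sketch proposes moving db2 to host2.  If an anti-affinity rule separates db2 from app1, it proposes db1 instead.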

In our detailed paper, "Virtualizing Oracle, A Technical Deep Dive", we provide detailed links on best practices for virtualizing Oracle.