Wednesday, August 28, 2013

VMWorld from a Hadoop Perspective

It's been an excellent week at VMWorld.  I've been focusing on virtualizing Hadoop and business critical applications.  My highlights:
  • Announcing the book "Successfully Virtualizing Business Critical Oracle Databases on VMware", which I am writing with Charles Kim (ViscosityNA), Darryl Smith (EMC) and Steven Jones (VMware).
  • Will soon be announcing a new Hadoop book I will be writing.
  • Spent a lot of time this week with VMware big data engineers and experts.  Enjoyed the vExperts reception last night.  Had some great conversations around virtualizing Hadoop.
  • Presenting best practices on virtualizing Hadoop, Oracle and business critical applications.
  • Presenting on Virtualizing Mission Critical Oracle RAC with vSphere and vCOPs.  This presentation is going to show VMware admins how to deploy Oracle Database as a Service without DBAs.
VMware's goal with the Software Defined Data Center (SDDC) is to take customers to 100% virtualization.  With vSAN and NSX, virtualization extends beyond compute to include networks and storage.  VMware used to be about customers being 70% virtualized and moving the line to 75% or 80%.  It is now focused on moving customers to 100% virtualized, which means aggressively virtualizing Business Critical Applications (Oracle, SAP, etc.) and Hadoop.  The SDDC is still a goal, and more software pieces have to be put in place to accomplish it.
  • VMware's vSAN supports virtual storage directly in the hypervisor. 
  • VMware's NSX is one of the biggest areas of interest.  NSX is very strategic for VMware because it allows the virtualization of the entire network stack.  NSX is to networking what ESXi is to virtualizing hardware resources.  With NSX, switching, routing, bridging and firewalls are all part of the hypervisor.  Here is the design pattern (source VMware). 



Additional Highlights

  • VMWorld has gotten a lot bigger with an estimated 22,500 attendees.
  • Large vendor area had a lot of energy.  Splunk had the coolest t-shirts by far.
  • The Sharkk case seemed to be the coolest case for the iPad mini.  I saw a lot more iPad minis than iPads among the VMware jet set.

Hadoop at VMWorld
Hadoop presentations all start out discussing the benefits of Hadoop, use cases and then VMware's strategy around virtualizing Hadoop (Serengeti and Big Data Extensions).  Lots of use cases around the volume of semi-structured and unstructured data.  Example: a single GE jet engine produces 10 TB of data in an hour, or roughly 90 petabytes per year.
  • GE is looking at having all its fridges, and every machine it develops, call home for repairs and maintenance.
  • GE is focusing on early detection of faults, common model failures and product engineering support.
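As a quick sanity check on the jet engine figures above, here is a minimal sketch of the arithmetic. It assumes the engine produces data around the clock for a full year and uses decimal units (1 PB = 1,000 TB); both are assumptions made for illustration, not details given in the presentations.

```python
# Figure quoted in the presentations: one jet engine, 10 TB per hour.
TB_PER_HOUR = 10

# Assumptions for illustration: continuous operation, decimal units.
HOURS_PER_YEAR = 24 * 365
TB_PER_PB = 1000

tb_per_year = TB_PER_HOUR * HOURS_PER_YEAR
pb_per_year = tb_per_year / TB_PER_PB

print(f"{tb_per_year} TB per year, about {pb_per_year:.1f} PB")
```

Under those assumptions the total comes out a little under 90 PB, so the quoted yearly number looks like a rounded version of the hourly one.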
As you would expect, polling of audiences showed almost no knowledge of Hadoop.

Key benefits of virtualizing Hadoop from presentations:
  •  Fast provisioning of data nodes.
  • Workload consolidation
  • High Availability
  • Auto elasticity and high resource utilization
  • True multi-tenancy
  • Promoting elastic compute by virtualizing data nodes.
  •  Leveraging virtual networks.
  •  Leveraging ability to control noisy neighbors with things like storage and network I/O control.
VMware vCenter now has a plugin for Hadoop (called Big Data Extensions).
Presenters emphasized that if you virtualize Hadoop, you should use Serengeti for deployments, because Serengeti understands VMware vCenter.

VMware is calling the Hadoop Virtualization Extensions (HVE) their Big Data Extensions.

FedEx showed how they are using scale-out NAS with Hadoop.
  • Talked about how from their perspective if the network is fast enough, data locality is not really that important. 
  • Talked about using Isilon storage with Hadoop.
Identified Inc. did an overview of their Hadoop experience.
  • Started out using AWS.  They were using 200 VMs and it was costing them about $40k a month.  When they started running 24/7 the cost went up another $20-40k per month.  They found that performance from AWS was very spiky.
  • Moved on premise with Serengeti to reduce costs.  Moved to SuperNAP in Las Vegas.  They saved $20k a month by moving off of AWS.
  • From a cost perspective, they got their ROI within two months of getting off of AWS.
  •  They decided to mix physical and virtual.  Virtualized all master servers and stayed physical with data nodes.
  •  They used a Fat Twin platform.
  • Used anti-affinity rules for master servers, especially ZooKeeper and journal nodes.
  • They used mixed storage: flash for the OS on nodes, and local storage for data nodes.
  • They are now exploring virtualizing their data nodes and separating compute and data, so TaskTrackers will be separate from DataNodes.  They want to have elastic compute.
  • They do not have anyone that is a Hadoop administrator.  They have different developers rotate into the infrastructure team for 3-6 months.  Then when developers rotate back to development teams they keep their permissions and can manage their Hadoop clusters within the individual developer teams.
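To put the Identified cost figures above side by side, here is a rough back-of-the-envelope sketch. The dollar amounts are the ones quoted in the talk. Reading "ROI within two months" as the one-time cost of the move being paid back out of the monthly savings is an assumption made here for illustration, not something the presenters spelled out.

```python
# Figures quoted in the Identified talk (USD).
AWS_BASE_MONTHLY = 40_000          # about 200 VMs on AWS
AWS_24X7_EXTRA = (20_000, 40_000)  # additional monthly cost once running 24/7
MONTHLY_SAVINGS = 20_000           # monthly savings after moving off AWS
PAYBACK_MONTHS = 2                 # ROI reached within two months

# Range of the monthly AWS bill once the cluster ran around the clock.
aws_24x7_low = AWS_BASE_MONTHLY + AWS_24X7_EXTRA[0]
aws_24x7_high = AWS_BASE_MONTHLY + AWS_24X7_EXTRA[1]

# Assumption: the one-time cost of the move was recovered from the monthly
# savings, so it was at most savings multiplied by the payback period.
implied_max_move_cost = MONTHLY_SAVINGS * PAYBACK_MONTHS

print(f"AWS at 24/7: ${aws_24x7_low:,} to ${aws_24x7_high:,} per month")
print(f"Implied ceiling on one-time move cost: ${implied_max_move_cost:,}")
```

If that reading is right, the up-front spend on the move would have been small next to the monthly AWS bill, which would explain the short payback period.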
VMware announced vSphere 5.5.  Here are a few highlights:
  • With ESXi 5.5, the hypervisor supports up to 320 logical cores (5.1 supports 160 logical cores).
  • Up to 4TB of memory for an ESXi host (5.1 supports 2TB of memory).
  • Fault tolerance can now support up to four vCPUs.  This means VMware will be pushing for this method to achieve HA with Hadoop master servers versus the new Hadoop HA features in Hadoop 2.0.
  • NUMA nodes per host: 16 (was 8).
  • Things coming: auto elastic Hadoop, support for YARN and support for HBase.
  •  vSphere 5.5 now supports application high availability.  This supports application recovery within a VM.
  • "Project Serengeti" tools support Hadoop deployments.  Not sure if this will be in the first release of 5.5 or not.
  • VMDK maximum size increased to 62 TB.
  •  Misc:
    • No change in the pricing of vSphere editions.
    • Four new features: AppHA, Reliable Memory, Flash Read Cache and Big Data Extensions
    • Latency-sensitivity feature for applications like very high performance computing and stock trading apps
    • vSphere Hypervisor (free) has no physical memory limit anymore (was 32 GB)
    • PCI hotplug support for SSD
    • VMFS heap size improvements
    • 16 Gb end-to-end Fibre Channel: 16 Gb from host to switch and 16 Gb from switch to SAN
    • Support for 40 Gbps NICs
    • Enhanced IPv6 support
    • Enhancements for CPU C-states. This reduces power consumption.
    • Expanded vGPU support: vSphere 5.1 supported only NVIDIA GPUs.
    • Support for the Ivy Bridge-EP Xeon E5 v2 processors (Intel) and the Opteron 3300, 4300 and 6300 processors (Advanced Micro Devices).
    • The ability to vMotion a virtual machine between different GPU vendors is also supported. If hardware mode is enabled on the source host and the destination host has no GPU, the vMotion will not be attempted.
    • Added Microsoft Windows Server 2012 guest clustering support.
    • AHCI Controller Support which enables Mac OS guests to use IDE CDROM drives. AHCI is an operating mode for SATA.

