I'm really looking forward to presenting at Hadoop Summit again. Presenting at Hadoop Summit in Amsterdam was awesome, and San Jose is looking like the best ever. I'll be helping get Summit off to a great start with "Apache Hadoop Essentials: A Technical Understanding for Business Users" and then closing the conference with "A Reference Architecture for ETL 2.0". You may even see me at the Dev Cafe giving tours of the Hadoop Sandbox and Savanna. Here are two of the workshops/presentations I will be giving:
A Technical Understanding for Business Users - Joining me will be Manish Gupta ("The Wizard of Hadoop", or affectionately known as "Manish Hadoopta" because he can play Hadoop like a piano and make Hadoop magic).
This fast-paced one-day course will provide attendees with a technical overview of Apache Hadoop. Discussions will include understanding Hadoop from a data perspective, design strategies, data architecture, core Hadoop fundamentals, data ingestion options and an introduction to Hadoop 2.0. Hands-on labs will give business users a deeper understanding of Apache Hadoop, using real-world use cases to show the power of Hadoop. We will be using the new Hortonworks Sandbox 1.3. The Hortonworks Sandbox is one of the best ways for enthusiasts new to Hadoop to get started. The Hortonworks Sandbox:
- Uses the Hortonworks Data Platform 1.3
- Shows SQL "IN" Hadoop with Apache Hive 0.11, offering a 50x improvement in query performance.
- Introduces Ambari, the management interface of choice for HDP and OpenStack (Savanna).
- Is available as a VMware, VirtualBox or Hyper-V virtual machine.
- Is a great way for someone to start learning how to work with a Hadoop cluster.
- Includes lots of excellent tutorials:
  - Hello Hadoop World
  - HCatalog, Basic Pig and Hive commands
  - Using Excel 2013 to Analyze Hadoop Data
  - Data Processing with Hive
  - Loading Data into Hadoop
  - Visualize Website Clickstream Data
A Reference Architecture for ETL 2.0 - I'll be presenting with George Vetticaden (Hortonworks Solution Architect), and we will be bringing the "Power of George" to Hadoop Summit. :) ETL is such a big part of successful Hadoop implementations that George and I thought we'd help wrap up the conference with some best practices, words of wisdom and reference architectures around Hadoop ETL.
More and more organizations are moving their ETL workloads to a Hadoop-based ELT grid architecture. Hadoop's inherent capabilities, especially its ability to do late binding, address some of the key challenges with traditional ETL platforms. In this presentation, attendees will learn the key factors, considerations and lessons around ETL for Hadoop. We will cover areas such as the pros and cons of different extract and load strategies, the best ways to batch data, buffering and compression considerations, leveraging HCatalog, data transformation, integration with existing data transformations, the advantages of different ways of exchanging data, and leveraging Hadoop as a data integration layer. This is an extremely popular presentation around ETL and Hadoop.
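For readers who have not run into late binding before, here is a rough illustration of the idea (often called schema-on-read): the raw data is loaded untouched, and each consumer applies its own column names and types only when it reads. This is a minimal plain-Python sketch, not Hadoop or Hive code; the sample clickstream rows, the schema names and the `read_with_schema` helper are all hypothetical and exist only for this example.

```python
import csv
import io

# Raw records are kept exactly as they arrived; nothing is validated at load time.
# (Hypothetical sample data standing in for files landed in a cluster.)
RAW_CLICKS = (
    "2013-06-01,alice,/home,200\n"
    "2013-06-01,bob,/cart,500\n"
    "2013-06-02,alice,/checkout,200\n"
)


def read_with_schema(raw_text, schema):
    """Bind column names and types to raw delimited text at read time."""
    for row in csv.reader(io.StringIO(raw_text)):
        yield {name: cast(value) for (name, cast), value in zip(schema, row)}


# Two consumers bind different schemas to the same raw data.
TRAFFIC_SCHEMA = [("day", str), ("user", str), ("path", str), ("status", int)]
ERROR_SCHEMA = [
    ("day", str),
    ("user", str),
    ("path", str),
    ("is_error", lambda s: int(s) >= 500),  # same column, interpreted differently
]

traffic = list(read_with_schema(RAW_CLICKS, TRAFFIC_SCHEMA))
errors = [r for r in read_with_schema(RAW_CLICKS, ERROR_SCHEMA) if r["is_error"]]

if __name__ == "__main__":
    print(len(traffic), "rows read for traffic analysis")
    print("error rows:", errors)
```

The point of the sketch is that adding a new interpretation of the data means writing a new schema, not reloading or re-transforming what was already stored, which is roughly the flexibility that a traditional transform-before-load pipeline gives up.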