Learn the Basic Concepts First
Everyone is in a hurry to learn a new technology, so they try to pick up all the tricks and fancy stuff right away instead of building a solid foundation first.
Big data books. Hadoop is all about the data. Learn big data concepts before looking at Hadoop in any depth. These books will build core data concepts around Hadoop.
- Disruptive Possibilities: How Big Data Changes Everything - This is a must read for anyone getting started in Big Data.
- Big Data Now, 2012 Edition: Easy read and good insights on Big Data. Some of this content on companies is out of date, but there is a lot of valuable information here so this is still a good read.
- Big Data, by Nathan Marz - The book does a great job of teaching core concepts and fundamentals, and it provides a great perspective on the data architecture of Hadoop. This book will build a solid foundation and help you understand the Lambda architecture. You may need to get this book from MEAP if it has not been released yet. (http://www.manning.com/marz/)
- Hadoop for Dummies, by Tim Jones - Easy introduction to learn basic concepts and terms.
- Big Data for Dummies - This is a very gentle introduction to Big Data and the concepts and technologies surrounding it.
Three Defining Whitepapers to Read
These papers are excellent for building fundamental knowledge around Hadoop and Hive. Even though they are a few years old, the concepts and perspectives they discuss still provide foundational insights into Hadoop.
Professional Training
Professional training is the quickest and easiest way to learn core concepts and fundamentals and to get some hands-on experience. I do work at Hortonworks, but there are specific reasons I recommend Hortonworks University. Hortonworks is all open source, so you are not learning someone's proprietary or open-proprietary distribution. Learning from a 100% open source base means what you learn is applicable to any distribution. Also, Hadoop 2 has YARN, which is a key foundational component, and Hortonworks is driving the innovation and roadmap around YARN.
Additional Resources
Once you get the fundamental concepts down, you will want to learn in more detail. The books below are good for taking that next step. I recommend reading them in parallel and bouncing back and forth, because I believe each does a better job in certain areas. Using them together was very helpful for me.
- Apache Hadoop YARN (not released yet), by Arun Murthy, Jeffrey Markham, Vinod Vavilapalli, Doug Eadline
- Hadoop The Definitive Guide (3rd Edition), by Tom White
- Hadoop Operations, by Eric Sammer
Getting Hands on Experience and Learning Hadoop in Detail
A great way to start getting hands-on experience and to learn Hadoop through tutorials, videos and demonstrations is the Hortonworks Sandbox. The Hortonworks Sandbox is designed for beginners, so it is an excellent platform for learning and skill development. The tutorials, videos and demonstrations will be updated on a regular basis. The sandbox is available as a VirtualBox or VMware virtual machine. An additional 4GB of RAM and 2GB of storage are recommended for either virtual machine. If your laptop does not have a lot of memory, you can go to the VM settings and cut the VM's RAM down to about 1.5 - 2GB. This is likely to impact the VM's performance, but it will at least help it run on a minimally configured laptop.
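If you prefer the command line to the VM settings dialog, the memory change can be sketched roughly as below. This is a minimal sketch, not an official Hortonworks procedure: it only builds a VirtualBox `VBoxManage modifyvm ... --memory` command and prints it, it assumes the VM is powered off when you actually run the command, and the VM name "Hortonworks Sandbox" is a placeholder that you would replace with whatever name your imported VM has.

```python
import shlex
from typing import List


def build_modifyvm_command(vm_name: str, memory_gb: float) -> List[str]:
    """Build a VBoxManage command that sets a VM's RAM.

    VBoxManage expects the memory size in megabytes, so the value
    given in gigabytes is converted first.
    """
    memory_mb = int(memory_gb * 1024)
    return ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb)]


# "Hortonworks Sandbox" is a hypothetical VM name used for illustration.
command = build_modifyvm_command("Hortonworks Sandbox", 2.0)

# Print the command so it can be reviewed before running it in a terminal.
print(shlex.join(command))
```

Printing the command rather than executing it lets you check the VM name and memory size before changing anything. VMware users would make the same change through the VMware settings instead.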
Other books to consider:
- Programming Hive, by Edward Capriolo, Dean Wampler, ...
- Programming Pig, by Alan Gates
- http://engineering.linkedin.com/hadoop
- http://engineering.twitter.com
What is Hadoop?
- Mark Madsen - http://www.insideanalysis.com/2012/12/what-hadoop-is-what-is-isnt/
- Jim Walker - http://www.youtube.com/watch?v=j6toE6Ke7k4
- Expert panel - http://www.infoq.com/articles/HadoopVirtualPanel
Have fun and I look forward to any additional recommendations.
Hi,
Can you pls tell me how essential Java knowledge is in learning BigData & Hadoop.
Can you pls let me know if Hadoop admin also need Java expertise.
Thanks!
Srikanth