Saturday, April 5, 2014

Open Source Driving Innovation of Enterprise Hadoop


In the last seven months we have seen a tremendous level of innovation and maturity in the enterprise Hadoop platform. Hortonworks' HDP 2.0 and HDP 2.1 releases show how much innovation open source is driving today. This innovation is significantly improving the enterprise capabilities of Hadoop and changing the Hadoop landscape. It is difficult for proprietary releases of Hadoop to compete with the hundreds of thousands of lines of code being written by the Hadoop open source community, to which organizations ranging from Microsoft to Yahoo are contributing their expertise and knowledge. Proprietary and mixed open source/proprietary Hadoop solutions are under tremendous pressure from this pace of open source innovation, and we are seeing Hadoop distributions that are not 100% open source begin to disappear.

HDP 2.0 and 2.1 add a number of game-changing capabilities to Hadoop. These releases bring comprehensive capabilities in areas such as scalability, multi-tenancy, performance, security, data lifecycle management, data governance, encryption, interactive query, high availability and fault tolerance. Key additions include:
HDP 2.0:
  • YARN, a distributed data operating system that supports applications with different runtime characteristics. YARN also adds scalability and improved fault tolerance to Hadoop.
  • NameNode High Availability.
  • Hadoop scalability to 10,000+ nodes.
  • New releases of Hadoop frameworks in key areas such as Hive and HBase. 
HDP 2.1:
  • Interactive query capability in Hadoop. The Stinger project has increased interactive query performance 100-fold through Hive optimizations, container optimizations, Tez integration and in-memory caching.
  • Hive has improved SQL compliance. 
  • Perimeter security added to Hadoop with Knox.  Enterprise Hadoop offers authorization, authentication and encryption. 
  • Data Lifecycle Management and data governance with Falcon.
  • Enhanced HDFS security and multi-tenancy capabilities.
  • ResourceManager High Availability.
  • NameNode Federation, improving scalability and multi-tenancy, with stronger support for different runtime characteristics.
  • Linux and Windows releases synchronized.
  • HDP Search with Apache Solr extends the search capabilities of Hadoop.
  • Storm brings scalable stream processing to Hadoop.
  • Spark is available as a Tech Preview to provide real-time, in-memory processing.
Ambari:
  • The Ambari management interface is now split from the HDP distribution, so the management tool and the Hadoop software distribution can be revved separately.
  • Support for the Storm, Tez and Falcon stacks.
  • Maintenance mode silences alerts for services, hosts and components for administration work.
  • Rolling restarts.
  • Service and component restarts.
  • Support for ZooKeeper configurations.
  • Support for decommissioning NodeManagers and RegionServers.
  • Ability to refresh client-only configurations.
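To make the maintenance-mode feature more concrete, here is a minimal Python sketch that builds (but does not send) the kind of REST request Ambari accepts for switching a service into maintenance mode. It assumes the Ambari v1 REST endpoint `/api/v1/clusters/<cluster>/services/<service>` on the default port 8080, the `maintenance_state` field, and the `X-Requested-By` header that recent Ambari versions require. The host name, cluster name and credentials are hypothetical placeholders. Treat this as an illustration and check the Ambari API documentation for your version before relying on the exact payload.

```python
import base64
import json
import urllib.request


def build_maintenance_request(host, cluster, service, user, password, on=True):
    """Build a PUT request that toggles maintenance mode for one service.

    Assumption: Ambari's v1 REST API on port 8080 with HTTP basic auth.
    The request is only constructed here; nothing is sent.
    """
    state = "ON" if on else "OFF"
    body = {
        # Free-text label that Ambari shows for the operation
        "RequestInfo": {"context": f"Turn {state} Maintenance Mode for {service}"},
        # Silences alerts for the service while administration work is done
        "Body": {"ServiceInfo": {"maintenance_state": state}},
    }
    url = f"http://{host}:8080/api/v1/clusters/{cluster}/services/{service}"
    creds = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={
            "X-Requested-By": "ambari",  # CSRF header expected by Ambari
            "Authorization": f"Basic {creds}",
        },
    )


if __name__ == "__main__":
    # Hypothetical server, cluster and credentials for illustration only
    req = build_maintenance_request("ambari.example.com", "c1", "HDFS", "admin", "admin")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Sending the request would be a single `urllib.request.urlopen(req)` against a real Ambari server. Building it as a separate step makes the payload easy to inspect before touching a live cluster. Calling the function with `on=False` produces the matching request to leave maintenance mode.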
