Innovation Around Big Data is Creating Choice
The Modern Enterprise Big Data Platform goes by many names: Modern Data Lake, Enterprise Data Hub, Marshal Data Yard and Virtual Data Lake, to name a few. Each name is associated with a defining characteristic, philosophy or goal. Big data platforms are evolving rapidly, due in large part to the interest in big data and to open source innovation. This innovation is creating a lot of choice as well as a lot of confusion. Decisions about distributions, frameworks, reference architectures, NoSQL databases, real-time access, data governance and more are not easy.
A Blended Solution around Data?
Hadoop and NoSQL are adding functionality that currently exists in RDBMS and EDW platforms. RDBMS and EDW platforms are adding features and functionality that exist in Hadoop and NoSQL, as well as connectors that support data integration with big data platforms. MapReduce applications or R scripts can run in some relational databases. It’s now possible to execute a join where some of the data resides in an RDBMS/EDW and other data resides in Hadoop or NoSQL. Where should the data reside? Who should own the SQL statement? The Modern Enterprise Data Platform is not a static platform. It is instead a platform that is taking on new forms and functionality. Organizations need to look at how to design a flexible enterprise environment that can leverage the features and functionality of all data platforms and meet the current and future needs of the organization.
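To make the cross-platform join concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: an in-memory SQLite table stands in for the RDBMS/EDW side, a few JSON lines stand in for records stored in Hadoop or NoSQL, and the names customers, raw_events, clicks_by_customer and federated_join are hypothetical. Real products do this through their own connectors and query engines, not through code like this.

```python
import json
import sqlite3

# Stand-in for the RDBMS/EDW side: an in-memory SQLite table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)

# Stand-in for the Hadoop/NoSQL side: schemaless JSON event records (assumption).
raw_events = """
{"customer_id": 1, "clicks": 12}
{"customer_id": 2, "clicks": 7}
{"customer_id": 1, "clicks": 3}
"""


def clicks_by_customer(lines):
    """Aggregate click counts per customer from JSON-lines text."""
    totals = {}
    for line in lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        event = json.loads(line)
        key = event["customer_id"]
        totals[key] = totals.get(key, 0) + event["clicks"]
    return totals


def federated_join(connection, lines):
    """Join relational customer rows with aggregated event data by customer id."""
    totals = clicks_by_customer(lines)
    rows = connection.execute(
        "SELECT id, name FROM customers ORDER BY id"
    ).fetchall()
    return [(name, totals.get(cid, 0)) for cid, name in rows]


print(federated_join(conn, raw_events))
```

In this sketch the application code owns the join. In practice the join could just as well be pushed into the database or into the big data platform, which is exactly the ownership question raised above.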
Data Needs to be Consumable and Actionable
The problems to be solved are not just about Hadoop, NoSQL, NewSQL, RDBMS, EDWs or even about the data. The goal is to improve decision making and business insight. Organizations need to make business decisions faster while improving their accuracy and reducing their risk. They also need to handle the volume, velocity and variety of data cost-effectively and efficiently. The management and governance of data need to take into account the evolution of these data platforms and ensure the data is consumable and actionable.
Increasing business insight by improving analytics is one of the goals of big data. One step toward this goal is reducing the number of data silos. It’s also important to make sure we do not rebuild those silos inside big data platforms. Be aware that the core designs of RDBMS, EDW, Hadoop and NoSQL platforms were created for different reasons. A Hadoop cluster is not ACID compliant, a NoSQL database is not relational and an RDBMS cannot scale as cost-effectively as a Hadoop cluster can. One needs to look at the key business goals and use cases, and leverage the features of all the data platforms to achieve the strategic goals around data.