Data Lakes for Business: Big Data 2018 End-User Research Results
Abstract: Big data is a way to look at new sources of data (structured, multi-structured, and unstructured) and at how organizations place these sources of information under "new management." The EMA Big Data surveys use deliberately broad definitions of big data and data lakes to encourage end users to think beyond limiting definitions. For big data, users should look beyond just unstructured data, and for data lakes, they should look beyond "just" Hadoop. This breadth allows organizations to face challenges that their legacy data management practices and platforms could not handle. They can evaluate how new opportunities have opened companies to a larger world of IoT sensor and mobile app data, augmented with customer and product information from enterprise applications. Additionally, companies can examine how traditional analytical architectures, such as the data warehouse, work with modern data science implementations, and how logical architectures that include relational databases and NoSQL platforms, such as Hadoop, MongoDB, and Cassandra, can be interwoven. As established in four previous studies since 2012, the broad concept of the data lake offers a range of possibilities and use cases. Beyond the initial stage of a simple exploration repository, combined data lakes (such as the EMA Hybrid Data Ecosystem) have grown to support and include traditional operational and analytical environments. This edition of the EMA Big Data survey takes a deep dive into the data lake architecture. As a leading data management architecture associated with Hadoop environments, and potential coopetition for traditional enterprise data warehouses, the data lake presents both great opportunity and potential risk for organizations implementing this modern data architecture.