If you’re interested in First San Francisco Partners’ take on the “best” data storage solution for your organization, we hope you attended our February 2 Data Insights & Analytics (DIA) webinar. Our co-hosts, Kelle O’Neal and John Ladley, covered the Data Lake vs. Data Warehouse topic from all angles.
The agenda for the one-hour webinar was packed:
- Defining the Data Lake and Data Warehouse
- Key differences between the Data Lake and Data Warehouse
- How to optimize the Data Lake
- How to optimize the Data Warehouse
- Sample Data Lake and Data Warehouse architectures and use cases
- How a Data Lake can solve the problems of a Data Warehouse
- Key findings and takeaways
Data Lakes and Data Warehouses are not new concepts, but explaining them to someone unfamiliar with the terms benefits from a straightforward approach, like the Gartner definitions Kelle and John shared:
A Data Warehouse is a storage architecture designed to hold data extracted from transaction systems, operational data stores and external sources. The warehouse then combines that data in an aggregate, summary form suitable for enterprise-wide data analysis and reporting for predefined business needs.*
A Data Lake is a collection of storage instances of various data assets additional to the originating data sources. These assets are stored in a near-exact, or even exact, copy of the source format. The purpose of a Data Lake is to present an unrefined view of data to only the most highly skilled analysts, to help them explore their data refinement and analysis techniques independent of any of the system-of-record compromises that may exist in a traditional analytic data store.*
We also appreciate a great analogy, like this one from Pentaho CTO James Dixon, who coined the term Data Lake more than five years ago (you can think of a Data Mart as a subset of a Data Warehouse):
Think of a Data Mart as a store of bottled water — it’s cleansed, packaged and structured for easy consumption. The Data Lake, meanwhile, is a large body of water in a more natural state. The contents of the Data Lake stream in from a source to fill the lake, and various users of the lake can come to examine, dive in or take samples.
Kelle and John highlighted several use cases that emphasized the importance of aligning business strategy with whatever solution you choose: a Data Lake, a Data Warehouse or a “best fit,” such as a blended model.
You can find the February 2 webinar replay and presentation material on demand at DATAVERSITY. And … it’s never too early to reserve your (virtual) seat for our next call on March 2. Our topic, Descriptive, Prescriptive and Predictive Analytics, promises to be a great one for those of us who care about integrated and effective business intelligence and analytics.