Data Lake and Data Warehouse

DIA Webinar Recap: The Missed Promise of Hadoop and New and Emerging Technologies

By Melanie Deardorff

Our August webinar, part of our monthly series in partnership with DATAVERSITY, covered Hadoop, which entered the scene with great fanfare but now seems to be on its way out. Or is it? Are the pundits just pontificating, or is there something to their end-of-life proclamations?

In our webinar, The Missed Promise of Hadoop and New and Emerging Technologies, Kelle O’Neal and John Ladley dug into the current state of Hadoop to address its relevance, the case for and against it, and the alternative technologies on the upswing.

High-level topics in this webinar included:

  • Hadoop’s evolution
  • Current state of Hadoop: pros and cons for big data and analytics
  • Role of Hadoop in enterprise architecture
  • Successful use cases and lessons learned
  • Best practices and key takeaways

Here’s a brief recap of the August webinar. The full recording, including our presentation material, can be found in DATAVERSITY’s webinars archive.

Analyst Viewpoints on Hadoop

  • It’s well-entrenched.
  • Spark, while a replacement, is still entering the “trough of disillusionment.”
  • Other large volume options (e.g., in-memory database management systems) are mature.
  • Hadoop is part of the landscape, and other tools compensate for its limitations.
  • Artificial intelligence may eventually overtake Hadoop.

Hadoop Use Cases (where it’s a main player and focal point)

Use case #1 – large organization

  • Very large organization with lots of data and skilled technologists
  • Owned data centers and offered as a service
  • Located where there is a subpar internet backbone (e.g., Hawaii)

Use case #2 – direct control

  • Where economics are less important than direct control
  • The safest connection is no connection (e.g., research facilities, atomic/nuclear, military)

Alternatives to Hadoop (not an exhaustive list)

  • New Big Data File Handling
    • Apache Spark
    • Apache Storm
    • Google BigQuery
    • DataTorrent RTS
    • Hydra
    • Amazon S3
  • Huge Queries
    • Snowflake
    • Amazon Redshift
  • Address Hadoop Limitations
    • Podium
    • Hortonworks
    • Cloudera
  • Other NoSQL
    • Cassandra
    • MongoDB
    • Apache HBase

Best Practices and Key Takeaways

  • Stick with Hadoop and a product that addresses shortcomings for data at rest.
  • Design a data architecture, whether or not you are using a cloud.
  • Consider if your organization has unique characteristics that may place Hadoop center stage.
  • If you are streaming data, you need to get serious about other types of technology, or just place it in the cloud.
  • There is no avoiding data governance and data management, regardless of your approach.
  • Hadoop will remain a part of data supply chains and analytics ecosystems for data at rest and lakes.
  • Hadoop is technology, which always commoditizes or disappears (it is a question not of if, but of how).

UP NEXT: Advanced Databases and Knowledge Management Webinar

Our Data Insights & Analytics webinar series continues next month with Advanced Databases and Knowledge Management on Thursday, September 6. Kelle and John will explore leading database management system technologies, including unique applications for Graph databases; other promising, new database technologies; and examples of big data, analytics and Graphs at work.

If you’d like to keep in touch with FSFP and future webinars and industry events where we will be presenting, please share your contact information below.
