Machine Learning

Machine Learning Is the New Hadoop

Although I took statistics classes in college, my first real introduction to practical advanced analytics was in 2008 when Netezza purchased NuTech Solutions to extend the functionality of its data warehousing platform.   

I understood basic statistics, but had no significant experience with practical business applications.

Fast-forward 9 years.  Advanced analytics, machine learning, and artificial intelligence are inescapable.

  • The Alexa device sitting on our kitchen countertop knows what we buy, and how often we buy it
  • The search engine I used this morning to find more detail about an article I read is powered by machine learning
  • Driving around the Bay Area last week, I passed a Google self-driving vehicle (they’re slow…)

Machine learning is so prevalent that we run into (or pass) it every day, and the most common question I hear from the CDOs I talk to is how to implement machine learning and get real benefit from it.

This is a great question.  Machine learning is here to stay.  

But be careful—we’ve been here before.

The 1990s—Java

There was a lot of excitement around the first release of Java in 1995.

I was a software engineer at AOL in the late 90s, and everyone was talking about the potential of Java.  The power of a cross-platform programming language and the possibility of “write once, run anywhere” was a very powerful draw.

Many people saw Java as the future of computer programming, and thought it would quickly become the most popular language for most applications.  Games, applications, utilities—they’ll all be written in Java, and run everywhere!

It didn’t quite happen like that.

Where it is now

Java is still popular for many applications, but the limitations of the platform are much more evident.  

Applications with extreme graphics or performance requirements, or anything that needs to take advantage of platform-specific functionality, usually aren’t written in Java.

Unfortunately, the path to that understanding is littered with dead visionary companies whose vision wasn’t so great.

Every new technology goes through a period of finding its best fit—do you want to be the one that learns the lessons others benefit from?

The 2000s—Hadoop

Based on Google’s 2003 File System paper, the Apache Hadoop project was created in early 2006 amid a lot of initial excitement.

I was at Disney at the time, and like a lot of companies we were excited by the possibilities of this new approach and interested in finding the best fit.

I spent much of the rest of the decade at Netezza, talking to many companies about how they managed data and their plans for the future.

Hadoop was everywhere.

But in some ways, it was nowhere.  I had a lot of Hadoop conversations, but few people had realized real business value from it despite trying it in applications as diverse as data warehousing, sentiment analysis, and segmentation calculation.  

It was in labs, but not in production.

Hadoop had value, but that value wasn’t well-defined and people were afraid of being left behind.  Did it replace a data warehouse?  An ODS?  Was it a calculation platform or a storage solution?  Nobody really knew where it fit.

Where it is now

Today, companies everywhere are building data lakes and finding value in keeping historical data available for unforeseen future uses.  

We’ve found the right fit for the technology: as a high-latency distributed processing engine, and as a solution for longer-term online storage and analysis—a data lake.

Unlike Java, Hadoop is still riding down the back side of the hype curve.  We’ll find new uses in the future, and some current uses will be discarded.  But there’s a well-defined sweet spot, and if Hadoop isn’t on your roadmap you should start thinking about adding it.

The 2010s—Machine Learning

Which brings us to machine learning.  

There’s a lot of excitement, everyone is watching it, many want to try it out, but most aren’t sure how to take advantage of it.

Sound familiar?

Sure, there are many practical applications.  ThoughtSpot uses machine learning to provide self-service relational search, but machine learning really took off in the consumer space.

Google, Facebook, LinkedIn, Yelp, Amazon, and many others.  Their uses of machine learning are certainly mainstream.  But they’re maybe not a good blueprint for how you can get value.

And that’s part of the problem.  It’s still early, and we’re still defining use cases.  

It’s a good time to jump in the water and take advantage of some of those defined use cases, but we have to be careful to avoid drowning.  There’s a lot of experimentation going on, and most of us would rather implement lessons others have struggled to learn.

Stages of technology adoption

There’s a hype curve for every technology, whether it’s Java in the 90s, Hadoop in the 00s, or Machine Learning in the 10s.  Each goes through four distinct stages.  It’s our job to figure out which stage the technology is in, and where we’re comfortable playing.

Stage 1 - Introduction

This first stage is defined by questions.  What is it?  Who’s using it?  Should I pay attention?  

There’s often a lot of buzz about technology at this stage, but few referenceable use cases.  

Unless you’re a startup, this is a good time for education and patience.  The technology should be on your radar, but it’s generally too early to invest.

Stage 2 - Definition

The definition stage is defined both by how a technology can be used successfully and by how it can’t.  

Some interesting use cases with ROI start appearing, but they’re few and far between.  

This stage is often characterized by hype and talk about “replacement.”  It’s human nature to define new function in terms of existing function, and the most obvious way to do that is to identify what technology it will replace.  

An agile organization should try to start seriously implementing technologies in the latter half of this stage.  By the time a technology enters large-scale adoption, many of your competitors are already getting benefit from it.

Although Machine Learning is a broad topic, as a general technology it’s at the late definition stage.  We’re still finding new uses and there are limitations we haven’t hit yet, but there’s also clearly-defined value for the right use cases.

Stage 3 - Adoption

Once we start to see repeatable use cases and understand the limitations of a technology, organizations will begin adopting it at scale.

There are two simple indicators that a technology is in the adoption stage:
  • The limitations are well understood—we know what it’s not good at
  • There are repeatable use cases with well-defined ROI

Hadoop is at this stage.  New use cases will be identified—especially if you include technologies in the larger Hadoop ecosystem—but it’s well understood how to get value and where it falls down.

Stage 4 - Maturity

A technology enters the last stage, maturity, when it starts to become commoditized and what once set it apart is seen more as a basic characteristic than a strong differentiator.  

Java is a mature technology.  Although it’s still evolving, it’s well understood how it can best be used, and what were once seen as its core advantages now read as general benefits rather than earth-shattering differentiators.

So what does this mean?

Before you invest in a technology, understand where it is on the technology adoption curve as well as your organization’s appetite for experimentation and risk.

If you’re a small startup that highly values differentiation and can move quickly, testing technologies at the introduction stage can be a good idea.

If you’re a Fortune 100, on the other hand, the late definition stage might be the right place for you.  

The trick is not to wait too long.  Given the rapidly increasing pace of today’s economy, the risk of adopting too early is nowhere near as high as the risk of being left behind.

What’s your stage?

At what stage do you start looking at new technologies?  Let us know in the comments.
