Why “Data is the New Oil” is a Flawed Strategy
Fountainhead News: May 16, 2017
Last week I had lunch with an Innovation chap now working for a major global company that’s been around for a long, long time. One of the things he asked me was about the “data is the new oil” line that many in tech are talking about in reference to artificial intelligence.
He was sick of it, and something about it didn't quite sit right with him. He asked me whether there's a better way to say it, or what exactly is flawed about it.
In short, if you’re saving all of your company’s data in hopes that one day you’ll be able to run it through artificial intelligence software and extract insights from it, then you’re doing it wrong.
This insight came about while we were building our self-learning AI product.
You may shake your head and ask how this could be. Or you might offer a counterexample: surely medical records should be saved? You think that way because you're coming at the problem with legacy technology in mind. Let me explain.
Every day your business generates and collects massive amounts of data. Whether about your employees, internal operations, suppliers, customers or whatever. You take all of that data and store it somewhere, like a database. And that database, today, is likely hosted on something like AWS. Which means you're paying for a data lake or data warehouse at roughly $0.01 to $0.02 per GB per month, depending on how easily accessible the data needs to be.
So, the first problem is you’re literally paying money to store data that you’re not doing anything with.
This is where you say, “But we will do something with it some day Sean and everyone says data is valuable. Data is the new oil, Sean!”.
Here comes the next gotcha. Every day, more data is generated than the day before. There's some stat always thrown around about how people have created more digital data in the last 3 years than in the previous 2,000, or something along those lines. Whatever the exact figures, the crux of the stat is true. Data is increasing. And truth be told, it's increasing exponentially: day over day, week over week, month over month, and year over year.
Which means the money you're paying to store this data is, you guessed it, also increasing exponentially, ad infinitum.
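To make that drain concrete, here's a back-of-the-envelope sketch. The starting volume, growth rate, and price are illustrative assumptions of mine, not anyone's real figures:

```python
# Back-of-the-envelope: the monthly storage bill for data that only grows.
# Assumptions (illustrative): you start with 10 TB, the stored volume
# grows 5% month over month, and storage costs $0.02 per GB per month.
PRICE_PER_GB_MONTH = 0.02
START_TB = 10
MONTHLY_GROWTH = 0.05

stored_gb = START_TB * 1024
total_paid = 0.0
for month in range(1, 37):  # three years
    bill = stored_gb * PRICE_PER_GB_MONTH
    total_paid += bill
    if month % 12 == 0:
        print(f"Year {month // 12}: storing {stored_gb / 1024:.1f} TB, "
              f"this month's bill ${bill:,.0f}, paid so far ${total_paid:,.0f}")
    stored_gb *= 1 + MONTHLY_GROWTH
```

Under these assumptions the bill compounds right along with the data: by year three you've paid close to $20,000 to hold data you may never have touched.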
Again, you’re going to tell me that this data, at the end of the day, is your only competitive advantage. That may be true, but if you’re not using it and its costing you money, then it’s more like a drain on resources, no? Sort of like employing a bunch of people who don’t come to work and don’t do a darn thing every day. But you still pay them because one day these people are going to do something for us!
It sounds silly when you think of data as employees who don’t do anything but still get paid.
How do we get around these problems? How do we still capture value from this ever-increasing amount of data while not incurring the massive storage bill? The answer is more elegant than you might imagine.
You internalize each new bit of data in real-time and then throw it away after it passes through your system.
But how? Current technology doesn't allow that type of real-time self-learning in a cost-effective manner, right? What if I told you that you could get a self-learning AI package in 1,000 lines of code that you feed all data into as it is collected, thereby bypassing the costly data storage systems entirely?
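The 1,000-line package itself is the author's product, but the underlying idea, learning from each record as it arrives and then discarding it, is classic online learning. A minimal sketch, where the model, the stream, and the update rule are my own illustrative choices rather than anything from the actual product:

```python
import random

# Minimal online-learning sketch: a linear model updated one record at a
# time with stochastic gradient descent. Each record is used once and
# then discarded -- nothing is ever stored.
random.seed(0)
weights = [0.0, 0.0]
bias = 0.0
LR = 0.05  # learning rate

def learn_one(x, y):
    """Update the model from a single record, then let the record go."""
    global bias
    pred = sum(w * xi for w, xi in zip(weights, x)) + bias
    err = pred - y
    for i, xi in enumerate(x):
        weights[i] -= LR * err * xi
    bias -= LR * err

# Simulate a stream whose true relationship is y = 2*x0 - 1*x1 + 0.5.
for _ in range(5000):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    y = 2 * x[0] - 1 * x[1] + 0.5
    learn_one(x, y)  # the record is garbage-collected right after this

print(weights, bias)  # converges toward [2, -1] and 0.5
```

The point of the sketch is the shape of the loop: the knowledge lives in the model's parameters, so the raw records never need a warehouse.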
Let's use an example.
For the last decade I've subscribed to about 100 blogs and parsed through about 1,000 posts per day to stay up to date on technology trends. It is the single most important aspect of my continuing education. But remember Instapaper and read-it-later services?
The idea was that when you came across a new article or blog post, instead of reading it right then, you'd save it for later. What you end up finding, once you get into that habit, is that you never go back to the old articles because more are being published every single day. Even if you save only the most important articles, you never go back to them, because, again, more are published every single day.
These articles are exactly like your company’s data. You will never go back to it because ever more data is coming in faster, larger, and heavier every day. In short, you can’t keep up with the sheer volume. So storing data is a fool’s errand. If you don’t “read” it and understand it now, as it’s happening, then you’re never going to go back to it.
At this point you're going to say to us, "But I still need to train an AI model on this data, so I'll still need to keep it." And we respond with, "That's because you're using obsolete, legacy technology."
Self-learning AI operates in real-time. Obsolete AI takes days, weeks, or months after the fact to collect, clean, organize, and annotate data, and then to build models on it that require constant tweaking and re-training. In essence, the obsolescence comes down to efficiency: doing something in the moment versus after the fact. Legacy systems do it after the fact, not as it happens.
So the next time you hear someone say, “Data is the new oil” it’s not that they’re wrong. It’s just that they have this oil still sitting in the ground, instead of extracting it and selling it for a profit.
If your data is still collecting dust underground in some data center somewhere, then it’s not oil any longer. It’s just dirt. And dirt isn’t valuable.
Transform it back into oil. It’s what the dinosaurs would have wanted.
from Stories by Sean Everett on Medium http://ift.tt/2pRi3dv