Overcoming data paralysis when preparing for AI
February 4, 2019
AI projects need data. It doesn’t have to be perfect, but it needs enough quality and consistency for useful patterns to emerge. However, many companies are overwhelmed by the volume, velocity and variety of their data and find themselves unable to access data’s fourth V: value. So how should you approach data preparation to avoid both data paralysis and over-ambition in your AI projects?
The better the data, the better the AI. But for many companies, there’s a problem: 85 percent of their data is either dark (meaning its value is unknown), redundant, obsolete or trivial.
It’s not always easy to determine where you will find value, but in order to even understand the landscape, the data needs to be cleaned up and integrated into your business. It’s all about making sure that the data has a structure and format that will enable you to develop it into the training data you need for your AI models.
Restructuring the data is a task that can be both mundane and enormous. Phone numbers, for instance, need to be formatted consistently, with spaces in the same places. Or consider addresses: if one person has given their city as “New York”, another “New-York”, and a third has given “NYC”, AI models will represent these as three separate entities unless trained to associate them.
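As a rough illustration of this kind of cleanup, the Python sketch below normalizes the phone and city examples above before any model sees them. It is a minimal, rule-based sketch, not a method from this article: the alias table `CITY_ALIASES` and the helpers `normalize_city` and `normalize_phone` are hypothetical, and the phone rule assumes North American 10-digit numbers. In practice, alias tables like this are often built up gradually as new variants turn up.

```python
import re

# Hypothetical alias table: maps a normalized spelling to one canonical city name.
CITY_ALIASES = {
    "new york": "New York",
    "nyc": "New York",
}


def normalize_city(raw: str) -> str:
    """Map spelling variants of a city to a single canonical value."""
    # Treat hyphens as spaces, collapse repeated whitespace, and lowercase for lookup.
    key = re.sub(r"\s+", " ", raw.replace("-", " ")).strip().lower()
    # Fall back to the trimmed original when the variant is not in the table.
    return CITY_ALIASES.get(key, raw.strip())


def normalize_phone(raw: str) -> str:
    """Format a North American number as 'NNN NNN NNNN' (assumed convention)."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    if len(digits) != 10:
        return raw  # leave unrecognized formats untouched for manual review
    return f"{digits[:3]} {digits[3:6]} {digits[6:]}"


if __name__ == "__main__":
    cities = ["New York", "New-York", "NYC"]
    print({normalize_city(c) for c in cities})  # {'New York'}
    print(normalize_phone("(212) 555-0100"))    # 212 555 0100
```

With this step in place, the three city spellings collapse into a single entity before training, rather than relying on the model to learn that they are the same place.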
Leaders know they need to use their data to stay competitive, but they also know the monumental tasks they face in cleaning up that data. It’s time-consuming and expensive, and in many cases, they don’t know the best way to go about it—which can make them feel paralyzed.
Putting data scientists on the task isn’t always the best solution. When they are tapped for this purpose, this valuable talent often gets trapped in “the data dungeon,” spending too much time on tedious data preparation and often feeling paralyzed by the volume of data to be cleaned. As a result, too little time is spent uncovering the insights that can transform the business, create new customer experiences and much more, which is ultimately their main objective.
To make the greatest impact with data and pursue valuable data-driven transformation, companies will want to avoid data paralysis and find ways to move the AI agenda forward. However, they also need to consider the risk of over-ambitious strategies, which can be just as damaging as data paralysis. The example below illustrates both:
My advice is to assess what needs to be done with your data across business functions, but then isolate a small area that is a priority for the organization. This is where you can make a focused, valuable start and gain some momentum, with a view to gradually expanding or replicating the approach over time.
A business might start by choosing, for example, 10 pain points where it needs to improve, then ranking them and finding that by focusing on just the top two, it can achieve a substantial improvement on a key metric. Find an example like that in your business and zero in on it before moving to the next pain points. In this way, you secure tangible AI successes, win the confidence of stakeholders and establish methods you can replicate and scale up.
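The ranking step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the article does not say how pain points should be scored, so the sketch uses a single hypothetical "estimated impact" number per pain point. The names `PainPoint`, `top_priorities` and `impact_share`, as well as the scores, are made up for illustration and are not real data.

```python
from dataclasses import dataclass


@dataclass
class PainPoint:
    name: str
    impact: float  # hypothetical estimated improvement on the key metric


def top_priorities(points: list[PainPoint], k: int = 2) -> list[PainPoint]:
    """Rank pain points by estimated impact (highest first) and keep the top k."""
    return sorted(points, key=lambda p: p.impact, reverse=True)[:k]


def impact_share(points: list[PainPoint], k: int = 2) -> float:
    """Fraction of the total estimated impact captured by the top k pain points."""
    total = sum(p.impact for p in points)
    return sum(p.impact for p in top_priorities(points, k)) / total


# Ten pain points with illustrative (made-up) impact scores.
scores = [5, 30, 6, 4, 25, 8, 7, 6, 5, 4]
points = [PainPoint(f"pain point {i + 1}", s) for i, s in enumerate(scores)]

if __name__ == "__main__":
    print([p.name for p in top_priorities(points)])  # ['pain point 2', 'pain point 5']
    print(f"{impact_share(points):.0%}")             # 55%
```

A single score keeps the sketch simple. A real prioritization would probably also weigh effort, data readiness and risk before settling on the first area to tackle.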