AI projects need data. It doesn’t have to be perfect, but it needs enough quality and consistency for useful patterns to emerge. However, many companies are overwhelmed by the volume, velocity and variety of their data and find themselves unable to access data’s fourth V: value. So how should you think about data preparation to avoid data paralysis or over-ambition in your AI projects?
The better the data, the better the AI. But for many companies, there’s a problem: 85 percent of their data is either dark (meaning its value is unknown), redundant, obsolete or trivial.
It’s not always easy to determine where you will find value, but in order to even understand the landscape, the data needs to be cleaned up and integrated into your business. It’s all about making sure that the data has a structure and format that will enable you to develop it into the training data you need for your AI models.
Restructuring the data is a task that can be both mundane and enormous. Phone numbers, for instance, need to be formatted consistently, with spaces in the same places. Or consider addresses: if one person has given their city as “New York”, another “New-York”, and a third has given “NYC”, AI models will represent these as three separate entities unless trained to associate them.
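To make the phone number and city examples concrete, here is a minimal Python sketch of this kind of cleanup. It is an illustration under stated assumptions, not a production approach: the alias table, the function names, and the choice of 10-digit US-style phone numbers are all hypothetical, and real data would need a much larger mapping and locale-aware phone handling.

```python
import re

# Hypothetical alias table: maps known spelling variants to one canonical name.
# Keys are the lowercased, punctuation-free form produced in normalize_city.
CITY_ALIASES = {
    "new york": "New York",
    "new york city": "New York",
    "nyc": "New York",
}


def normalize_city(raw: str) -> str:
    """Map variants such as 'New-York' or 'NYC' to a single canonical name."""
    # Treat hyphens, underscores, dots and repeated spaces as one space.
    key = re.sub(r"[\s\-_.]+", " ", raw).strip().lower()
    # Unknown cities are passed through unchanged rather than guessed at.
    return CITY_ALIASES.get(key, raw.strip())


def normalize_phone(raw: str) -> str:
    """Format a phone number as 'XXX XXX XXXX' (assumes 10-digit US-style numbers)."""
    digits = re.sub(r"\D", "", raw)
    # Drop a leading country code of 1 if present.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        # Leave unrecognized values as they are so they can be reviewed by hand.
        return raw.strip()
    return f"{digits[:3]} {digits[3:6]} {digits[6:]}"


cities = [normalize_city(c) for c in ["New York", "New-York", "NYC"]]
phones = [normalize_phone(p) for p in ["(212) 555-0100", "+1 212.555.0100"]]
```

With these rules, the three city spellings collapse to one value and both phone numbers end up with spaces in the same places. A lookup table like this only covers variants someone has already seen, which is part of why the work is both mundane and enormous.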
Leaders know they need to use their data to stay competitive, but they also know the monumental tasks they face in cleaning up that data. It’s time-consuming and expensive, and in many cases, they don’t know the best way to go about it—which can make them feel paralyzed.
Putting data scientists on the task isn’t always the best solution. When they are tapped for this purpose, this valuable talent often gets trapped in “the data dungeon”, spending too much time on tedious data preparation work and often feeling paralyzed by the volume of data to be cleaned. Too little time is then spent on their main objective: uncovering powerful insights that can transform the business, create a new customer experience, and much more.