
September 10, 2015
Data and the Digital Customer
By: Hans Li and Bryan Walker

Enterprises that can effectively combine and manage data from multiple sources have a large opportunity to understand today’s Digital Customers better. For example, today’s retailers use their loyalty data to market to their customers, and their transaction data to help drive sales and service strategies. But it’s often difficult to combine loyalty data with transaction data because the two sit in completely different systems. And when we think beyond the enterprise’s four walls, we find even more data about customers, including likes, pins, tweets and locations. Companies that start using all of these data sources to derive insights into customer behaviors and preferences would gain a significant competitive advantage.

As part of our Digital Customer Initiative research, we have investigated solutions that can help resolve data integration challenges. To this end, we set out to determine how the tried-and-true technique of Master Data Management (MDM) could apply to social media data.

Master data is defined as all the data that is critical for business operations. This data is typically stored in disparate systems across an enterprise environment. Master Data Management tools synchronize master data related to customers, products, assets and employees by removing duplicate entries, standardizing all incoming data and eliminating erroneous data.


Figure 1: Master Data Management hub

Architecturally, the primary component of a Master Data Management tool is the MDM hub, as seen in Figure 1. The MDM hub can typically be accessed through a web services interface. Each MDM tool keeps the master data in its hub synchronized and up to date through its own synchronization algorithm. There are three different approaches to implementing an MDM solution, and each has its own pros and cons: you can read about them here.

Drawbacks of applying MDM to social media data
With the implementation architectures in mind, let’s talk about the data going into the database. The data population process of the MDM hub is similar to that of the traditional data warehouse. Many MDM implementations include standard extract, transform and load (ETL) tools. A typical data load consists of extracting data from the source system; transforming the extracted data to the hub data model; checking the data for duplicates; and loading the data into the MDM hub’s database.
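
The four steps of the load described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `CustomerRecord` model, the field names and the in-memory list standing in for the hub's database are all hypothetical, and a real MDM product would use its own data model and ETL tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomerRecord:
    """Hypothetical hub data model: one normalized customer entry."""
    name: str
    email: str


def extract(source_rows):
    """Extract: pull raw rows from a source system (here, a list of dicts)."""
    return list(source_rows)


def transform(raw_rows):
    """Transform: map raw rows onto the hub data model and standardize fields."""
    return [
        CustomerRecord(
            name=" ".join(row["name"].split()).title(),
            email=row["email"].strip().lower(),
        )
        for row in raw_rows
    ]


def deduplicate(records, hub):
    """Check for duplicates: drop records already in the hub or repeated in the batch."""
    seen = set(hub)
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def load(records, hub):
    """Load: append the cleaned records to the hub's store (an in-memory list here)."""
    hub.extend(records)
    return hub


# Assumed sample data: the hub already holds one customer.
hub = [CustomerRecord(name="Ann Lee", email="ann@example.com")]
source = [
    {"name": "  ann   LEE ", "email": "Ann@Example.com "},
    {"name": "michael smith", "email": "msmith@example.com"},
    {"name": "Michael Smith", "email": "MSMITH@example.com"},
]
load(deduplicate(transform(extract(source)), hub), hub)
```

In this sketch the duplicate check is an exact comparison after standardization, so only the one new customer is added to the hub. Production hubs rely on fuzzier matching, which is where the difficulties discussed next come from.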

This MDM process works fairly well for integrating data from disparate sources. However, there are a few caveats when using an MDM solution for social media data. The first issue comes from duplicate checking: the MDM hub typically provides a duplicate-checking algorithm that lets you specify the degree of confidence required for the data set being imported.

For example, if a company specified that it wanted to import a set of customer data with 95 percent confidence, the MDM hub would run the matching process and return every entry matched with less than 95 percent confidence for manual processing. This poses a major issue: given the volume of social media data available, it is nearly impossible to process all of those entries manually.
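
To make the threshold behavior concrete, here is a rough sketch of the triage step. It is not how any particular MDM hub computes confidence (those algorithms are vendor-specific); as a stand-in it uses the string similarity ratio from Python's standard `difflib` module, and the function names and sample names are assumptions made for illustration.

```python
from difflib import SequenceMatcher

CONFIDENCE_THRESHOLD = 0.95  # the 95 percent figure from the example above


def match_confidence(a, b):
    """Similarity of two names in [0, 1], ignoring case and extra whitespace."""
    def normalize(s):
        return " ".join(s.lower().split())
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def triage(incoming, hub_names, threshold=CONFIDENCE_THRESHOLD):
    """Split incoming names into automatic matches and a manual review queue."""
    matched, manual_review = [], []
    for name in incoming:
        # Find the closest existing hub entry for this incoming name.
        best = max(hub_names, key=lambda h: match_confidence(name, h))
        score = match_confidence(name, best)
        if score >= threshold:
            matched.append((name, best))
        else:
            manual_review.append((name, best, score))
    return matched, manual_review


# Assumed sample data.
hub_names = ["Michael Smith", "Ann Lee"]
incoming = ["michael  SMITH", "Mike Smith", "Anne Lee"]
matched, manual_review = triage(incoming, hub_names)
```

With these three inputs, only the entry that differs by capitalization and spacing clears the threshold; the other two land in the manual queue. Scale that ratio up to millions of social media profiles and the review workload becomes unmanageable.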

The second issue is that the matching processes in the majority of MDM hubs only compare names or addresses, allowing for differences in wording or capitalization. This isn’t particularly helpful when social media profiles offer many other fields to match on. How do we find out whether the Michael Smith who just bought bacon is the same Michael Smith on the social media platform? What happens if the Michael Smith we found claims in his social media profile that he is a vegan? These are issues that fall outside the reach of typical MDM solutions.
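
The limitation is easy to see in a small sketch. Assume a purchase record and two social profiles that share a name; all of the field names and the simple agreement score below are hypothetical, chosen only to show that a name comparison cannot separate the two candidates while additional fields can.

```python
def name_only_match(purchase, profile):
    """Name comparison that ignores capitalization, as described above."""
    return purchase["name"].lower() == profile["name"].lower()


def shared_field_agreement(purchase, profile, fields):
    """Fraction of fields present in both records whose values agree.

    A deliberately simple, hypothetical score; it is not taken from any MDM product.
    """
    comparable = [f for f in fields if f in purchase and f in profile]
    if not comparable:
        return 0.0
    agree = sum(
        str(purchase[f]).lower() == str(profile[f]).lower() for f in comparable
    )
    return agree / len(comparable)


# Assumed sample data: one purchase, two candidate social profiles.
purchase = {"name": "Michael Smith", "city": "Chicago", "email": "msmith@example.com"}
profiles = [
    {"name": "michael smith", "city": "Chicago", "email": "msmith@example.com"},
    {"name": "Michael Smith", "city": "Austin", "email": "mike.s@example.com"},
]

name_hits = [name_only_match(purchase, p) for p in profiles]
scores = [
    shared_field_agreement(purchase, p, ["name", "city", "email"]) for p in profiles
]
```

Both profiles pass the name check, but only the first agrees on every shared field. Even this richer score says nothing about behavioral contradictions such as the vegan who buys bacon, which need reasoning beyond field-by-field comparison.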

The third issue is that MDM hubs are not optimized for real-time data streams. Some social media information is time sensitive and needs to be processed immediately for marketing campaigns.

Finally, it is somewhat difficult to swap out existing components of the MDM hub for proprietary ones. This means that choosing an MDM solution for the Customer Genome project would leave us little room for custom development or additions.

Building a better framework for data processing
Master Data Management is a smart solution for many enterprise data quality issues when it comes to established markets, such as financial services and insurance, where accurate name and address information is critical to operations. But it is not the ideal choice when it comes to social media data integration.

To overcome these limitations, we are considering other options, such as building our own framework to ingest, process and store the data with as much versatility as possible. Our framework would be able to handle the variety of internal and external data sources that a retailer commonly accesses, and would also support streaming data and real-time processing.

In our next blog post, we will talk about our progress in developing this framework. Read this paper if you wish to learn more about the Digital Customer project. If you would like to learn more about data integration or the Digital Customer, please contact Hans Li.
