In our recent white papers on Retail Analytics, we discussed how advanced analytics is being applied successfully in the retail industry. We received a lot of feedback and questions from our readers, especially around dynamic pricing. The most common question was: what are the biggest challenges we should expect during a dynamic pricing implementation?
Our experience working on retail analytics projects tells us that data preparation is the longest step in almost all projects, and also one of the biggest challenges for a pricing project. Some of you would argue that building or testing models is the hardest step. But more often than not, it is the data preparation phase that takes the longest, consumes the most resources and determines which algorithms can meaningfully be built.
For this discussion, we will focus on dynamic pricing, which is used when product prices vary frequently in response to factors such as incoming traffic, competition and other market forces. Algorithms often set the price dynamically with the objective of maximizing revenue or profit. Retailers that rely solely on judgment and manual processes will have a hard time keeping up with the pace of change in the Internet’s price-transparent markets.
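To make the idea of "setting the price to maximize revenue or profit" concrete, here is a minimal sketch. It assumes a hypothetical linear demand curve (units sold fall by a fixed amount per unit of price); the function name, parameters and numbers are illustrative assumptions, not part of any particular pricing system. A real system would learn the demand relationship from historical data.

```python
def best_price(candidate_prices, base_demand, sensitivity, unit_cost=0.0):
    """Return the candidate price with the highest expected profit.

    Assumption (illustrative only): demand is linear in price,
        expected_units = max(0, base_demand - sensitivity * price)
    With unit_cost=0.0 the objective reduces to revenue.
    """
    def expected_profit(price):
        units = max(0.0, base_demand - sensitivity * price)
        return (price - unit_cost) * units

    return max(candidate_prices, key=expected_profit)


prices = [8.0, 9.0, 10.0, 11.0, 12.0]

# Objective 1: maximize revenue (no cost term).
revenue_price = best_price(prices, base_demand=200, sensitivity=10)

# Objective 2: maximize profit, assuming each unit costs 4.0 to supply.
profit_price = best_price(prices, base_demand=200, sensitivity=10, unit_cost=4.0)

print(revenue_price, profit_price)
```

Under these assumed numbers the two objectives pick different prices, which is why the objective has to be agreed on before any modeling starts.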
Why so much emphasis on data?
Pricing algorithms, for the most part, are learning algorithms. They need to be trained and tested on historical data to be able to predict or suggest prices. As a general rule of thumb, a good pricing model is trained on good quality data. It is very hard to build a good pricing algorithm with inaccurate or missing data, although there are some smart techniques to account for missing data too!
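As a rough illustration of training and testing on historical data, the sketch below splits a transaction history chronologically, so the model is always tested on dates later than anything it was trained on. The record layout (`date`, `price`, `units`) and the 80/20 split are assumptions made for this example.

```python
from datetime import date


def chronological_split(rows, test_fraction=0.2):
    """Split rows into (train, test), keeping the most recent rows for testing."""
    ordered = sorted(rows, key=lambda r: r["date"])
    test_size = round(len(ordered) * test_fraction)
    cut = len(ordered) - test_size
    return ordered[:cut], ordered[cut:]


# Hypothetical ten days of history for a single product.
history = [
    {"date": date(2015, 1, day), "price": 10.0 + day % 3, "units": 50 - day}
    for day in range(1, 11)
]

train, test = chronological_split(history, test_fraction=0.2)
print(len(train), "training rows,", len(test), "test rows")
```

A random split would leak future information into training, which tends to make a pricing model look better in testing than it will be in production.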
What constitutes the data preparation phase?
We like to split the data preparation step into three stages: gathering, evaluation and cleansing. Data gathering is where we define and collect the data needed for evaluation and primary model building. The data requirements are usually outputs of multiple workshops we conduct with the various functions and stakeholders of the project. Once the data is gathered, it is very important to evaluate it for validity, and also to finalize what kind of processing and storage infrastructure the project will need. The last stage of data preparation is where we cleanse the data to deal with missing values and spot outliers. This stage seems simple in theory; in practice, however, it is often the longest step in the project.
What type of data is needed?
The type of data required to build a pricing model can be classified into internal and external data. In retail, internal data includes target margin, historical transactions, discounts, number of page visits, stock level, etc. External data consists of competitor prices, events and the like.
Both quality and quantity matter for internal as well as external data. It is easier to gather good-quality data in large quantities for internal data than for external data. Retailers generally do a good job of maintaining a database of historical transactions. While some companies have decades’ worth of data, for all practical purposes two to three years of data suffices. Retail is too dynamic to draw meaningful information from data that is more than five years old.
Both quality and quantity are often issues in gathering external data. External data is not easily available for most companies. If a company has not been collecting data on its competition or other market conditions, it is very difficult to obtain historical competitor prices after the fact.
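One way internal and external data come together is by attaching, to each historical transaction, the most recent competitor price known on that date. The sketch below shows this under assumed record layouts and made-up values; the gap in the first row shows what happens when competitor data was not collected early enough.

```python
from bisect import bisect_right
from datetime import date


def latest_competitor_price(observations, on_date):
    """Return the most recent observed price on or before on_date, else None.

    observations: list of (date, price) tuples sorted by date.
    """
    dates = [d for d, _ in observations]
    idx = bisect_right(dates, on_date)
    return observations[idx - 1][1] if idx else None


# External data: hypothetical competitor price observations for one product.
competitor_obs = [(date(2015, 3, 1), 19.99), (date(2015, 3, 8), 18.49)]

# Internal data: hypothetical historical transactions.
transactions = [
    {"date": date(2015, 2, 27), "sku": "A1", "price": 21.00},
    {"date": date(2015, 3, 3), "sku": "A1", "price": 20.50},
    {"date": date(2015, 3, 10), "sku": "A1", "price": 19.00},
]

enriched = [
    dict(t, competitor_price=latest_competitor_price(competitor_obs, t["date"]))
    for t in transactions
]

for row in enriched:
    print(row["date"], row["price"], row["competitor_price"])
```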
Data infrastructure also plays an important role in a pricing system implementation. There are two areas that need attention: Data Storage and Data Processing.
Data Storage: Depending on the type and size of the data, an appropriate database server should be in place. For small applications, databases such as MS Access™ or SQLite™ suffice. For slightly larger applications, MySQL® and PostgreSQL® do the job well. For very large applications, the Apache Hadoop™ platform provides a convenient way to store data on a commodity hardware cluster. On top of it, systems like Hive™ (a SQL-like data warehouse) or HBase™ (a NoSQL database) can be used to store and manage data. These systems scale to very large volumes. Facebook®, for example, stores hundreds of petabytes of data on a Hadoop cluster.
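For the small-application end of that range, the sketch below uses Python's built-in sqlite3 module to store a few transactions and summarize them per product. The table name, columns and values are assumptions for illustration, and an in-memory database is used so the example leaves no file behind.

```python
import sqlite3

# ":memory:" keeps the example self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE transactions (
        sold_on    TEXT    NOT NULL,  -- ISO date, e.g. 2015-03-01
        sku        TEXT    NOT NULL,
        unit_price REAL    NOT NULL,
        units      INTEGER NOT NULL
    )
    """
)

rows = [
    ("2015-03-01", "A1", 20.0, 3),
    ("2015-03-02", "A1", 18.0, 5),
    ("2015-03-02", "B2", 5.0, 10),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Units sold and volume-weighted average price per product.
summary = conn.execute(
    """
    SELECT sku,
           SUM(units),
           ROUND(SUM(unit_price * units) / SUM(units), 2)
    FROM transactions
    GROUP BY sku
    ORDER BY sku
    """
).fetchall()
conn.close()

print(summary)
```

Because the queries are plain SQL, the same schema and aggregation carry over with little change if the data later moves to a larger database server.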
Data Processing: Pricing systems rely on machine learning algorithms that crunch huge datasets, so CPU requirements are often high. While many modern-day computer systems are sufficient to run small static pricing models, parallel computing is needed for large datasets and for agile dynamic pricing systems. Luckily, moving to parallel computing is not as intimidating as it used to be. Apache Spark™ is a popular cluster computing framework that interfaces with distributed storage systems and provides distributed machine learning through its MLlib library. There are enterprise providers such as Cloudera® that can set up the Spark system.
Issues with data
Problems with data are inevitable in any data collection system. The two major issues are missing data and outliers. In both cases, it is best to start by identifying the causes. An issue that you spot during a manual check could be just the tip of the iceberg. Identifying the real cause of the problem gives us confidence in the data that we are using. If issues cannot be resolved at the source, pricing systems can handle missing values through imputation, and handle outliers with various smoothing techniques.
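As a fallback for issues that cannot be fixed at the source, the sketch below shows one simple form of each technique: median imputation for missing values, and clipping of outliers to a band around the median. These are only two of many possible choices, and the function names, the threshold k=3 and the sample data are assumptions for illustration.

```python
from statistics import median


def impute_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, k=3.0):
    """Clip values to median +/- k * MAD (median absolute deviation)."""
    center = median(values)
    mad = median(abs(v - center) for v in values)
    low, high = center - k * mad, center + k * mad
    return [min(max(v, low), high) for v in values]


# Hypothetical daily unit sales: two missing days and one suspicious spike.
daily_units = [12, 14, None, 13, 250, 15, None, 11]

imputed = impute_missing(daily_units)
cleaned = clip_outliers(imputed)

print(cleaned)
```

The median is used in both steps because, unlike the mean, it is barely affected by the very outliers we are trying to handle.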
Spending time and effort at the beginning of the project to identify, gather and store good data will pay off throughout the project and beyond. While every case will be different, we hope we have given you some ideas on how to approach your data preparation phase. We encourage you to read our two white papers on retail analytics and dynamic pricing, if you have not done so already. They contain detailed information on other aspects of implementing a successful dynamic pricing project.
This document makes descriptive reference to trademarks that may be owned by others. The use of such trademarks herein is not an assertion of ownership of such trademarks by Accenture and is not intended to represent or imply the existence of an association between Accenture and the lawful owners of such trademarks.
This blogpost is produced by consultants at Accenture as general guidance. It is not intended to provide specific advice on your circumstances. If you require advice or further details on any matters referred to, please contact your Accenture representative.