## BLOG

January 15, 2016
The Difference between Data Science and Statistics
By: Billy Hou

I recently ran across a few interesting posts on LinkedIn and Twitter on the topic of data science and statistics, such as the ones below. While I found these posts amusing, they reveal a general lack of understanding of the difference between data science and statistics.

The Oxford Dictionary of Statistical Terms defines statistics as “the science of the collection, analysis, interpretation, presentation, and organization of data.” Given that the words data and science appear in the definition, one might assume that data science is just a rebranding of statistics.

Now let’s look at the definition of data science provided by the Data Science Association: “Data science means the scientific study of the creation, validation and transformation of data to create meaning.”

Based on these definitions, it is not difficult to see why people have trouble telling the two apart. In my opinion, data science is an extension and continuation of statistics and several other fields. I would define data science as an interdisciplinary field that extracts actionable meaning from data using statistics, computer science, and general business skills.

So what do statisticians do? They use statistical methods to collect and analyze data and to help solve real-world problems.

What about data scientists? They are responsible for modeling complex problems, discovering insights, and identifying opportunities through the use of statistical, mathematical, computer programming, and visualization techniques.

In addition to advanced analytic skills, a data scientist is also required to integrate and prepare large, varied datasets, architect specialized database and computing environments, and communicate results.

Now you can see why I think data science is an extension and continuation of statistics. Data scientists are not only required to be proficient in statistics; they also need to master other mathematical techniques such as simulation and optimization, computer skills such as programming and data manipulation, and visualization techniques, and they need general business knowledge of the problem at hand.

In our practice, the business problems center on operations and supply chain, including network design, end-to-end inventory optimization, segmentation, supply chain risk, and price optimization. Each of these requires deep domain knowledge of its own, and that knowledge is what lets us apply our data science skills effectively.

Statistics is a part of data science, but not the whole thing. Data scientists use statistics as part of their toolkit to solve the problems they face. Data science is a new and exciting interdisciplinary field, and claiming that it is merely statistics belittles the contributions of the many other disciplines it draws on. Data science goes beyond statistics and is here to stay.

