
September 11, 2015
So You Want to be a Data Scientist?
By: Sanghamitra Deb

Believe it or not, the role of “data scientist” has been called the sexiest job of the 21st century.¹ As data in every industry grows, the need to organize, explore, analyze, predict and summarize it is insatiable. Data science is the process of extracting the most relevant information from data using scientific techniques. Companies use these insights to drive business decisions, optimize resources and improve customer engagement.

Surprisingly, the role of data scientist did not exist just a few years ago. It evolved when people with quantitative backgrounds and PhDs in the sciences started applying scientific methods to industrial data. Now several incubators offer training in data science, and many online courses offer dedicated data science tracks.

As the field develops, it’s become clear that data scientists require a range of skill sets, starting with skills for:

  • Transforming data: The journey from raw data to actionable insights has several stages. It begins with raw data, typically a sequence of numbers and characters with no inherent meaning. This could be big data from a mobile app that collects information from millions of people on their use of emoticons,² or a smaller data set, such as a health insurance company's records of patients who underwent heart surgery in a particular year.³

    The first step is to find a suitable architecture for the data. For big data there are several solutions, depending on the end goal and the data type. Some of the most sought-after data architecture skills in the industry are Hadoop, HBase, Hive, Cassandra, MongoDB and SQL. These technologies solve the problem of hosting and transforming data. Recently, automated data transformation tools, including an Intelligent Data Munging platform developed at Accenture Technology Labs, have begun to automate the data transformation process transparently, without requiring any data architecture skills.⁴

  • Gaining insights and making predictions: This stage requires data scientists to apply statistics and machine learning, which is why many positions call for a graduate degree in statistics, mathematics, computer science or a related field. Data scientists must also know programming languages such as Python, R, Julia and Scala to turn their analyses into working code. These languages are popular because they offer a variety of statistical and machine learning packages, their learning curves are relatively gentle, and code written in some of them can be used directly in production.

  • Communicating the results: The final stage is to share the insights, and the most effective way to do this is usually through visualizations. Visualization tools used in the data world include Tableau, QlikView and a range of web development technologies such as JavaScript and D3. It is important for the data scientist to consider the end user who will view the results (e.g., a business lead or a consumer) and to design the visualization accordingly.
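The three stages above can be sketched end to end in Python, one of the languages the post mentions. This is a minimal, standard-library-only illustration under stated assumptions, not a description of any production tool: the raw records, the `Record` type and the functions `transform`, `fit_line` and `report` are hypothetical, the data is made up, and a straight-line least-squares fit stands in for the much richer statistics and machine learning a real project would use.

```python
"""Toy three-stage pipeline: transform raw data, fit a predictor, report results."""
import statistics
from dataclasses import dataclass

# Hypothetical, made-up raw input: "day, messages sent, emoji used" per line,
# including malformed rows of the kind real data tends to contain.
RAW_LINES = [
    "1, 120, 30",
    "2,150,41",
    "bad line",
    "3, 180 ,44",
    "4,,50",
    "5,210,57",
]


@dataclass
class Record:
    day: int
    messages: float
    emoji: float


def transform(lines):
    """Stage 1: parse raw text into typed records, dropping malformed rows."""
    records = []
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 3 or not all(parts):
            continue  # wrong number of fields, or an empty field
        try:
            records.append(Record(int(parts[0]), float(parts[1]), float(parts[2])))
        except ValueError:
            continue  # a field that is not numeric
    return records


def fit_line(xs, ys):
    """Stage 2: ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def report(records, slope, intercept, new_messages):
    """Stage 3: a plain-text bar chart plus one forecast for a non-specialist."""
    lines = ["Emoji used per day (each # is about 5 emoji):"]
    for r in records:
        lines.append(f"  day {r.day}: {'#' * round(r.emoji / 5)} ({r.emoji:.0f})")
    forecast = slope * new_messages + intercept
    lines.append(f"Forecast: about {forecast:.0f} emoji on a day with {new_messages} messages")
    return "\n".join(lines)


if __name__ == "__main__":
    data = transform(RAW_LINES)
    slope, intercept = fit_line([r.messages for r in data], [r.emoji for r in data])
    print(report(data, slope, intercept, new_messages=240))
```

Run as a script, it keeps four of the six raw rows (the two malformed ones are dropped), prints one bar per day and forecasts about 64 emoji for a 240-message day. In practice, stage 1 would run on one of the data platforms named above and stage 2 would use a statistics or machine learning package; the standard library is used here only to keep the sketch self-contained.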

In addition to these computational skills, several soft skills are required to succeed in the data domain. A critical one is listening to ideas from people who excel in different fields. Typically, the science that comes from the data is an amalgamation of results derived from several people on a team. For this reason, another important quality of a data scientist is the ability to build a collaborative environment.

Clearly, the skill requirements for data scientists are varied. They describe an architect, a scientist, a designer, a web developer, a programmer, a manager and a public speaker, all in one. If you want to become a data scientist, you have to ask: Does your current job align with a subset of these roles? Which other skills are easiest for you to acquire in a finite time? Do you need formal training, or are you good at self-learning? Answering these questions will get you started on your journey in data science.

² “America loves the eggplant emoji.”
³ D.B. Megherbi (Dept. of Electrical & Computer Engineering, University of Massachusetts Lowell, MA, USA) and B. Soper, “Effect of feature selection on machine learning algorithms for more accurate predictor of surgical outcomes in Benign Prostatic Hyperplasia cases (BPH).”
⁴ Paul Mahler, Sanghamitra Deb and Colin Puri, “Intelligent Data Munging,” Accenture Technology Labs point of view.
