When I look back a few years, there was always a strange separation between software engineers and data guys. They lived in two different worlds. On one side, there were open-source, mobile, and front-end microservice architectures that were fancy and containerized. On the other side, we had traditional data warehouse monoliths, based on antiquated architecture concepts and unbelievably expensive packaged software platforms. Even on architecture diagrams, the analytics module was always somehow disconnected from the rest.
Source: Accenture Research
The classic data business works like recycling and uses data only as a passive source of information. There is a garbage heap of old information, 90% of which was only relevant for a matter of seconds. It still contains leftovers of valuable substances, but you need to dig through the mud to find them. This is usually hard work: you have to collect huge dumps of data from multiple systems and build nightly batch Extract, Transform, Load (ETL) pipelines on top of them in order to cleanse the data and recreate some kind of business context. Since this job is extremely expensive and time-consuming, we at Accenture had assigned troops of functional data architects to try to standardize and normalize the data model for any kind of future requirement, which made it even more expensive.
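To make the nightly batch pattern described above a bit more concrete, here is a minimal sketch in Python using only the standard library. Everything in it is an assumption made for illustration: the two CSV dumps, the column names, the `sales` table, and the function names are hypothetical and do not come from any real project. It only shows the three steps in miniature: extract raw dumps, cleanse and join them back into a business context, and load the result into a reporting table.

```python
import csv
import io
import sqlite3

# Hypothetical raw dumps from two source systems (invented for illustration).
CRM_DUMP = "customer_id,name\n1, Alice \n2,Bob\n,Orphan\n"
ORDERS_DUMP = "order_id,customer_id,amount\n10,1,19.90\n11,2,abc\n12,1,5.10\n"


def extract(dump):
    """Extract: parse a raw CSV dump into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(dump)))


def transform(customers, orders):
    """Transform: cleanse the rows and join them into a business context."""
    names = {
        row["customer_id"].strip(): row["name"].strip()
        for row in customers
        if row["customer_id"].strip()  # drop customers without a key
    }
    cleaned = []
    for row in orders:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip orders with an unparseable amount
        name = names.get(row["customer_id"].strip())
        if name is not None:  # keep only orders with a known customer
            cleaned.append((int(row["order_id"]), name, amount))
    return cleaned


def load(rows, conn):
    """Load: write the cleansed rows into a reporting table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(order_id INTEGER, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_nightly_batch(conn):
    """Run extract, transform, and load once, as a scheduler would each night."""
    rows = transform(extract(CRM_DUMP), extract(ORDERS_DUMP))
    load(rows, conn)
    return rows
```

Calling `run_nightly_batch(sqlite3.connect(":memory:"))` loads the two valid orders and silently drops the broken ones. Even in this toy version, most of the code deals with cleansing and re-joining data rather than with analysis, which is where the cost of the approach comes from.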
In the past, companies tried to industrialize data integration efforts by using graphical ETL tools. These tools did not require any kind of software engineering skills, which led to a drain of tech skills within the whole data business. Often, data projects are heavily reliant on semi-manual, Excel-driven processes. An average software engineer on a data project is often the one-eyed man who is king among the blind.
On top of our data mart or data warehouse, there are the data consumers, who are often equipped with excellent SQL and Business Intelligence (BI) know-how and are trying to find the needle in the haystack. With the rise of AI, powerful new tools for extracting additional insights emerged.
But new technologies brought new types of roles into play.
The rise of full-stack data engineers
The first change in the market happened with the rise of Hadoop-based big data technology a few years ago. The demand for processing petabytes of structured and unstructured data required new design patterns and technology concepts. Hadoop brought with it countless new open-source frameworks and non-relational datastores, which drew fresh software engineers into the data domain. In fact, you need a broad set of tech skills and coding experience to successfully run a Hadoop platform.
The second wave came with the explosion of the digital hype. Today, almost every company has started or completed an agile transformation in order to compete in the market. A modern, agile data team consists of a few full-stack engineers who should be able to handle all coding, testing, and infrastructure. New data architectures must be capable of frequent go-lives in small iterations. DevOps automation technologies are pushing into the data domain, and they require new data architectures based on a cloud-native design. The huge data companies, like Oracle, IBM, Teradata, or Informatica, are far behind this trend and are losing significant market share to smaller players and open source. In most cases, these newer frameworks and tools require deep software engineering skills.
The latest technology trend to boost the demand for full-stack data engineers is fast data. The ability to react immediately to data events can be a major differentiator in the market. Use cases like real-time analytics, push notifications, and AI-driven recommendation engines close the gap between the data world and custom engineering. Event-driven architectures require both solid traditional data skills and software engineering know-how.
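As a rough illustration of what reacting immediately to data events means in code, here is a small Python sketch. It is only a toy: the `EventBus` class is a hypothetical in-process stand-in for a real message broker, and the event name, the handlers, and the threshold of 100 are all invented for this example. The point is that each consumer handles an event the moment it is published instead of waiting for a nightly batch.

```python
from collections import defaultdict


class EventBus:
    """Tiny in-process stand-in for a message broker (hypothetical)."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_type, handler):
        """Register a handler to be called for every event of this type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type, payload):
        """Deliver the event to all subscribers as soon as it arrives."""
        for handler in self._handlers[event_type]:
            handler(payload)


class RunningRevenue:
    """Real-time analytics: keeps a running total per customer."""

    def __init__(self):
        self.totals = defaultdict(float)

    def on_order(self, order):
        self.totals[order["customer"]] += order["amount"]


# Collected messages stand in for real push notifications.
notifications = []


def notify_large_order(order):
    """Push-notification use case: react to large orders immediately."""
    if order["amount"] >= 100:  # threshold chosen arbitrarily for the sketch
        notifications.append(f"Large order from {order['customer']}")


bus = EventBus()
revenue = RunningRevenue()
bus.subscribe("order_placed", revenue.on_order)
bus.subscribe("order_placed", notify_large_order)

bus.publish("order_placed", {"customer": "alice", "amount": 20.0})
bus.publish("order_placed", {"customer": "alice", "amount": 100.0})
```

The analytics handler is classic data work (aggregating values per customer), while the bus and the subscriptions are ordinary software engineering. An event-driven system needs both in the same codebase, which is why the two skill sets meet here.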
When I talk to young graduates, they often strive to become data scientists who focus on Python-based AI and analytics. They completely overestimate the role: the bulk of the work is data integration. On the other hand, I value that universities encourage their students to learn some Python, which is one of the major drivers of the new data business.
In the past, data was dominated by functional analysts and platform engineers with a broad range of specializations (business analyst, DB admin, ETL engineer, DB designer, tester, data scientist, etc.). The new data business requires full-stack data engineers with a T-shaped profile: a broad software engineering background combined with some data-specific skills. If you are currently enrolled in university and are thinking about a career in data, try to learn the fundamentals. Some traditional object-oriented software engineering, in addition to relational DB modeling and SQL, would be a good foundation.