Cloud and big data: What you need to know in 2021 and beyond
May 04, 2021
Over Super Bowl LV weekend this year, I watched a performance that was arguably the greatest of all time (GOAT).
The Weeknd was great, and Tom Brady won another Most Valuable Player award. But I’m talking about this video of Stanford’s Professor Chris Re (a MacArthur fellow, which makes him an academic GOAT). He describes a new Software 2.0 paradigm for artificial intelligence (AI) model development.
What does this have to do with cloud computing and big data? Everything.
Professor Re talks about a future where the best insights will be achieved by excelling at data engineering and domain understanding.
How is that possible? In part because of emerging deep learning models called transformers, which can be pre-trained on a massive data set and then fine-tuned for a particular use case. So instead of creating a new model to solve a problem, we can fine-tune a pre-trained one.
While researchers and highly skilled data scientists will still create AI models, most of us will use models like open-source software. Also, more and more automation will select the model type and tune the hyperparameters.
What’s left for us to do? Most companies will need dramatically more domain experts and data engineers who understand the problem.
We’ll differentiate less on the models and more on preparing data, applying it to the models and monitoring the accuracy. To lead in AI means leading in data.
This brings me to the topic of cloud and big data. Thanks to transformers and other emerging technologies like edge computing, how we look at cloud and big data today is different than five years ago.
So what is big data? The simple answer first: Big data is a massive data set created by digital technology systems. It’s so voluminous that it can’t be processed by traditional IT (e.g., by getting a bigger and better server).
Here’s the slightly more nuanced answer: Big data is what happened when AI’s demand for data exceeded what traditional IT could supply. We always had business intelligence and analytics on data, but not enough demand to capture data in its original form. AI created a demand to mine for complex patterns, deeper insights and real-time streaming. And so we needed a new set of technologies.
Big data is a competitive advantage of cloud-first businesses like Google, Uber and Netflix. These cloud natives compete on their abilities to derive actionable insights from their big data foundations.
But your company doesn’t have to be a cloud-native to take advantage of big data. Thanks to the wide availability of cloud solutions, you can compete toe-to-toe with any digital upstart.
We can scale big data because of cloud computing—both in how it works and how anyone can access it. So I can’t really talk about big data without cloud. Here are five main ways cloud supports big data:
To put all these advantages in perspective, here’s an example from my work.
A group of midsized banks wanted to share costs and improve the detection of suspicious activity. Our solution was a collaborative anti-money laundering application. We used the cloud provider’s big data platform tools to stand up a common big data environment used by multiple banks that could scale as new banks were onboarded.
We applied a common industry data model that helped us map data quickly across banks. Model management tools allowed business users to validate pre-defined models that were improved by sharing what worked for other banks. Low code/no code allowed these users to create unique views of data and outcomes for their bank. None of this would have been possible so quickly and with such great efficiency without cloud.
In 2018, I published a data maturity model (PDF) that charts a journey to become a data-driven enterprise:
The journey has been unfolding as I charted it. In fact, cloud has sped up the journey by handling data infrastructure and platform basics.
We’re now seeing the emergence of the final Industrialized level of maturity, where data is a competitive advantage for the enterprise. The differentiator now is the quality of data in the digital ecosystem.
Let’s look at the trends in the mature/Industrialized level identified in 2018 and how they are taking shape today.
What do all these trends point to? A federated approach to using data and models from others while remaining differentiated on your own.
And that’s important because, as Professor Re predicts, most of us will be doing less model creation and more model application. More than ever, we’ll differentiate on our ability to get the best data from many sources (including our domain experts) and how it’s seamlessly captured, integrated and applied. Cloud turns big data into data products that connect as part of an even bigger data continuum.
You bet. Cloud is making data, and big data in particular, the most valuable asset your company has today. Thanks to cloud, data is now available on demand, at scale and democratized in its access, no matter where your company is located or which line of business you’re in.
I hope you now understand why I spent my Super Bowl weekend watching Professor Re’s talk. Those who are best poised to unlock value from data have access to that data and know the domain.
Who better than your own business to become the next data GOAT?