An ancient Indian parable tells the story of a group of blind men describing an elephant. One feels the legs and describes the elephant as big and stocky; another feels the tail and describes it as thin and wiry; yet another feels the trunk and describes it as wet and slithery. Each is correct within his limited sphere of experience, yet no one gets the full picture.
In my view, there is an elephant in the world of IT, one that represents big, impending changes for the future. Current buzzwords like “cloud computing,” “Big Data” and “service-oriented architecture” are simply well-intentioned blind men’s descriptions of this elephant—vague, not entirely incorrect, definitely not comprehensive, each trying to describe some aspect of a larger phenomenon that nobody yet comprehends.
While the big picture may be elusive, there are three forces or undercurrents that seem to be shaping much of IT.
1. Everything will be distributed
There is no doubt that fully centralized IT—all applications, data, servers and storage residing in a single data center—would be the ideal solution for most organizations, both architecturally and operationally. So why would anyone distribute their applications and data?
In the past, distribution of enterprise IT was a matter of accident (departments wanted their own trophy data centers, or the company grew through mergers and acquisitions), a deliberate decision to overcome network congestion and latency (keeping applications and data closer to the users) or a matter of regulatory compliance (certain data and applications had to reside within specific geographic boundaries).
Today, several factors are conspiring to make distributed IT the new normal.
The availability of hardware and software as services provides a financial incentive to distribute corporate applications and data among multiple providers. The need and desire to operate intercompany processes by integrating your systems with those of your suppliers and partners leads to data and process distribution. Integration standards enable you to source common services—say, credit verification—from a third party, making system building faster and cheaper.
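The sourcing pattern described above can be sketched in code. The sketch below is illustrative, not a real provider API: the `CreditVerifier` interface, the `ThirdPartyCreditService` class and its fixed credit limit are all hypothetical names invented for this example. The point it shows is that the order system depends only on an agreed interface, so the concrete provider behind it can be swapped without touching callers.

```python
from abc import ABC, abstractmethod

class CreditVerifier(ABC):
    """Interface the buying organization codes against; the concrete
    provider behind it can be swapped without touching callers."""
    @abstractmethod
    def verify(self, customer_id: str, amount: float) -> bool: ...

class ThirdPartyCreditService(CreditVerifier):
    """Stand-in for an external provider reached over a standard
    protocol; here it simply applies a fixed credit limit."""
    def __init__(self, credit_limit: float):
        self.credit_limit = credit_limit

    def verify(self, customer_id: str, amount: float) -> bool:
        # A real implementation would call the provider's service here.
        return amount <= self.credit_limit

def approve_order(verifier: CreditVerifier, customer_id: str, amount: float) -> str:
    # The order system sees only the interface, not the provider.
    return "approved" if verifier.verify(customer_id, amount) else "declined"

provider = ThirdPartyCreditService(credit_limit=5000.0)
print(approve_order(provider, "C-001", 1200.0))  # approved
print(approve_order(provider, "C-002", 9000.0))  # declined
```

Because the dependency points at the interface rather than the provider, replacing one credit bureau with another (or with an in-house implementation) is a configuration change, not a rewrite.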
As applications are distributed across multiple platforms and locations, corporate data will also be distributed. The need to integrate and utilize external data—data from the Web, data from emerging “data as a service” vendors—is yet another reason for dealing with distributed data. Finally, the sheer volume of data generated by sensors in certain industries (such as electric utilities) makes it infeasible to collect all the data in one place for processing.
The distribution of applications and data across many locations and providers is technically challenging; it also has many business implications.
Your corporate data is in the custody of many third parties that may themselves be sourcing some of their software and services from other third parties over which you have no control. Third-party providers, in order to enhance performance and to support backups and recovery, may maintain multiple copies of your data within their complex and proprietary architectures, making it difficult if not impossible for you to monitor, audit or delete information.
As your systems interoperate more and more with third parties, there is also the problem of trust and authentication—how would you know that the application you are exchanging data with is your trusted supplier and not an imposter?
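One common answer to the trust question is to authenticate each message with a shared secret, so only a party holding the secret can produce a valid signature. The sketch below uses Python's standard `hmac` module; the secret value and the order payload are invented for illustration, and a production system would more likely rely on certificates or a federated identity scheme.

```python
import hashlib
import hmac

SHARED_SECRET = b"per-partner secret exchanged out of band"  # illustrative only

def sign(payload: bytes, secret: bytes) -> str:
    # The sender attaches this tag so the receiver can verify the origin.
    return hmac.new(secret, payload, hashlib.sha256).hexdigest()

def is_trusted_sender(payload: bytes, tag: str, secret: bytes) -> bool:
    # Recompute the tag and compare in constant time; an imposter
    # without the secret cannot produce a matching signature.
    expected = sign(payload, secret)
    return hmac.compare_digest(expected, tag)

order = b'{"po": "12345", "qty": 10}'
tag = sign(order, SHARED_SECRET)
print(is_trusted_sender(order, tag, SHARED_SECRET))   # True
print(is_trusted_sender(order, tag, b"wrong secret"))  # False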
2. Everything will be decoupled
Distribution requires decoupling, and decoupling enables distribution. When applications and data are distributed—that is, when they reside in multiple places, in multiple platforms and are owned by multiple providers—they can no longer be monolithic.
Applications must be modular and able to interoperate with other applications. Fortunately, principles for modular design and standards for interoperability have emerged, matured and gained broad, industry-wide acceptance. Indeed, this was the intent of the last big IT hype, service-oriented architecture, which many now consider just that: hype without much substance. The truth, however, lies somewhere in between.
Much like the blind men describing the elephant, many IT experts had only a partial understanding of the challenges of decoupling. They focused on decoupling applications into interoperable and modular “services.” In this, service-oriented architecture has been largely successful. Online companies have fully embraced SOA principles; any new business application written today is likely to be service-oriented.
But SOA turned out to be only a partial solution for decoupling because its proponents focused on applications and not on data. Decoupling or partitioning data is not a trivial proposition—and this is the problem that’s at the heart of the current interest in “Big Data.”
While IT has evolved and changed dramatically in the past 25 years, the main data storage paradigm—the relational database—has remained constant. The relational database has served us well over the years, across a variety of applications; however, it is not very good at being distributed. In more technical terms, it does not have partition tolerance: it cannot keep operating correctly when its data is spread across machines that may lose contact with one another.
As a result, a number of new, non-relational data management paradigms have emerged. Collectively called NoSQL (which stands for "not only SQL," SQL being the data access language of relational database systems), these databases try to address a number of problems: managing distributed data, real-time data, multimedia data, metadata and so forth (see footnote).
As such, Big Data is less about big—which is true but incidental—and more about managing new kinds of data and dealing with new kinds of data management paradigms. Among other things, NoSQL or Big Data approaches are aimed at processing very large volumes of information—often in real time—that may or may not be structured or be in one central database.
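The partitioning that relational systems struggle with can be sketched simply: hash each record's key and assign it to one of several storage nodes, so no single machine holds all the data. The node names and the smart-meter keys below are invented for illustration; real NoSQL systems layer replication and failure handling on top of this basic idea.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical storage servers

def node_for(key: str) -> str:
    # Hash the key and map it to a node; each node holds only its
    # share of the data, so the dataset is spread across machines.
    digest = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

def put(store: dict, key: str, value: str) -> None:
    store.setdefault(node_for(key), {})[key] = value

def get(store: dict, key: str) -> str:
    # The same hash routes the read back to the node holding the key.
    return store[node_for(key)][key]

store = {}
for meter_id in ("meter-001", "meter-002", "meter-003", "meter-004"):
    put(store, meter_id, "reading")
print(get(store, "meter-003"))  # reading
```

Because any client can compute `node_for(key)` locally, reads and writes go straight to the right machine without consulting a central index—the property that makes sensor-scale data volumes tractable.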
3. Everything will be analyzed
The key word here is everything. Metadata—data about data, such as who accessed it and when and where it came from—is growing at a much faster rate (estimates range from two times to 20 times) than the underlying data.
Analysis of metadata is becoming routine in spotting security threats. Many websites analyze customer interaction data to automatically customize their site for individual users. Retailers are beginning to analyze video footage of customer traffic in stores to predict conversion rates. Analysis of social networks to gauge consumer sentiment is becoming routine in most marketing departments.
A dramatic increase in customer analysis is made possible by another factor: the emergence of Facebook as the de facto identity manager across much of the Web. More than 2 million websites now let you log in using your Facebook account, and according to Facebook itself, more than 250 million users log in to third-party websites with their Facebook accounts every month.
Why is this significant? Because when you visit a website you are no longer an anonymous user. With your Facebook account, the website now has access to a lot more information about you—your interests, your social conversations, your friends and whatever else you’ve chosen to make public.
Analysis is by no means confined to consumers and websites. In manufacturing, information from sensors is used for optimizing factory operations. Information from RFID tags helps track goods through the supply chain. In electric utilities, real-time information from smart meters is used to match consumption with generation, optimize the distribution network and turn the electric grid into a “smart grid.”
We often dismiss vagueness as fuzzy thinking. While that dismissal is usually justified, vagueness—particularly sustained vagueness that captures people's imagination—can be a harbinger of something new, something unfamiliar, beyond the scope of our current vocabulary. The next time you hear another IT term that seems to lack definition or precision, it's most likely related to one of the three forces outlined above.
Postmodern computing 2.0, anyone?
Footnote: To be sure, distribution is not the only force driving change in data management. The dramatic growth in unstructured information—now estimated at 80 percent of all information in the world—requires data management approaches other than relational databases, which were designed to support the highly structured data used in business transactions.
About the author
Kishore S. Swaminathan is Accenture's chief scientist and the global director of Accenture Technology Labs' systems integration research. He is responsible for defining the company's vision for the future of technology and setting its research and development agenda. Based in Beijing, Dr. Swaminathan has spent his Accenture career researching cutting-edge technologies. Winner of the 2000 Computerworld Smithsonian award for the best application of IT, Dr. Swaminathan has worked on more than a dozen research projects and has as many patents to his credit.