Maximize collaboration through secure data sharing
October 1, 2019
Faced with the challenge of competing in an increasingly AI-driven world, enterprises are becoming more aware of the importance of expanding their access to data through third-party partnership ecosystems to create new advantages and opportunities for growth. The increasingly distributed nature of customer data means many enterprises do not generate the necessary levels of data on their own to derive the unbiased insights required to provide new experiences, open new revenue streams and apply new business models.
This growing need to share data is reflected in a recent Accenture C-suite survey, in which 36 percent of executives indicated that the number of organizations they partner with had at least doubled in the last two years. The same survey also revealed that 71 percent of executives anticipate the volume of data exchanged with ecosystems to increase.
Similarly, according to a recent Harvard Business Review Analytics Services survey, 78 percent of companies highlighted the ability to easily access and combine data from a variety of external sources as very important for a data-driven enterprise. However, only 23 percent said they were currently very effective in this area, and just 15 percent were sharing data with key vendors and suppliers. Why are enterprises failing to unlock the trapped value of their data and that of their ecosystem partners?
While many rightly point to difficulties in overcoming legal and technical hurdles, these challenges are not insurmountable. The two major barriers to effective data sharing today are the loss of control over data once it leaves an organization's boundaries and the risk that sensitive information is exposed when it is shared.
However, a new family of Privacy Preserving Computation (PPC) techniques, 30 years in the making, is poised to significantly disrupt the enterprise data exchange space. These techniques address both barriers by allowing data to be jointly analyzed without sharing all aspects of that data. In doing so, companies regain control of their data, and of the risks associated with sharing it, even when it is used beyond their borders.
Privacy Preserving Computation (PPC) techniques are a family of very modern cybersecurity techniques that, instead of focusing on protecting data from access by unauthorized parties, look at how to represent data in a form that can be shared, analyzed and operated on without exposing the raw information. Encryption techniques often form the core of how PPC techniques provide these capabilities, but they are used in a slightly different way than usual.
Traditionally, encryption was used to ensure the security and integrity of sensitive data against unauthorized access while in transit between parties and while at rest. Although encryption provides a reasonable amount of protection from outside interference in transit, the data recipient must hold the keys to decrypt the data in order to process it. Two risks arise at this point: the decrypted data is exposed in the clear on the recipient's systems, and the original owner loses control over how, where and by whom it is subsequently used.
These risks could limit the value businesses can extract from their sensitive data because they hinder potential data sharing collaborations.
PPC techniques use encryption differently to provide a mechanism to share data with other parties while limiting how or where the other parties can access the data, what parts of the data they can see, or what they can infer from the data. There are different schemes adopted by different PPC techniques to achieve this, but they usually do one or more of the following:
- perform computations directly on encrypted data, so it never needs to be decrypted by the other party
- split data or computation across multiple parties, so that no single party ever sees the complete picture
- run computations inside protected, attested hardware environments
- add carefully calibrated statistical noise, so that aggregate results can be shared without exposing individual records
You could think of this as cooking a meal without seeing the ingredients or doing a jigsaw puzzle without seeing the picture of the intended outcome.
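To make this concrete, here is a minimal Python sketch of one of the simplest such schemes, additive secret sharing, which underpins the Multi-Party Computation (MPC) approaches discussed later. The companies, figures and modulus are illustrative assumptions, not part of any specific product.

```python
import secrets

# Toy additive secret sharing: each party splits its private value into
# random "shares" that individually reveal nothing, yet the parties can
# still compute the correct total from the shares alone.
MODULUS = 2**61 - 1  # arbitrary large modulus chosen for this sketch


def make_shares(value, n_parties):
    """Split `value` into n random shares that sum to `value` mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine shares into the original value."""
    return sum(shares) % MODULUS


# Three hypothetical companies each hold a confidential figure
# (e.g. a production volume they do not want competitors to see).
private_inputs = {"company_a": 1200, "company_b": 450, "company_c": 930}

# Each company distributes one share to every participant; any single share
# looks like random noise, so no individual input is exposed.
all_shares = [make_shares(v, 3) for v in private_inputs.values()]

# Each participant locally sums the shares it received...
partial_sums = [sum(col) % MODULUS for col in zip(*all_shares)]

# ...and only the combination of the partial sums reveals the joint total.
print(reconstruct(partial_sums))  # 2580, with no party disclosing its own figure
```

The same pattern generalizes from a simple sum to richer joint computations, which is what the commercial MPC deployments described later in this article rely on.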
Below are some of the primary PPC techniques that are gaining prominence today. (You can also dive into details for each technique in our full report.)
While these PPC techniques and technologies are still new, they are rapidly maturing and are now at a point where they can be used in real business use cases. Securing data at rest, in transit and even during computation is now possible using Trusted Execution Environments. Publicly sharing statistical data without compromising the privacy of individual records is now possible with Differential Privacy. Analyzing encrypted data is now possible thanks to the development of encryption schemes, such as Homomorphic Encryption. Through these technologies, sensitive data can be encrypted and protected at all stages and can be used by a number of trusted or untrusted parties to generate insights without unintentionally exposing data.
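As an illustration of the Differential Privacy idea mentioned above, the short sketch below releases a count over a sensitive dataset with calibrated Laplace noise. The patient records, the predicate and the epsilon value are hypothetical, chosen only to show the mechanism.

```python
import numpy as np

# Differential-privacy sketch (hypothetical data): release a count query
# with calibrated Laplace noise so the published figure barely changes
# whether or not any single individual's record is included.
rng = np.random.default_rng()


def private_count(records, predicate, epsilon):
    """Return a noisy count satisfying epsilon-differential privacy.

    A counting query has sensitivity 1 (adding or removing one record
    changes the true count by at most 1), so Laplace noise with scale
    1/epsilon is sufficient.
    """
    true_count = sum(1 for record in records if predicate(record))
    noise = rng.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise


# Hypothetical patient records shared with an external research partner.
patients = [
    {"age": 54, "diagnosis": "diabetes"},
    {"age": 61, "diagnosis": "hypertension"},
    {"age": 47, "diagnosis": "diabetes"},
]

# The partner learns roughly how many diabetes patients exist, but cannot
# tell from the noisy answer whether any specific patient is in the data.
print(private_count(patients, lambda r: r["diagnosis"] == "diabetes", epsilon=0.5))
```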
With this peace of mind, PPC techniques open many new opportunities for enterprise collaborations that were not previously possible due to risk or regulation. Beyond the traceability and control of data considerations, these technologies enable partners to work in a decentralized way, giving them the opportunity to jointly investigate common or shared business issues. Companies are also able to apply AI and improved analysis methods to datasets that they had not previously had access to. This means collaborations with external parties—even competitors—are now possible, and in some cases, well underway.
Following a long incubation period, PPC techniques are on the cusp of a new phase of industry adoption thanks to the alignment of technological capabilities with market needs. The potential opportunity has led to the creation of a rapidly evolving and heavily funded start-up ecosystem. And innovative enterprises and institutions are investing in and experimenting with these techniques to understand the art of the possible.
For one, Google released its open-source Private Join and Compute protocol this year, leveraging Homomorphic Encryption and MPC. While still at an early stage in terms of enterprise robustness, the protocol serves to illustrate the growing importance of PPC techniques.
PPC techniques are also being used today to help competitors operating in the same market work together, and to enable collaborations in highly regulated fields, such as drug discovery. In one of the first commercial implementations of MPC, the Danish Sugar Industry collaborated with Partisia® to develop a confidential production contract exchange amongst sugar beet growers, enabling the industry to readjust to new market situations. Separately, in 2019, 10 large pharmaceutical companies formed the Melloddy consortium, which uses blockchain and federated learning* to train a drug discovery machine learning model on shared data.
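The sketch below illustrates the federated learning pattern in miniature; it is not the Melloddy consortium's actual implementation, and the participants, data and model are invented for the example. Each site fits a model on its own private data, and only the fitted parameters, never the raw records, are sent back to be averaged into a shared model.

```python
import numpy as np

# Federated-learning sketch (illustrative only): each participant trains on
# its own private data and shares only model parameters; the coordinator
# averages those parameters into a single shared model.
rng = np.random.default_rng(seed=0)


def local_fit(features, targets):
    """Fit a simple least-squares model on one participant's private data."""
    weights, *_ = np.linalg.lstsq(features, targets, rcond=None)
    return weights


def federated_average(local_weights, sample_counts):
    """Combine local models, weighting each by its number of samples."""
    counts = np.asarray(sample_counts, dtype=float)
    stacked = np.stack(local_weights)
    return (stacked * counts[:, None]).sum(axis=0) / counts.sum()


# Three hypothetical companies hold confidential experimental data that never
# leaves their premises; only the fitted weights are exchanged.
true_weights = np.array([2.0, -1.0, 0.5])
local_models, counts = [], []
for _ in range(3):
    X = rng.normal(size=(200, 3))
    y = X @ true_weights + rng.normal(scale=0.1, size=200)
    local_models.append(local_fit(X, y))
    counts.append(len(y))

shared_model = federated_average(local_models, counts)
print(shared_model)  # close to [2.0, -1.0, 0.5] without pooling any raw data
```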
Additionally, PPC techniques are enabling enterprises to develop new, trustworthy data-sharing relationships with consumers. For example, Kara is a privacy-preserving, tokenized data cloud, leveraging trusted execution environments and differential privacy to create a secure way for patients to share and monetize their medical data with researchers, while retaining full control of their data. Kara runs on Oasis Labs’ blockchain-based platform and is the basis of a medical trial currently being run at Stanford University. Medical researchers can submit AI systems for training, without ever seeing the underlying data.
PPC techniques are also being used to address many regulatory concerns in markets, such as banking, by accessing and processing sensitive data in encrypted form to derive insights only. For example, ING Belgium uses Inpher’s XOR Secret Computing Engine to build analytical models using data from multiple countries like Switzerland and Luxembourg that have stringent data security and personal privacy rules. Proprietary algorithms generated by ING data science teams are compiled with XOR and secretly computed by all regional data centers and/or cloud services providers without revealing any sensitive information; no personally identifiable information is exported from any jurisdiction.
Furthermore, PPC techniques have already been used by governments. In 2015, the Estonian government worked with Sharemind® to develop the Private Statistics Project, which performed an analysis of a combination of identifiable tax and education records using MPC. The European Commission PRACTICE project analyzed this project and agreed with the Estonian Data Protection agency’s findings that no personal data had been processed.
There are even plans to use PPC techniques in upcoming elections: Travis County, Texas is set to implement STAR-Vote (a Secure, Transparent, Auditable and Reliable voting system), which uses homomorphic encryption to monitor the verified voting process, ahead of the 2020 presidential election.
While the ability to securely share sensitive data presents immediate opportunities, there are also emerging opportunities to disrupt existing markets through the combined effect of PPC techniques and other technologies like blockchain and IoT. A number of companies and organizations are already exploring these combinations across a range of areas.
At Accenture, we’re working across multiple industries to enable secure data sharing and greater collaboration.
For starters, teams across Accenture Tech Labs, Liquid Studios and the Dock Innovation Centre in Dublin are assessing the various Homomorphic Encryption and MPC frameworks to determine whether they can address the types of computation each use case requires. In this nascent field, different frameworks specialize in different types of computation (e.g. arithmetic, linear regression, or random forest models), making them well-suited to certain use cases and less suited to others. We are also assessing the trade-offs of each PPC technique, especially its impact on performance, and how new styles of hardware design, such as Field Programmable Gate Array* chips (FPGAs) and General Purpose Graphics Processing Units* (GPGPUs), can reduce the time and cost of using these technologies.
As we identify opportunities, we look at PPC techniques to expand the aperture of available data for AI. Accenture has been helping companies adopt AI and blockchain—and we’re looking at PPC techniques to lower barriers so AI can access more data, including high risk and confidential data.
For example, we are working with semiconductor ecosystem parties to create a trusted, distributed way to share data using MPC and blockchain. Equipment manufacturers need data to deliver better solutions for their equipment, parts and services, and suppliers need to protect their data as well as that of sub-tier suppliers and customer-restricted data (i.e. data related to on-wafer, off-line metrology and integration). While blockchain provides traceability and control of data views, IP issues are so severe that the equipment manufacturer that operates on raw data is usually reluctant to share data, even if the analytics processing never leaves the network. MPC will be able to solve this problem and enable trusted and secure data analytics.
Building specifically on blockchain, we are customizing PPC techniques for companies cooperating on a shared blockchain accounting system or another similar distributed ledger. This combination has applications where companies must deal with privacy and auditability at the same time. We’ve also recently released a new open-source project called PyHeal to help make frameworks like Microsoft SEAL more accessible and easier to adopt in a business context.
PPC techniques are a major disruptor, and Accenture is working with banks to understand how technologies like Homomorphic Encryption could help in cross-border anti-money laundering and anti-fraud use cases. It could be of huge benefit for banks to be able to ask other banks questions about potentially fraudulent transactions without having to expose their customers’ data or request data from the other bank, which is often impossible because of banking confidentiality laws and similar legislation. Technologies such as MPC and Homomorphic Encryption would allow banks to answer questions over a virtual, shared dataset without the need to share the actual data with each other.
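As a simple illustration of how such a cross-bank question could be answered over encrypted figures, the sketch below uses the open-source python-paillier (phe) package for additively homomorphic encryption. The banks, amounts and flagged-account scenario are hypothetical, and a production deployment would look quite different.

```python
from phe import paillier  # open-source python-paillier package (assumed installed)

# Sketch of a cross-bank query on encrypted data (hypothetical scenario):
# an investigating bank wants the total value of transactions linked to a
# suspect account across several banks, without any bank revealing its
# individual customer transactions.

# The investigating bank generates the keypair and shares only the public key.
public_key, private_key = paillier.generate_paillier_keypair()

# Each partner bank encrypts its own confidential figure under the public key.
# Hypothetical per-bank transaction totals linked to the flagged account:
bank_totals = {"bank_a": 12_500, "bank_b": 0, "bank_c": 4_300}
encrypted_totals = [public_key.encrypt(value) for value in bank_totals.values()]

# Paillier ciphertexts can be added without decrypting them: any intermediary
# can aggregate the encrypted figures but learns nothing about the inputs.
encrypted_sum = encrypted_totals[0]
for ciphertext in encrypted_totals[1:]:
    encrypted_sum = encrypted_sum + ciphertext

# Only the investigating bank, which holds the private key, can read the
# aggregate answer; the per-bank contributions are never exposed.
print(private_key.decrypt(encrypted_sum))  # 16800
```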
As discussed, innovative companies have already begun to deploy PPC techniques in real-world scenarios, combining internal capabilities with external PPC expertise. Enterprises looking to embrace these possibilities need to do the same.
With the growing argument for ecosystem collaboration and the need for effective and secure methods of data exchange and collaboration, PPC techniques and their successors will be critical to effective, safe and secure data sharing. This in turn will be fundamental for businesses to gain value from their data and provide new avenues for disruption. We’re on the path to industrializing PPC offerings so our clients can take advantage of new near- and long-term opportunities. Let’s talk about what that could mean for your business.
*Glossary of terms:
Data Obfuscation: The process of hiding the original data by modifying its content, e.g. by replacing certain parts of the content with meaningless content while keeping the data usable. Usually used to protect sensitive or personally identifiable data; also referred to as Data Masking.
Anonymization/De-identification: Types of obfuscation intended to maintain privacy by replacing personally identifiable content, e.g. names, addresses and phone numbers, with values that have no direct relationship to that person.
Internet of Things (IoT): Refers to the embedding of systems and sensors into physical devices and objects that allow them to interconnect with each other and to the wider internet without human intervention.
Field Programmable Gate Arrays (FPGAs): Microchips whose internal configuration can be set by end users or integrators, allowing the chip to be tuned and tailored to the owner's exact use case. This delivers greater performance from hardware suited to a given purpose without the need to commission custom hardware.
General Purpose Graphics Processing Units (GPGPUs): Graphics Processing Units (GPUs) are hardware chips specifically designed to handle the highly parallel tasks of rendering and refreshing complex (2D and 3D) graphics on a screen; these chips turned out to be much more efficient than standard CPUs at other processing tasks with parallel workloads. GPGPUs are an extension of the same type of chip, tailored further toward general parallel data processing rather than graphics processing.
Federated Learning: An approach to machine learning in which a central, shared model is trained on data that is distributed across multiple locations rather than held centrally. It is applicable where all the training data is not available in the same place or at the same time, or where it is not possible or desirable to bring the training data into a central location. The data is used where it exists (e.g. on a mobile phone or other device), and only the resulting learnings are sent back to the central model, never the actual data.