Incorporating search and analytics into your GDPR compliance strategy
August 6, 2019
The General Data Protection Regulation (GDPR), the EU regulation that aims to strengthen and unify the protection of personal data for individuals in the EU, became enforceable on May 25, 2018. You can learn more about the regulation here. There’s no doubt that data, including the types governed by the GDPR, yields valuable insights that support business decisions and outcomes. So, what does this regulation mean for businesses?
It’s worth noting that the GDPR affects not only EU-based companies but also any organization that handles the personal data of individuals in the EU, regardless of where that organization is located. Furthermore, given the uncertainty around Brexit, a widely used GDPR information portal states on its website that “If you process data about individuals in the context of selling goods or services to citizens in other EU countries then you will need to comply with the GDPR, irrespective as to whether or not you [sic] the UK retains the GDPR post-Brexit. If your activities are limited to the UK, then the position (after the initial exit period) is much less clear.”
For many organizations, the sheer volume of data they hold makes the compliance effort daunting. How do you locate the data that matters for compliance? How do you track consent? How do you monitor and detect non-compliance incidents? Search and analytics approaches can be very useful for answering these questions. Here are ten things to consider incorporating into your GDPR compliance strategy, and why.
The GDPR applies to all personal data, whether it is held in structured or unstructured content. Consider doing the following:
As discussed in my other blog post, search engines are effective and scalable, and they can search high-volume, high-variety structured content (e.g., tables of data) better than relational databases can. So, consider:
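To make the point about search engines concrete, here is a minimal sketch of the data structure at their core: an inverted index. It is built over rows from two tables. The table names (`crm.customers`, `billing.invoices`), columns and tokenization rule are hypothetical, and a real deployment would use a search engine rather than an in-memory dictionary. The sketch only shows how one query can surface a person's data across tables and columns, including inside an email address.

```python
import re
from collections import defaultdict

# Hypothetical rows from two unrelated tables; names and columns are illustrative.
TABLES = {
    "crm.customers": [
        {"id": 1, "name": "Jane Doe", "email": "jane.doe@example.com"},
        {"id": 2, "name": "John Smith", "email": "jsmith@example.org"},
    ],
    "billing.invoices": [
        {"id": 501, "customer": "Jane Doe", "amount": 120},
    ],
}

def tokenize(value):
    """Lowercase a field value and split it into word tokens (letters, digits, _)."""
    return re.findall(r"\w+", str(value).lower())

def build_index(tables):
    """Map each token to the set of (table, row id, column) locations containing it."""
    index = defaultdict(set)
    for table, rows in tables.items():
        for row in rows:
            for column, value in row.items():
                if column == "id":
                    continue  # the row id is part of the location, not searchable content
                for token in tokenize(value):
                    index[token].add((table, row["id"], column))
    return index

def search(index, query):
    """Return locations containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    results = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        results &= index.get(token, set())
    return results
```

For example, `search(build_index(TABLES), "Jane Doe")` finds the `name` and `email` columns in `crm.customers` and the `customer` column in `billing.invoices`. A per-table SQL query would need to know those columns in advance.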
To handle unstructured and semi-structured data, think about how you can leverage natural language processing (NLP). Using NLP techniques and tools, we can extract personally identifiable information (PII) from structured and unstructured content. For instance:
Once “we structure the unstructured,” cleansing and normalization can come next. Our content processing methods work efficiently with:
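As a rough illustration of extraction followed by normalization, the sketch below uses regular expressions as a simple, rule-based stand-in for the NLP techniques mentioned above. The patterns are deliberately simplified and will miss many real-world formats. The rule that a leading `0` means a UK number is an assumption made for this example. The point is that two different spellings of the same phone number normalize to one value, which later matching depends on.

```python
import re

# Simplified patterns (assumptions); production PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def extract_pii(text):
    """Return raw (type, value) matches found in free text: emails first, then phones."""
    found = [("email", m.group()) for m in EMAIL_RE.finditer(text)]
    found += [("phone", m.group()) for m in PHONE_RE.finditer(text)]
    return found

def normalize(pii_type, value):
    """Normalize a raw match so that differently written copies compare equal."""
    if pii_type == "email":
        return value.strip().lower()
    if pii_type == "phone":
        digits = re.sub(r"\D", "", value)
        # Assumption: a number without '+' that starts with 0 is a UK national number.
        if not value.strip().startswith("+") and digits.startswith("0"):
            digits = "44" + digits[1:]
        return "+" + digits
    return value
```

Applied to `"Contact Jane.Doe@Example.com or call 020 7946 0018; alt +44 20 7946 0018."`, the normalized set contains just one email and one phone number (`+442079460018`), even though three raw matches were found.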
At Accenture, we are evolving “multi-model” approaches to identifying PII:
We also use a combination of all of these approaches. This is an important point, because the presence of PII often cannot be determined by any single approach; it takes multiple approaches working together.
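The sketch below combines several weak detectors with a weighted vote. It is a simple illustration of why no single signal is enough, not Accenture's multi-model approach. The three detectors, the small first-name dictionary (a stand-in for a named-entity model), the weights and the threshold are all hypothetical. A date on its own is not flagged. A date next to "DOB" or a first name is flagged.

```python
import re

def date_pattern(snippet: str) -> float:
    """1.0 if the snippet contains something shaped like a dd/mm/yyyy date."""
    return 1.0 if re.search(r"\b\d{2}/\d{2}/\d{4}\b", snippet) else 0.0

def birth_context(snippet: str) -> float:
    """1.0 if nearby words suggest a date of birth."""
    return 1.0 if re.search(r"\b(?:dob|date of birth|born)\b", snippet.lower()) else 0.0

# Hypothetical name dictionary standing in for a named-entity recognizer.
KNOWN_FIRST_NAMES = frozenset({"jane", "john", "maria"})

def person_name(snippet: str) -> float:
    """1.0 if any word in the snippet is a known first name."""
    words = set(re.findall(r"[a-z]+", snippet.lower()))
    return 1.0 if words & KNOWN_FIRST_NAMES else 0.0

# Assumed weights and threshold; in practice these would be tuned on labeled data.
DETECTORS = [(date_pattern, 0.5), (birth_context, 0.3), (person_name, 0.3)]
THRESHOLD = 0.75

def pii_score(snippet: str) -> float:
    """Weighted sum of all detector outputs."""
    return sum(weight * detector(snippet) for detector, weight in DETECTORS)

def contains_pii(snippet: str) -> bool:
    """Flag the snippet only when the combined evidence clears the threshold."""
    return pii_score(snippet) >= THRESHOLD
```

With these weights, no single detector (at most 0.5) can reach the threshold on its own. The date plus either supporting signal scores 0.8, which does.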
It’s not enough to identify the presence of PII; you also need to identify exactly which person it refers to.
But how? This is tricky, especially when there are many John Smiths and Jane Does in the world. Even worse, these names and other IDs are often incorrect or misspelled.
We can help clients leverage matching technology, which was originally developed for recruiting and product companies but can be applied to diverse business use cases.
With matching, we apply machine learning to both structured and unstructured signals to match person records across multiple sources. This matching method has many advantages:
These features are necessary because the GDPR requires that you accurately and completely identify a person’s information across your entire enterprise, so that you can fully remove that person from your company’s databases when they ask to be forgotten.
Matching on name plus birth date is not enough (if it ever was). You will need a matching algorithm that incorporates all available signals.
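Here is a minimal sketch of multi-signal record matching. It uses hand-set weights and Python's `difflib.SequenceMatcher` for fuzzy name similarity rather than the machine-learned matching described above. The `PersonRecord` fields, the weights and the threshold are assumptions. The weights are chosen so that name plus birth date can contribute at most 0.5, which is below the 0.7 threshold. This mirrors the point that those two fields alone should never decide a match.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class PersonRecord:
    source: str
    name: str
    birth_date: str = ""  # ISO yyyy-mm-dd; empty if unknown
    email: str = ""
    postcode: str = ""

def name_similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1] that tolerates small misspellings."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def field_match(a: str, b: str) -> float:
    """1.0 if both values are present and equal ignoring case, else 0.0."""
    return 1.0 if a and b and a.strip().lower() == b.strip().lower() else 0.0

# Assumed weights; a real system would learn them from labeled record pairs.
WEIGHTS = {"name": 0.3, "birth_date": 0.2, "email": 0.35, "postcode": 0.15}
THRESHOLD = 0.7  # name + birth date max out at 0.5, so they can never match alone

def match_score(r1: PersonRecord, r2: PersonRecord) -> float:
    """Weighted combination of all available signals."""
    return (WEIGHTS["name"] * name_similarity(r1.name, r2.name)
            + WEIGHTS["birth_date"] * field_match(r1.birth_date, r2.birth_date)
            + WEIGHTS["email"] * field_match(r1.email, r2.email)
            + WEIGHTS["postcode"] * field_match(r1.postcode, r2.postcode))

def is_same_person(r1: PersonRecord, r2: PersonRecord) -> bool:
    return match_score(r1, r2) >= THRESHOLD

# Hypothetical records from three sources.
crm = PersonRecord("crm", "Jon Smith", "1980-02-11", "j.smith@example.com", "SW1A 1AA")
hr = PersonRecord("hr", "John Smith", "1980-02-11", "J.Smith@example.com", "SW1A 1AA")
billing = PersonRecord("billing", "John Smith", "1980-02-11", "johnsmith77@example.net", "M1 1AE")
```

The misspelled `crm` record still matches `hr`, because email and postcode agree. The `billing` record has the same name and birth date as `hr` but is not matched to it.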
Security is crucial, so consider building or tuning a scalable search application with fine-grained, document-level security controls. This ensures that individuals can access only the documents intended for them.
You will be spending time on your database inventory, ingestion, and discovery. Be careful to maintain document-level security controls throughout the process.
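The sketch below shows one common way to enforce document-level security in search. Each indexed document carries the principals (users or groups) allowed to read it, and results are filtered against the querying user's principals. The `MEMBERSHIP` table is a hypothetical stand-in for a directory service, and substring matching stands in for real search. Unknown users get no group memberships, so they see nothing by default.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set

@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    allowed: FrozenSet[str]  # principals (users or groups) permitted to read it

# Hypothetical directory data: each user maps to the principals they hold.
MEMBERSHIP = {
    "alice": {"alice", "hr", "everyone"},
    "bob": {"bob", "sales", "everyone"},
}

def principals_for(user: str) -> Set[str]:
    """Unknown users hold only their own name (deny by default)."""
    return MEMBERSHIP.get(user, {user})

def secure_search(docs: List[Document], user: str, term: str) -> List[str]:
    """Return ids of documents that match the term AND share a principal with the user."""
    principals = principals_for(user)
    return [d.doc_id for d in docs
            if term.lower() in d.text.lower() and d.allowed & principals]

DOCS = [
    Document("d1", "Jane Doe payroll record", frozenset({"hr"})),
    Document("d2", "Holiday calendar for Jane's team", frozenset({"everyone"})),
]
```

Searching for "jane" returns both documents for `alice` (an HR member), only `d2` for `bob`, and nothing for an unknown user.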
We can also help ingest access control lists (ACLs) from underlying content sources, which goes a long way toward determining risk. For example, a document created by a UK employee that perhaps only two people can access carries a much lower risk than a document that allows public access. Ingesting ACLs therefore helps you grade the level of risk for documents that contain PII.
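To show how ingested ACLs could feed a risk grade, here is a minimal sketch that estimates a document's audience from its ACL entries. It then combines that estimate with the number of PII items found. The group sizes, the public principal names (`everyone`, `anonymous`) and the audience cut-offs are hypothetical. Overlapping groups are double-counted, so the audience is an upper bound.

```python
from typing import Dict, Iterable

PUBLIC_PRINCIPALS = ("everyone", "anonymous")  # assumed names for public access

def audience_size(acl: Iterable[str], group_sizes: Dict[str, int]) -> int:
    """Upper-bound estimate of readers: group sizes where known, else 1 per user."""
    return sum(group_sizes.get(principal, 1) for principal in acl)

def risk_level(acl, pii_count: int, group_sizes: Dict[str, int]) -> str:
    """Grade risk from PII presence and how widely the document is exposed."""
    acl = list(acl)
    if pii_count == 0:
        return "none"
    if any(p in PUBLIC_PRINCIPALS for p in acl):
        return "high"
    audience = audience_size(acl, group_sizes)
    if audience <= 5:  # hypothetical cut-offs
        return "low"
    if audience <= 100:
        return "medium"
    return "high"

GROUP_SIZES = {"uk-hr": 12, "all-staff": 4000}  # hypothetical directory data
```

A PII-bearing document shared with two named people grades as `"low"`. The same content shared with `everyone` grades as `"high"`, matching the example in the text.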
It’s also worth paying close attention to your indexing process. Based on our client projects, we’ve developed index design approaches that encrypt the entire index with external keys without loss of performance or functionality. For example, even if the entire index were downloaded to someone else’s computer, the encryption would make it useless without the proper access rights.
Encrypted search engine indexes can be an important safeguard when storing and searching PII sensitive data.
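The index-encryption design mentioned above is not described in detail, so the sketch below illustrates a different, simpler technique with a related goal. It is a keyed-hash ("blind") index. Each term is stored as an HMAC-SHA256 digest under a secret key rather than in plaintext, so the stored term list is opaque without the key. This is an assumption-laden illustration, not the approach described in the text. It protects only the terms: document ids are stored in the clear, identical terms produce identical tokens (which leaks frequency patterns), and document content would need separate encryption. The random demo key stands in for a key held in an external key manager.

```python
import hashlib
import hmac
import re
import secrets
from collections import defaultdict
from typing import Dict, Set

def term_token(key: bytes, term: str) -> str:
    """Keyed hash of a normalized term; meaningless without the key."""
    return hmac.new(key, term.lower().encode("utf-8"), hashlib.sha256).hexdigest()

def build_blind_index(key: bytes, docs: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map HMAC(term) -> doc ids; no plaintext term is ever stored in the index."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in re.findall(r"\w+", text):
            index[term_token(key, term)].add(doc_id)
    return dict(index)

def blind_search(key: bytes, index: Dict[str, Set[str]], term: str) -> Set[str]:
    """Look up a term by recomputing its token with the same key."""
    return index.get(term_token(key, term), set())

# Demo key: a stand-in for a key fetched from an external key-management service.
DEMO_KEY = secrets.token_bytes(32)
DEMO_DOCS = {"d1": "Jane Doe, payroll 2019", "d2": "Team offsite agenda"}
DEMO_INDEX = build_blind_index(DEMO_KEY, DEMO_DOCS)
```

With the right key, `blind_search(DEMO_KEY, DEMO_INDEX, "doe")` returns `{"d1"}`. With any other key, the same lookup finds nothing.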
To detect non-compliance incidents in real time, consider approaches such as our NLP and entity extraction methods, which are libraries that can run on streaming data supported by technologies like:
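The streaming technologies referred to above are not named here, so this sketch uses a plain Python generator as a stand-in for a stream consumer. It inspects each event lazily and yields an alert when an email address is sent to a destination outside a hypothetical allow-list. The event shape (`id`, `destination`, `payload`), the `APPROVED_DESTINATIONS` set and the simplified email pattern are all assumptions. Because the generator processes one event at a time, it works the same on a finite list or an unbounded stream.

```python
import re
from typing import Dict, Iterable, Iterator

# Simplified email pattern (assumption); real detection would use richer entity extraction.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical allow-list of destinations approved to receive personal data.
APPROVED_DESTINATIONS = {"crm-eu", "hr-eu"}

def detect_incidents(events: Iterable[Dict[str, str]]) -> Iterator[Dict[str, object]]:
    """Yield an alert for each event that sends email addresses to an unapproved destination."""
    for event in events:
        if event["destination"] in APPROVED_DESTINATIONS:
            continue
        hits = EMAIL_RE.findall(event["payload"])
        if hits:
            yield {"event_id": event["id"],
                   "destination": event["destination"],
                   "pii_count": len(hits)}

SAMPLE_EVENTS = [
    {"id": "e1", "destination": "crm-eu", "payload": "update jane.doe@example.com"},
    {"id": "e2", "destination": "marketing-us", "payload": "send offer to jane.doe@example.com"},
    {"id": "e3", "destination": "marketing-us", "payload": "weekly stats only"},
]
```

Running `list(detect_incidents(SAMPLE_EVENTS))` produces a single alert, for event `e2`.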
The following data classification techniques can provide you with some options to think about:
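One simple classification scheme assigns each document the sensitivity of the most sensitive entity detected in it. The sketch below follows that rule, with a separate top tier for GDPR "special category" data such as health or biometric data (Article 9). The entity type names, the three-level scale and the mapping are hypothetical. Entity types not in the mapping are treated as non-personal.

```python
from enum import IntEnum
from typing import Iterable

class Sensitivity(IntEnum):
    PUBLIC = 0
    PERSONAL = 1
    SPECIAL_CATEGORY = 2  # e.g. health or biometric data (GDPR Article 9)

# Hypothetical mapping from detected entity types to sensitivity levels.
ENTITY_SENSITIVITY = {
    "email": Sensitivity.PERSONAL,
    "phone": Sensitivity.PERSONAL,
    "person_name": Sensitivity.PERSONAL,
    "health_condition": Sensitivity.SPECIAL_CATEGORY,
    "biometric_id": Sensitivity.SPECIAL_CATEGORY,
}

def classify(entity_types: Iterable[str]) -> Sensitivity:
    """A document is as sensitive as the most sensitive entity found in it."""
    return max((ENTITY_SENSITIVITY.get(t, Sensitivity.PUBLIC) for t in entity_types),
               default=Sensitivity.PUBLIC)
```

A document containing an email address and a health condition is classified as `SPECIAL_CATEGORY`. One with no detected entities is `PUBLIC`.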
Depending on your organization’s approach to data provenance, you can look at different supporting tools and techniques. As an example, our Aspire Content Processing framework’s ingestion approach always preserves the source ID, the source location, and the original hierarchy in which each document was located.
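The sketch below shows why keeping provenance on every extracted item pays off. If each PII entity carries its source ID, location and hierarchy path, you can answer "where does this person's data live?" directly, for example when handling an erasure request. The class and field names, the source IDs and the example locations are hypothetical. They are not Aspire's actual data model.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass(frozen=True)
class Provenance:
    source_id: str              # identifier of the content source or connector
    location: str               # URL or path of the original item
    hierarchy: Tuple[str, ...]  # path from the source root down to the item

@dataclass(frozen=True)
class ExtractedEntity:
    entity_type: str
    value: str
    provenance: Provenance

def locations_of(entities: List[ExtractedEntity], value: str) -> List[str]:
    """Return the original locations holding a given PII value, for review or erasure."""
    return sorted({e.provenance.location for e in entities if e.value == value})

# Hypothetical extraction results from two sources.
ENTITIES = [
    ExtractedEntity("email", "jane.doe@example.com",
                    Provenance("intranet", "https://intranet.example.com/hr/contracts/offer.docx",
                               ("hr", "contracts", "offer.docx"))),
    ExtractedEntity("email", "jane.doe@example.com",
                    Provenance("fileshare", "//fileserver/finance/2019/invoices.xlsx",
                               ("finance", "2019", "invoices.xlsx"))),
    ExtractedEntity("phone", "+442079460018",
                    Provenance("fileshare", "//fileserver/finance/2019/invoices.xlsx",
                               ("finance", "2019", "invoices.xlsx"))),
]
```

Calling `locations_of(ENTITIES, "jane.doe@example.com")` lists both original documents, so each can be traced back to its source system.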
Consider the following throughout your data management and compliance process:
When you start thinking about data processing and a holistic, search-centric view of all your data sources, these ten things can help you chart a strategy for maintaining, monitoring, and ensuring your organization’s GDPR compliance. Beyond GDPR-related use cases, these approaches can also support many enterprise compliance, fraud, and risk applications.