Smarter enterprise search: why knowledge graphs and NLP can provide all the right answers
May 6, 2020
We’re on the brink of a complete transformation in how we get the information and insights we need to work smarter and more efficiently. In this blog post, I’ll show how the AI techniques that have enhanced Internet search can now be applied within organizations to revolutionize what enterprise search can achieve.
The amount of information available to us is extraordinary. And it’s growing exponentially all the time: already amounting to 44 zettabytes, data volumes are predicted to hit 175 zettabytes in the next five years (IDC). Eighty percent of this data is unstructured (emails, text documents, audio, video, social posts and so on), and just 20% is held in structured systems of some kind.
To find answers in this massive resource and pinpoint exactly what we’re looking for, we need a way to extract facts from documents and store them somewhere for easy access. Today, search engine giants like Google and Bing do exactly that, storing these kinds of facts in a ‘knowledge graph’ that works hand in hand with the search engines they’ve been running for many years.
Proof of the effectiveness of their approach? It provides answers so successfully, and at such breathtaking scale, that we take it all for granted.
Over the past few years, you’ll have noticed a subtle but profound evolution in how we use search engines to find answers in our daily lives.
When search engines were first introduced, it quickly became obvious that the longer and more convoluted the question, the less chance there was of receiving an on-target answer. Because queries like “How many calories are in Tesco’s best-selling soup?” were unlikely to yield results, we became experts in keyword searching instead. By transforming our queries into phrases with keywords like “Tesco soup nutrition,” we found that search engines suggested more relevant documents, and even provided direct answers, unearthing vital nuggets of information that helped us get on with our work, further our knowledge, or settle an argument.
Nowadays, however, our expectations for search are more in line with the way we use digital assistants like Siri, Google Home, and Alexa, all of which are powered by search engines behind the scenes. When we ask them questions, we get facts in return. As a result, search engine queries are becoming increasingly ‘fact finding’ in nature.
The big change? Search engines can now find, prioritize, and display the facts we need. Instead of simply returning a list of pages (URLs) as they used to, they inject answers to questions (when and where possible) along with detailed knowledge cards and related search queries, all designed to shorten the time it takes us to reach that critical fact. Impressively, the results returned by search engines and digital assistants are also more accurate, and more intuitive, than ever.
Search engines like Google and Bing have been leading the way, thanks in large part to two significant innovations. First, in 2012, Google added a knowledge graph to its search engine. Later, in 2015, it introduced RankBrain. Both have been landmark developments.
And the same approach can now be applied to enterprise search. Adding this technology layer to enterprise search engines has the potential to make them smarter than ever before. The game-changer here has been intelligent enterprise search (also known as cognitive search or insight engines). By combining search with a raft of AI technologies like natural language processing, semantic understanding, machine learning, and knowledge graphs, intelligent enterprise search can give users a significantly improved search experience with far richer insights.
First, knowledge graphs. In its drive to transform its search engine into a ‘knowledge engine,’ Google has been using knowledge graphs to provide structured and detailed information about entities like people, places, companies, and topics. Think of the last time you searched for a celebrity’s age or the opening hours of your local pharmacy and got the answer straight away, without wading through lists of search results. That information most likely came from the knowledge graph rather than from the search engine’s index of web pages.
Knowledge graphs have proved enormously powerful in question-answering systems: the more richly populated the knowledge graph, the more insightful searches become. Populating a knowledge graph from structured data is relatively straightforward (assuming you trust the data source), but doing the same from unstructured data requires sophisticated natural language processing (NLP) techniques along with document authority models.
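For structured sources, the mapping to facts can be almost mechanical. The Python sketch below assumes a hypothetical table of company records (the `ROWS` data, its column names, and the `rows_to_triples` helper are all illustrative placeholders, not any product’s API) and turns each non-key column into a (subject, predicate, object) fact.

```python
# Hypothetical rows from a trusted structured source, e.g. a company
# registry export. Column names and values are placeholders.
ROWS = [
    {"company": "Acme Ltd", "ceo": "Jane Doe", "headquarters": "London"},
    {"company": "Example Corp", "ceo": "John Smith", "headquarters": "Leeds"},
]


def rows_to_triples(rows, key="company"):
    """Map each non-empty, non-key column to a (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        subject = row[key]  # the key column identifies the entity
        for column, value in row.items():
            if column != key and value:
                # The column name becomes the predicate, the cell the object.
                triples.append((subject, column, value))
    return triples


for triple in rows_to_triples(ROWS):
    print(triple)
```

Because the structure is already explicit, no language understanding is needed here; the main decisions are which columns become predicates and whether the source deserves to be trusted.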
To show what can be achieved, consider the following piece of text. It’s packed with information:
“Gillian Russell was born in Invercargill. She is the CEO of Gingerbeard Limited and also the company secretary of Gingerbeard Consulting Group. Gillian lives in Wokingham, UK with her husband Phil Lewis.”
We can use NLP to extract and classify the facts mentioned in this text as semantic triples. Each triple holds three pieces of information, subject–predicate–object, such as (Gillian Russell, born in, Invercargill), and triples like this can model almost any relationship between entities. This way of encoding information presents knowledge in a machine-readable form.
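Production systems use trained models for named-entity recognition, coreference resolution, and relation extraction. As a much simpler stand-in, the Python sketch below applies hand-written regular expressions to the example text. The patterns, the `ALIASES` table that substitutes for coreference resolution, and the snake_case predicate names are all hypothetical and cover only this one passage.

```python
import re

TEXT = ("Gillian Russell was born in Invercargill. She is the CEO of Gingerbeard "
        "Limited and also the company secretary of Gingerbeard Consulting Group. "
        "Gillian lives in Wokingham, UK with her husband Phil Lewis.")

# A run of capitalised words, e.g. "Gingerbeard Consulting Group".
NAME = r"[A-Z]\w+(?: [A-Z]\w+)*"

# Hypothetical alias table standing in for coreference resolution:
# later mentions of the same person are mapped back to her full name.
ALIASES = {"She": "Gillian Russell", "Gillian": "Gillian Russell"}

# Hand-written (pattern, predicate) pairs standing in for a trained relation
# extractor. A predicate of None means "derive it from the captured role".
PATTERNS = [
    (re.compile(rf"was born in (?P<o>{NAME})"), "born_in"),
    (re.compile(rf"the (?P<role>CEO|company secretary) of (?P<o>{NAME})"), None),
    (re.compile(rf"lives in (?P<o>{NAME}(?:, {NAME})?)"), "lives_in"),
    (re.compile(rf"her husband (?P<o>{NAME})"), "spouse"),
]


def extract_triples(text):
    """Return (subject, predicate, object) triples found in the text."""
    triples = []
    # Naive sentence split on a full stop followed by whitespace.
    for sentence in re.split(r"(?<=\.)\s+", text.strip()):
        mention = re.match(NAME, sentence)  # sentence subject, e.g. "She"
        if mention is None:
            continue
        subject = ALIASES.get(mention.group(0), mention.group(0))
        for pattern, predicate in PATTERNS:
            for m in pattern.finditer(sentence):
                if predicate is None:
                    # "company secretary" -> "company_secretary_of"
                    predicate_name = m.group("role").lower().replace(" ", "_") + "_of"
                else:
                    predicate_name = predicate
                triples.append((subject, predicate_name, m.group("o")))
    return triples


for triple in extract_triples(TEXT):
    print(triple)
```

This yields five triples about Gillian Russell, starting with ('Gillian Russell', 'born_in', 'Invercargill'). Rules like these break as soon as the phrasing changes, which is why real pipelines learn extraction from data rather than relying on fixed patterns.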
A knowledge graph representing related entities can be generated from these semantic triples. This knowledge graph, a powerful foundation for a question-answer system, can then be traversed to provide answers, even to complex questions.
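To make ‘traversal’ concrete, the sketch below loads triples from the example text into a tiny in-memory index (a hypothetical `KnowledgeGraph` class, not any particular graph database; predicate names are illustrative). It answers a two-hop question and uses breadth-first search to explain how two entities are connected.

```python
from collections import defaultdict, deque

# Triples from the example passage (predicate names are hypothetical).
TRIPLES = [
    ("Gillian Russell", "born_in", "Invercargill"),
    ("Gillian Russell", "ceo_of", "Gingerbeard Limited"),
    ("Gillian Russell", "company_secretary_of", "Gingerbeard Consulting Group"),
    ("Gillian Russell", "lives_in", "Wokingham, UK"),
    ("Gillian Russell", "spouse", "Phil Lewis"),
]


class KnowledgeGraph:
    """Tiny in-memory triple store indexed by (subject, predicate) and (object, predicate)."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.forward = defaultdict(list)   # (subject, predicate) -> objects
        self.backward = defaultdict(list)  # (object, predicate) -> subjects
        for s, p, o in self.triples:
            self.forward[(s, p)].append(o)
            self.backward[(o, p)].append(s)

    def objects(self, subject, predicate):
        return self.forward.get((subject, predicate), [])

    def subjects(self, predicate, obj):
        return self.backward.get((obj, predicate), [])

    def path(self, start, goal):
        """Breadth-first search treating edges as undirected; returns the triples on a shortest path."""
        neighbours = defaultdict(list)
        for s, p, o in self.triples:
            neighbours[s].append((o, (s, p, o)))
            neighbours[o].append((s, (s, p, o)))
        queue, seen = deque([(start, [])]), {start}
        while queue:
            node, steps = queue.popleft()
            if node == goal:
                return steps
            for nxt, triple in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [triple]))
        return None


kg = KnowledgeGraph(TRIPLES)

# "Where does the CEO of Gingerbeard Limited live?"
# Hop 1 follows ceo_of backwards to a person; hop 2 follows lives_in forwards.
where = [place
         for person in kg.subjects("ceo_of", "Gingerbeard Limited")
         for place in kg.objects(person, "lives_in")]
print(where)  # ['Wokingham, UK']

# "How is Phil Lewis connected to Gingerbeard Limited?"
print(kg.path("Phil Lewis", "Gingerbeard Limited"))
```

Neither question is answered by any single sentence of the source text; each requires chaining facts together, which is exactly where a graph helps.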
There are, however, many things to consider here before we let the knowledge graph loose on all of our documents:
Assuming we can solve these types of questions for a given use case, a general process for modelling knowledge and creating a knowledge graph from this text example is illustrated below.
Figure 1. Modelling knowledge
This knowledge model can then start to answer questions like:
As you can see, it’s a powerful resource.
The second innovation in this space is ‘word vectors,’ the technique that underpins RankBrain. Word vectors harness machine learning to model the variety and depth of word meanings: by representing each word as a vector (a list of numbers) learned from the contexts in which it appears, the AI-based system builds up a sense of how we use words and the associations between them.
For instance, in the simplified ‘mental space’ of an AI-based system shown in Figure 2, the word “auntie” (a relative) occupies a different region from “Auntie Beeb” (a nickname for the BBC, the British public broadcaster). Likewise, “Uncle Sam” (a personification of the US federal government) doesn’t convey the same meaning as “uncle.” Because of their meanings, “auntie” and “uncle” sit close together in the AI’s ‘mental space,’ but “Auntie Beeb” and “Uncle Sam” don’t.
Figure 2. Representing words as vectors
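Closeness in this ‘mental space’ is commonly measured with cosine similarity. The vectors below are hand-picked three-dimensional toy values, not learned embeddings (models such as word2vec learn hundreds of dimensions from large corpora); they are chosen only to reproduce the relationships described above, and the meaning given to each dimension is an illustrative assumption.

```python
import math

# Hand-picked toy vectors, NOT learned embeddings. The three illustrative
# dimensions loosely stand for "family", "broadcasting" and "government".
VECTORS = {
    "auntie": [0.9, 0.1, 0.0],
    "uncle": [0.85, 0.0, 0.15],
    "Auntie Beeb": [0.1, 0.95, 0.1],
    "Uncle Sam": [0.1, 0.0, 0.95],
}


def cosine(u, v):
    """Cosine similarity: close to 1.0 when vectors point the same way."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(word):
    """Return the other vocabulary entry whose vector is most similar to `word`."""
    return max((w for w in VECTORS if w != word),
               key=lambda w: cosine(VECTORS[word], VECTORS[w]))


for a, b in [("auntie", "uncle"), ("Auntie Beeb", "Uncle Sam"), ("auntie", "Auntie Beeb")]:
    print(f"{a} vs {b}: {cosine(VECTORS[a], VECTORS[b]):.2f}")
print(nearest("auntie"))  # uncle
```

“auntie” and “uncle” score close to 1, while “Auntie Beeb” and “Uncle Sam” score close to 0 despite sharing surface words with them, which is the distinction a keyword match cannot make.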
The AI-based system can even capture how some words change in meaning over time (see Figure 3). Word vectors enable the search engine to recognize that a document written in the 1850s containing ‘broadcast’ (then chiefly a farming term for scattering seed) should not be returned when someone searches for ‘radio broadcasts’ from the 1950s.
Figure 3. Words change in meaning over time
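One way to capture this, sketched below under heavily simplified assumptions, is to keep a separate vector space per era and read each document’s words in the space for the era in which it was written. The two-dimensional vectors, decades, document titles, `search` helper, and 0.5 threshold are all hypothetical; real diachronic embeddings are trained on each era’s corpus and then aligned so that vectors from different eras can be compared, which this toy simply assumes.

```python
import math

# Hypothetical, already-aligned 2-D vectors per decade.
# Illustrative axes: (farming sense, media sense).
EMBEDDINGS_BY_DECADE = {
    1850: {"broadcast": [0.95, 0.05], "seed": [0.9, 0.1], "newspaper": [0.1, 0.8]},
    1950: {"broadcast": [0.1, 0.95], "seed": [0.9, 0.1], "radio": [0.05, 0.95]},
}

# Hypothetical documents, each tagged with its decade and a term it contains.
DOCUMENTS = [
    {"title": "Farming almanac", "decade": 1850, "term": "broadcast"},
    {"title": "Radio listings", "decade": 1950, "term": "broadcast"},
]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def nearest(word, decade):
    """Most similar other word within one decade's vector space."""
    space = EMBEDDINGS_BY_DECADE[decade]
    return max((w for w in space if w != word), key=lambda w: cosine(space[word], space[w]))


def search(query_vector, documents, threshold=0.5):
    """Keep documents whose term, read in its own decade's space, is close to the query."""
    return [d["title"] for d in documents
            if cosine(EMBEDDINGS_BY_DECADE[d["decade"]][d["term"]], query_vector) >= threshold]


query = EMBEDDINGS_BY_DECADE[1950]["radio"]  # stands in for "radio broadcasts"
print(nearest("broadcast", 1850), nearest("broadcast", 1950))  # seed radio
print(search(query, DOCUMENTS))  # ['Radio listings']
```

The same surface word, ‘broadcast,’ lands near ‘seed’ in the 1850s space and near ‘radio’ in the 1950s space, so only the 1950s document matches the query.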
It’s no surprise that word vectors immediately delivered a 15% improvement in Google’s accuracy for some query types. Subsequent innovations such as BERT have refined performance further, enabling an even deeper understanding of the words people use.
The really exciting thing for businesses? We can now start to replicate the Google-like search experience within organizations – reinventing what can be achieved by people when they’re augmented with smart machines.
Google, Amazon, and Microsoft have all recently announced enhanced enterprise search solutions within their cloud offerings that integrate with knowledge graphs. Traditional on-premises search solutions are also waking up to the benefits of integrating with a knowledge graph.
We can take advantage of this extraordinary convergence of seemingly disparate technology innovations to revolutionize how our people search for facts and get the answers they want.
I’ve been in the search industry for 30 years, working on hundreds of enterprise search projects for organizations worldwide, and there have never been so many opportunities to completely redefine what search can do. Harnessing the latest technologies, we can create new value from fragmented data points and gain unique insights into how multiple pieces of data fit together.
And because AI technologies like NLP and knowledge graphs are maturing so swiftly, enterprises stand to benefit from these technologies’ ever-evolving problem-solving capabilities. Soon, we’ll be able to answer incredibly complex questions more accurately, and faster, than ever. Whether that’s identifying new medical treatments, spotting unseen market shifts or unearthing fraud, the benefits for organizations in every industry will be immense.