Acquire and enrich enterprise content for Microsoft search
November 5, 2019
Enhance the intelligent search experience with Accenture Aspire Content Processing
As a long-time Microsoft partner, we’ve supported the implementation of their evolving search solutions, from FAST ESP and Microsoft SharePoint to Microsoft Azure search. Earlier this year, we were very excited when Microsoft announced the release of Microsoft search – “an intelligent, enterprise search experience from Microsoft that applies the artificial intelligence technology (AI) from Bing and deep personalized insights surfaced by the Microsoft Graph, to make search more effective.” Microsoft search provides access to data in Microsoft 365 through a cohesive search experience across Microsoft 365 apps and services, Bing, Windows, and the Edge browser. We expect that enterprises will greatly benefit from these powerful capabilities. But to get the most value out of any search engine, we need to start with good data. This is where our Aspire Content Processing technology supports Microsoft search implementations. Today, we are excited to announce that our Aspire publisher for Microsoft search is ready for customer preview. Organizations can use Aspire for content enrichment and access to 40+ repositories via our connectors, ultimately accelerating speed-to-value.
<<< Start >>>
“Microsoft search delivers intelligent search across Microsoft 365 apps and services. By using Accenture’s Aspire, our enterprise customers can connect to third-party content sources, process and enrich that content, and then bring it all together into Microsoft search. Combining Aspire with Microsoft search’s powerful features, enterprises can make use of unstructured data faster, easier, and transform their entire insight discovery experience.”
- Mike Ammerlaan, Director, Microsoft 365 Ecosystem at Microsoft Corp.
<<< End >>>
Read on for a technical overview from our Steve Denny, who led the development of our Aspire publisher for Microsoft search.
The trials and tribulations of search solutions that evolve
“Once upon a time there was a kingdom. The king had a huge amount of data about his alliances and foes but couldn’t leverage it to his kingdom’s advantages. His council advisors solved this by placing his data in a search engine, making it possible for the king to search the data and formulate his strategies.
The king then noticed that some of his search results were blank, badly formatted, or didn’t display important insights. The advisors changed their program to fix those search limitations. But soon, as the king collected more data, he continuously switched to the latest search engines to hold his new knowledge. And with every new search engine, the advisors had to go full circle updating the program, reformatting the data, and writing new code for the new engine to work. It wasn’t efficient, but the cycle continued as the king went for smarter, faster, newer search engines….”
Sound like a familiar story? It’s one that we’ve consigned to the mists of time here at Search and Content Analytics.
Gain value from a search engine sooner, without the pain
As our clients implement search engines, how do we make the process as efficient and painless as possible? We addressed this by building Aspire, which has been used in hundreds of projects. Aspire ingests and enriches unstructured data – text, images, audio, and videos – from multiple sources, providing relevant context for search and analytics applications.
For a brief overview of Aspire Content Processing, watch our video below.
<<< Start >>>
Aspire Content Processing animated video
<<< End >>>
Aspire is designed in a modular way:
Connectors: “connect” to an enterprise content repository (Files, RDBs, SharePoint, to name a few of the 40+ we’ve built) and extract the content (and security when appropriate).
Processing modules: the data is placed in an internal format and passed to the processing modules. These can cleanse or normalize data, perform OCR, send it out to a classification or categorization service, or (with some integration effort) do pretty much any required enrichment.
Publishers: you guessed it – they publish the data, typically to a search engine, but it could be anywhere (for example, file systems, HDFS, Hadoop, Amazon S3, Azure).
The key advantage of this modular architecture is that you can swap in and out the connectors, processing modules, and publishers without reference to each other. You’re crawling files and publishing to SharePoint but still have data in a relational database? No problem – just add a relational database connector. You want to swap search engines? No problem – just switch to the publisher for the new engine. Aspire’s architecture also includes frameworks for the efficient development of new connectors and publishers. When a new search engine comes along, we can go 0-60 very fast.
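To make the swap-in, swap-out idea concrete, here is a minimal sketch in Python. It is not Aspire’s actual API: every class and function name below (Document, FileConnector, DatabaseConnector, normalize_whitespace, InMemoryPublisher, run_pipeline) is hypothetical and only illustrates how connectors, processing modules, and publishers can stay independent of each other when they share one internal document format.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Protocol


@dataclass
class Document:
    """Hypothetical internal format shared by every stage."""
    id: str
    content: str
    metadata: dict = field(default_factory=dict)


class Connector(Protocol):
    def crawl(self) -> Iterable[Document]: ...


class Publisher(Protocol):
    def publish(self, doc: Document) -> None: ...


# A processing module is just a function from Document to Document.
Processor = Callable[[Document], Document]


class FileConnector:
    """Stands in for a file connector; reads from an in-memory dict."""
    def __init__(self, files: dict):
        self.files = files

    def crawl(self) -> Iterable[Document]:
        for path, text in self.files.items():
            yield Document(id=path, content=text, metadata={"source": "files"})


class DatabaseConnector:
    """Stands in for a relational database connector; reads a list of rows."""
    def __init__(self, rows: list):
        self.rows = rows

    def crawl(self) -> Iterable[Document]:
        for row in self.rows:
            yield Document(id=f"row-{row['id']}", content=row["body"],
                           metadata={"source": "rdb"})


def normalize_whitespace(doc: Document) -> Document:
    """Example cleansing step: collapse runs of whitespace."""
    doc.content = " ".join(doc.content.split())
    return doc


class InMemoryPublisher:
    """Stands in for a search engine publisher; stores documents in a dict."""
    def __init__(self):
        self.index = {}

    def publish(self, doc: Document) -> None:
        self.index[doc.id] = doc


def run_pipeline(connectors, processors, publisher) -> int:
    """Pass every crawled document through each processor, then publish it."""
    count = 0
    for connector in connectors:
        for doc in connector.crawl():
            for process in processors:
                doc = process(doc)
            publisher.publish(doc)
            count += 1
    return count


publisher = InMemoryPublisher()
connectors = [FileConnector({"notes.txt": "hello   world"})]
# Still have data in a relational database? Add a connector; nothing else changes.
connectors.append(DatabaseConnector([{"id": 7, "body": "  row   text "}]))
published = run_pipeline(connectors, [normalize_whitespace], publisher)
```

Swapping search engines in this sketch would mean passing a different publisher object to run_pipeline; the connectors and processors would not need to change.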
Even if we don’t have an existing connector for your in-house solution, Aspire abstracts the “data get” from the operations that are common to all connectors. Once you’ve written the code to list the contents of a “container” (for example, a folder, a SharePoint list, or an Exchange folder) and to get a single document, you’re pretty much there. Aspire will handle the queueing of items to process, the decision as to whether something has changed, and whether an item needs recursive scanning.
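As a rough illustration of that division of labor, the Python sketch below has the connector author supply only two callbacks, one to list a container and one to fetch a document, while a framework loop does the queueing, the recursion into sub-containers, and the change detection. The function and parameter names are hypothetical, and hashing the content to detect changes is an assumption made for the example, not a description of how Aspire decides that something has changed.

```python
import hashlib
from collections import deque


def crawl(root, list_container, get_document, seen_signatures):
    """Hypothetical framework loop.

    list_container(container_id) -> iterable of (item_id, is_container)
    get_document(item_id)        -> document text
    seen_signatures              -> dict of item_id -> content hash from the
                                    previous crawl (updated in place)
    Returns the ids of documents that are new or changed.
    """
    queue = deque([root])
    changed = []
    while queue:
        container = queue.popleft()
        for item_id, is_container in list_container(container):
            if is_container:
                queue.append(item_id)  # scan sub-containers later
                continue
            text = get_document(item_id)
            signature = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if seen_signatures.get(item_id) != signature:
                seen_signatures[item_id] = signature
                changed.append(item_id)
    return changed


# The only repository-specific code: how to list a container and get a document.
TREE = {"/": [("/a.txt", False), ("/sub", True)],
        "/sub": [("/sub/b.txt", False)]}
DOCS = {"/a.txt": "alpha", "/sub/b.txt": "beta"}

state = {}
first_crawl = crawl("/", TREE.__getitem__, DOCS.__getitem__, state)
DOCS["/sub/b.txt"] = "beta, revised"
second_crawl = crawl("/", TREE.__getitem__, DOCS.__getitem__, state)
```

In this sketch the first crawl reports both documents as new, and the second reports only the one whose content was edited in between.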
Similarly, to create a new publisher, you need to know how to post a “document,” but Aspire will handle the logic of batching and allow hooks for the beginning and end of crawls (so you can clear or commit an index if required). The framework also has a number of built-in “common” connections (REST, RDB and so on) so you don’t need to develop them yourself.
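The batching and crawl hooks described above can be sketched as follows. This is a minimal Python illustration with hypothetical names (BatchingPublisher, post_batch, on_start, on_end), not Aspire’s publisher framework: the publisher author provides only the function that posts a batch, plus optional hooks for the start and end of a crawl.

```python
class BatchingPublisher:
    """Hypothetical wrapper that batches documents and fires crawl hooks."""

    def __init__(self, post_batch, batch_size=100, on_start=None, on_end=None):
        self._post_batch = post_batch  # engine-specific: send one batch
        self._batch_size = batch_size
        self._on_start = on_start      # e.g. clear the index
        self._on_end = on_end          # e.g. commit the index
        self._pending = []

    def begin_crawl(self):
        if self._on_start:
            self._on_start()

    def publish(self, document):
        self._pending.append(document)
        if len(self._pending) >= self._batch_size:
            self._flush()

    def end_crawl(self):
        self._flush()  # send any partial batch before the end hook runs
        if self._on_end:
            self._on_end()

    def _flush(self):
        if self._pending:
            self._post_batch(list(self._pending))
            self._pending.clear()


# Record what happens so the order of calls is visible.
events = []
publisher = BatchingPublisher(
    post_batch=lambda batch: events.append(("post", batch)),
    batch_size=2,
    on_start=lambda: events.append(("clear", None)),
    on_end=lambda: events.append(("commit", None)),
)
publisher.begin_crawl()
for doc in ["d1", "d2", "d3"]:
    publisher.publish(doc)
publisher.end_crawl()
```

With a batch size of 2 and three documents, the recorded events are a clear, a full batch, a final partial batch, and a commit, in that order.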
So, we were excited when we heard about Microsoft search – we can apply Aspire frameworks to quickly develop a new publisher that enables clients to push content into Microsoft search easily.
Accelerate your Microsoft search implementation
Our pre-built publisher for Microsoft search gives clients the ability to bring data from various repositories supported by our 40+ connectors into Office365. That covers everything from your legacy file systems and on-premises databases to Jive, Documentum, and Confluence. If I had a dollar for every client who’d asked to search their on-premises data in Office365…
With a broad knowledge base from diverse repositories, enterprise users get the most relevant answers curated from Microsoft application data, third-party sources, and the Internet, all in one place.
Publishing content from Aspire to Microsoft search – workflow
<<< Start >>>
Aspire content ingestion, enrichment, and publishing to Microsoft search - Typical workflow
<<< End >>>
Ingestion, processing, and enrichment
Aspire connectors will extract the text and metadata of documents from your data repository. In addition, you’ll have access to Aspire’s content enrichment workflow, which allows you to add, normalize, or cleanse data before it’s pushed to Microsoft search (for better search relevancy). And of course, the processed data will inherit the same security access control from your data source, ensuring that end users see only the documents they’re entitled to.
Our Microsoft search publisher uses the new Microsoft search API, which is part of the Graph API and works over HTTP. In short, you authenticate and connect to a dataset and then push the data. It really is that simple! There’s a pre-built simple schema for files and a “custom” one that will allow you to define custom fields to store all of your data. For the custom schema, you’ll need to define the schema itself and map the data that Aspire has extracted to the fields in that schema. The publisher will then load the schema and send it to Microsoft search before publishing the data. Slightly more complex, but it’s still just configuration, not programming.
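To give a feel for what “define the schema and the mapping” involves, here is a small Python sketch. Everything in it is illustrative: the schema fields, the mapping keys, the extracted field names, and the build_item helper are hypothetical, and the payload layout (acl, properties, content) is only loosely modeled on the kind of item the Graph API accepts, so check the Microsoft Graph documentation for the exact names and shapes. The sketch makes no HTTP calls; it only shows the mapping step and how access control can travel with each item.

```python
# Hypothetical custom schema: the fields the search index should hold.
SCHEMA = [
    {"name": "title", "type": "string", "searchable": True},
    {"name": "author", "type": "string", "searchable": True},
    {"name": "modified", "type": "dateTime", "searchable": False},
]

# Hypothetical mapping: schema field -> name of the extracted field.
MAPPING = {"title": "dc_title", "author": "owner", "modified": "last_modified"}


def build_item(extracted, schema, mapping, allowed_users):
    """Map extracted metadata onto schema fields and attach access control."""
    schema_fields = {prop["name"] for prop in schema}
    unknown = set(mapping) - schema_fields
    if unknown:
        raise ValueError(f"mapping targets fields not in schema: {sorted(unknown)}")
    properties = {
        target: extracted[source]
        for target, source in mapping.items()
        if source in extracted  # skip fields the connector did not extract
    }
    return {
        # Assumed ACL shape: one "grant" entry per user allowed to see the item.
        "acl": [{"type": "user", "value": user, "accessType": "grant"}
                for user in allowed_users],
        "properties": properties,
        "content": {"type": "text", "value": extracted.get("content", "")},
    }


extracted = {
    "dc_title": "Q3 plan",
    "owner": "sdenny",
    "content": "Quarterly planning notes",
    "mime_type": "text/plain",  # extracted but unmapped, so it is not published
}
item = build_item(extracted, SCHEMA, MAPPING, allowed_users=["user-123"])
```

Because the mapping lives in data rather than code, pointing a different extracted field at "title" is a configuration change only.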
Once the data is indexed, it will appear alongside your other data up in the Office365 cloud. The data source will have its own tab, so you can easily locate it, but that aside, it will be indistinguishable.
If you’re an Office365 user, like many of our clients, looking for a single entry point for your search across many data sources, Microsoft search could well be for you. We look forward to releasing our publisher, which will allow you to leverage the power of the Aspire connectors to bring data into Microsoft search and search your on-premises data alongside your Office365 documents.
By integrating Aspire Content Processing’s capabilities and Microsoft search, we can help improve data acquisition and enrichment, accelerate information discovery, and ultimately increase business value.