Enhancing Microsoft search with Aspire Content Processing framework
September 22, 2020
About this time last year, I wrote a blog about the trials and tribulations of a King trying to implement a search solution in an ever-changing world. In that blog, I provided an overview of how Accenture’s Aspire Content Processing technology and Microsoft search work together to deliver a smarter search experience.
Over the past year, we’ve continued working with Microsoft, feeding back our thoughts on the Microsoft search capability announced at Microsoft Ignite 2019 and updating our publisher for Microsoft search to ensure we make the best use of the new features added to Microsoft search since then. We’ve also added 10 additional connectors, including Microsoft OneDrive, SharePoint 2019, and SharePoint Online, and are currently developing more, such as Microsoft Exchange (both on-premises and online) and Google Drive. We can now extract data, including security groups, identities and access control lists (where appropriate), from over 50 different data sources.
At this year's Microsoft Ignite, we presented a demo of Aspire in action - including content ingestion via our connectors, workflow configurations, thumbnail extraction, and publishing content to Microsoft search in SharePoint Online. Watch the video below to learn more about Aspire and see our demo.
Video: Accenture's presentation at Microsoft Ignite 2020: Publishing content from Aspire to Microsoft search
A modular design to future proof your search and analytics infrastructure
This year, we find ourselves in a “strange new world,” but I think the problems I outlined in last year’s blog have become more prominent, and the need to be flexible is even more important. With staff working from home, away from the sources of information they use on a day-to-day basis, it’s critical to provide all the information they need in a single unified place. It shouldn’t matter if this comes from the latest cloud technology, or the oldest data repository that the King installed when he was a Prince.
This is why a solution utilising a modular content ingestion platform like Aspire can add tremendous value. The modular design allows you to build your solution with the data and search engines available today while knowing that your system is future-proofed. When the technology changes and you find yourself with data in some new form of storage, you can easily get the new data into your system without having to re-work the data processing for the content sources already ingested (unless you want to, of course!).
Figure: Publishing workflow from Aspire Content Processing to Microsoft search
A foundation of our Aspire Content Processing solution is our connector framework. Our engineering team works tirelessly to create new connectors for brand-new or enhanced versions of cloud storage solutions, customer records solutions, ticketing solutions, and so on. View the full list of our connectors here.
Scaling Aspire for Microsoft search and extensive enterprise search applications
Unsurprisingly, enterprises leveraging Microsoft search and other search engines continue looking to crawl more content sources and larger quantities of data. For instance, the current version of Aspire – Aspire 4.0 – has been used to populate a Petabyte data lake and an application that has categorised over 5.4 billion files (approximately 6,900 Terabytes from approximately 40,000 file shares). But we didn’t stop there. We’ve examined and updated the architecture to ensure that we’re able to scale to larger enterprise needs. We're also planning for new enhancements and scalability in our upcoming Aspire 5.0 release.
Aspire 5.0 will retain all of the flexibility of Aspire 4.0, utilising a modular architecture that enables quick development (either by our engineering team or your own developers) of bespoke connectors and publishers. The main components of our architecture include:
Connectors to “connect” to an enterprise content repository (Files, RDBs, SharePoint, to name a few) and extract the content (and security when appropriate).
Processing modules to manipulate the data from the connectors (in an internal, consistent format). These can cleanse or normalize data, perform OCR, send it out to a classification or categorization service, or (with some integration effort) do pretty much any required enrichment. One of our latest developments is an extensible “thumbnailing” service allowing you to create thumbnail images for the ingested documents giving the user an even richer search solution. Currently, the service supports Word, PowerPoint and PDF files, but we’re developing support for other formats.
Publishers (you guessed it) to publish the data, typically to a search engine; but it could be anywhere (for example, file systems, HDFS, Hadoop, Amazon S3, Azure).
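To make the three components above more concrete, here is a minimal sketch of how a connector, a chain of processing modules, and a publisher could fit together. It is not Aspire code: the names Document, file_connector, normalize_whitespace, tag_length, ListPublisher and run_pipeline are all hypothetical, chosen only to illustrate the modular idea.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Iterator, List


@dataclass
class Document:
    """Hypothetical internal, consistent format shared by every stage."""
    doc_id: str
    content: str
    acl: List[str] = field(default_factory=list)
    metadata: Dict[str, str] = field(default_factory=dict)


def file_connector(files: Dict[str, str]) -> Iterator[Document]:
    """Hypothetical connector: turns path/text pairs into Documents.

    A real connector would read a repository and its security settings;
    here every document simply gets a placeholder access control list.
    """
    for path, text in files.items():
        yield Document(doc_id=path, content=text, acl=["everyone"])


def normalize_whitespace(doc: Document) -> Document:
    """Processing module: collapse runs of whitespace into single spaces."""
    doc.content = " ".join(doc.content.split())
    return doc


def tag_length(doc: Document) -> Document:
    """Processing module: enrich the document with a simple metadata field."""
    doc.metadata["length"] = str(len(doc.content))
    return doc


class ListPublisher:
    """Hypothetical publisher: collects documents in memory.

    A real publisher would send each document to a search engine or store.
    """

    def __init__(self) -> None:
        self.published: List[Document] = []

    def publish(self, doc: Document) -> None:
        self.published.append(doc)


def run_pipeline(
    source: Iterable[Document],
    stages: List[Callable[[Document], Document]],
    publisher: ListPublisher,
) -> int:
    """Pass each document through every stage in order, then publish it."""
    count = 0
    for doc in source:
        for stage in stages:
            doc = stage(doc)
        publisher.publish(doc)
        count += 1
    return count
```

Because each stage only depends on the shared document format, a new connector or publisher can be swapped in without touching the processing modules, which is the property the modular design is aiming for.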
Figure: A richer search UI with thumbnail images for document results
What’s new in our upcoming Aspire 5.0 release?
Manager and worker nodes
Aspire 5.0 significantly updates the architecture, adding manager nodes that control crawls and distribute work to worker nodes that are responsible for processing the data. A new user interface (UI) will allow you to control single or multiple “seeds” (the location of data) at a single time. Need to crawl a new data source? Just type the URL into the interface. Need to add 100? Cut and paste a list of the URLs into the UI and have 100 created. Once you’ve got some seeds, select as many as you want and press “go” to start crawling the data.
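As a rough illustration of the manager and worker idea, the sketch below turns a pasted list of URLs into seeds and has a "manager" function hand them out to a pool of workers. This is an assumption-laden toy, not how Aspire 5.0 is implemented: parse_seeds, crawl_seed and run_manager are hypothetical names, the workers are threads in a single process rather than separate nodes, and the crawl itself is simulated.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def parse_seeds(pasted: str) -> List[str]:
    """Turn pasted text (one URL per line) into a de-duplicated seed list.

    Blank lines and surrounding whitespace are ignored, and the original
    order of first appearance is preserved.
    """
    seeds: List[str] = []
    seen = set()
    for line in pasted.splitlines():
        url = line.strip()
        if url and url not in seen:
            seen.add(url)
            seeds.append(url)
    return seeds


def crawl_seed(seed: str) -> int:
    """Hypothetical worker task.

    A real worker would crawl the seed and process its content; this
    stand-in just returns the length of the URL as a fake item count.
    """
    return len(seed)


def run_manager(
    seeds: List[str],
    worker_count: int,
    task: Callable[[str], int] = crawl_seed,
) -> Dict[str, int]:
    """Distribute seeds across a pool of workers and gather the results."""
    with ThreadPoolExecutor(max_workers=worker_count) as pool:
        # pool.map returns results in the same order as the input seeds.
        results = list(pool.map(task, seeds))
    return dict(zip(seeds, results))
```

For example, run_manager(parse_seeds(pasted_text), worker_count=4) would start the simulated crawl for every seed in the pasted list, with up to four running at once.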
Alternatively, you can integrate with or automate your use of Aspire from your chosen system using a fully functional and documented REST API.
We’ve also built in a mechanism to address one of the crawling problems we see most frequently – disruption to the user experience of the “source” systems due to the load that crawling can place on servers. Most cloud-based solutions can throttle your connection if you’re crawling a large amount of data, but your “internal” file server may just slow down (for everyone) or stop responding. Aspire 5.0 will allow you to set thresholds for servers, avoiding system disruption whether the data is local or in the cloud.
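One simple way to picture a per-server threshold is a limiter that spaces out requests to each host. The sketch below is only one possible interpretation and is not taken from Aspire: HostThrottle, its max_per_second setting and the default rate of 5 requests per second are all hypothetical, and a production crawler might instead react to server response times or error rates.

```python
import time
from typing import Callable, Dict, Optional


class HostThrottle:
    """Hypothetical helper that caps the request rate for each host."""

    def __init__(
        self,
        max_per_second: Optional[Dict[str, float]] = None,
        default: float = 5.0,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        # Per-host limits in requests per second; hosts not listed here
        # fall back to the default rate.
        self._limits = dict(max_per_second or {})
        self._default = default
        # The clock and sleep functions are injectable so the class can be
        # exercised without real waiting.
        self._clock = clock
        self._sleep = sleep
        self._next_allowed: Dict[str, float] = {}

    def wait(self, host: str) -> float:
        """Block until a request to host is allowed; return seconds waited."""
        interval = 1.0 / self._limits.get(host, self._default)
        now = self._clock()
        ready_at = self._next_allowed.get(host, now)
        delay = max(0.0, ready_at - now)
        if delay > 0:
            self._sleep(delay)
        # Schedule the earliest time the next request to this host may start.
        self._next_allowed[host] = max(now, ready_at) + interval
        return delay
```

A crawler would call wait(host) before each request, so a fragile internal file server could be given a low limit while a more robust source keeps a higher one.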
Using our publisher for Microsoft search combined with more than 50 existing connectors and a wide range of content enrichment modules, we can help your organization leverage the power of Aspire to bring data into Microsoft search and search your on-premises data alongside your Office 365 documents.
By integrating Aspire's unstructured content processing capabilities and Microsoft search, your organization can improve data acquisition and enrichment, accelerate information discovery, and ultimately increase business value.
For more details on the Aspire-Microsoft search integration or how we can help improve your search application, connect with us.