RESEARCH REPORT

In brief

  • AI is making it possible to extract more insights from unstructured content by examining a document’s presentational elements.
  • Important semantic information is encoded within those elements – such as positioning, fonts and images – which combine to communicate meaning to the reader.
  • However, computers currently ignore most of this information, which means organizations miss out on potential new value.
  • With Intelligent Document X-ray, businesses can obtain more accurate insights from unstructured data by extracting meaning from presentational data.


If there’s one thing I’ve learned over the course of my career designing user interfaces, it’s that style has substance – and substance has style. However, when it comes to extracting meaning from content through computers, the focus until now has been on substance (the text on the page) over style (the presentational elements).

When computers read documents, they don’t pay attention to stylistic details like where a word is on a page or how it might relate to other words. Yet presentation elements – positioning, colors, fonts, graphical elements and so on – contain important semantic information that text alone cannot convey.

As humans, we understand all of this without thinking twice. We know, for instance, that font size can denote importance or that positioning of a headline, paragraph or image can impact the meaning of a document. For machines, the more of this presentational information that can be read, the more accurate the understanding of a document becomes. But how does this help the end user? And what are the implications for business more broadly?

The rise of Intelligent Document X-ray now makes it possible to analyze both text and presentational elements of documents. Simply put, Intelligent Document X-ray looks at the text within the document as well as at presentational elements (italics, bold, font size, color), graphical elements (lines, tables, separators), and images. It also analyzes where these elements are placed on the page, what page they appear on, any slide separators, and other hidden metadata. This means we can see past the “flesh” of the documents (plain text) and into their real structure (skeletal x-ray).

Document X-ray uses presentational data to reveal the hidden structure of your documents. Copyright © 2019 Accenture. All rights reserved.
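To make this concrete, here is a minimal sketch of the kind of text-plus-presentation metadata a layout-aware parser can surface, using the open-source PyMuPDF library purely as an illustration (it is not the tooling behind Intelligent Document X-ray, and the file name is hypothetical):

```python
# A minimal sketch: pull text AND presentational metadata (font, size,
# bold/italic flags, color, position) out of a PDF with PyMuPDF.
# Illustrative only -- not the actual Intelligent Document X-ray pipeline.
import fitz  # PyMuPDF

doc = fitz.open("report.pdf")              # hypothetical input file
for page_number, page in enumerate(doc, start=1):
    for block in page.get_text("dict")["blocks"]:
        if block["type"] != 0:             # 0 = text block, 1 = image block
            continue
        for line in block["lines"]:
            for span in line["spans"]:
                print({
                    "page": page_number,
                    "text": span["text"],
                    "font": span["font"],
                    "size": span["size"],              # font size in points
                    "bold": bool(span["flags"] & 16),  # flag bit 16 = bold
                    "italic": bool(span["flags"] & 2), # flag bit 2 = italic
                    "color": span["color"],            # sRGB integer
                    "bbox": span["bbox"],              # position on the page
                })
```

Each span carries its text alongside its font, size, styling and position on the page – exactly the kind of signal that is normally thrown away when a document is reduced to plain text.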

Although today’s computers may not know what it “means” when something is in bold or arranged in a certain way within a document, AI is able to recognize patterns of presentational elements. So now it’s up to organizations to decide how to use that extracted information to create value.

Computers gain intelligence

Historically speaking, there have been two key challenges in teaching computers to “read” documents. First, the sheer number of possible document formats made it difficult to train computers to read them all. Fortunately, 95% or more of all content is now contained within four key document formats – Microsoft Office, PDF, HTML and images.

The second challenge has been knowing what to do with all the information – text and presentational – once it has been extracted.

Consider for a moment that, until recently, when a computer “read” a document, it could only process text. That left a variety of elements undetected:

  • Text: YES
  • Images: NO
  • Indentation, tabs, blank lines: NO
  • Position on the page: NO
  • Font, font size, font decoration (italics, underline), font weight (bold): NO
  • Horizontal lines: NO
  • Tables: NO
  • Boxes, callouts, sidebars: NO
  • Image captions associated with images: NO
  • Bullets: NO
  • Colors: NO

Until now, the inability to process presentational information has largely hampered neural networks (NNs) in their ability to understand content. Words that formed headers, footers, titles, sidebars and so on were processed as if they were a single linear list, with all presentational information removed. Now, the addition of presentational information allows NNs to read documents more accurately, extracting the right meaning across more document types than ever before.

For example, it’s now possible to train a computer to copy and paste content from a source document to a form. First the system watches what you copy and where you paste it. Then, the NN learns how to copy and paste on its own – considering the text content together with the presentational elements such as where the text falls relative to other words on the page. While the NN is learning, the user interface can color code source content to make it easier for the user to approve it being copied. Over time, the computer then continues this operation automatically.
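One way to picture the learning step: each candidate text span is turned into a feature vector that mixes what the text says with how and where it appears on the page, and a classifier learns which target form field (if any) the user would paste it into. The sketch below is an illustrative assumption of how such features might be combined, using scikit-learn with invented field names and data – not the actual CRPA implementation:

```python
# Sketch: learn which target form field a copied span belongs to, from both
# its text and its presentational context. Field names and data are invented.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def features(span):
    """Combine textual and presentational cues into one feature dict."""
    f = {f"word={w.lower()}": 1.0 for w in span["text"].split()}
    f["rel_x"] = span["x"] / span["page_width"]    # horizontal position on the page
    f["rel_y"] = span["y"] / span["page_height"]   # vertical position on the page
    f["bold"] = float(span["bold"])
    f["font_size"] = float(span["font_size"])
    return f

# Each observed action: the span the user copied, and the form field pasted into.
observed = [
    ({"text": "Sample ID: A-1042", "x": 70, "y": 90, "page_width": 600,
      "page_height": 800, "bold": True, "font_size": 12}, "sample_id"),
    ({"text": "Dissolution test, stage 2", "x": 70, "y": 300, "page_width": 600,
      "page_height": 800, "bold": False, "font_size": 10}, "procedure"),
    ({"text": "Page 3 of 12", "x": 500, "y": 780, "page_width": 600,
      "page_height": 800, "bold": False, "font_size": 8}, "ignore"),
]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit([features(s) for s, _ in observed], [label for _, label in observed])

new_span = {"text": "Sample ID: B-2210", "x": 72, "y": 88, "page_width": 600,
            "page_height": 800, "bold": True, "font_size": 12}
print(model.predict([features(new_span)]))   # -> likely 'sample_id'
```

With enough observed copy-and-paste actions, the same pattern scales from this toy example to the form-filling behavior described above.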

These systems that watch what users do and “learn” to do it automatically are the beginning of a new era in robotic process automation (RPA) known as cognitive RPA (CRPA) – but they only work when the text and presentational elements of a document are combined.

Other capabilities that this process enables include:

  • Rendering documents to thumbnails and browsable images
  • Tracking text position through text extraction and optical character recognition (OCR)
  • Searching and highlighting of text on rendered images
  • Sub-document segmentation and sub-document search using NNs
  • Document segmentation and segment classification
  • Image extraction and image classification
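Two of these capabilities – rendering pages to browsable images and tracking text position via OCR – can be pictured with off-the-shelf open-source tools. The sketch below uses PyMuPDF and pytesseract as illustrative stand-ins, with a hypothetical input file; it is not the production pipeline:

```python
# Sketch: render each page to a browsable image, then OCR it while keeping
# every word's bounding box so matches can be highlighted on the image later.
import fitz                      # PyMuPDF
import pytesseract
from PIL import Image

doc = fitz.open("scanned_report.pdf")                 # hypothetical input
for i, page in enumerate(doc):
    pix = page.get_pixmap(matrix=fitz.Matrix(2, 2))   # render at 2x for legibility
    thumb_path = f"page_{i:03d}.png"
    pix.save(thumb_path)                              # browsable page image

    data = pytesseract.image_to_data(Image.open(thumb_path),
                                     output_type=pytesseract.Output.DICT)
    for word, left, top, w, h in zip(data["text"], data["left"], data["top"],
                                     data["width"], data["height"]):
        if word.strip():
            print(i, word, (left, top, left + w, top + h))
```

Keeping each word’s bounding box is what later makes it possible to search for text and highlight the hits directly on the rendered image.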

Document X-ray unlocks new possibilities

Once it becomes clear how critical presentational information is to accurately understand and extract data from content, it is easy to see how many new opportunities there are for organizations to harness Intelligent Document X-ray.

Automated PDF invoice processing is a classic example here, as computers are able to extract data by “learning” where certain text elements are placed on a page relative to other text or presentational content. By reading a combination of text and presentational elements, the computer is able to perform this once manual, time-consuming task efficiently and accurately.
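As an illustration of the kind of relative-position rule such a system can learn – “the invoice total is the amount printed just to the right of the label ‘Total’” – the sketch below applies one such rule to layout-aware text spans. The spans and coordinates are invented for the example:

```python
# Sketch: a simple relative-position rule over (text, x, y) spans as a
# layout-aware extractor might emit them. Values are invented.
def value_right_of(spans, label, y_tolerance=5):
    """Return the span nearest to the right of the first span matching `label`."""
    anchors = [s for s in spans if label.lower() in s[0].lower()]
    if not anchors:
        return None
    _, ax, ay = anchors[0]
    candidates = [s for s in spans
                  if s[1] > ax and abs(s[2] - ay) <= y_tolerance]
    return min(candidates, key=lambda s: s[1] - ax, default=None)

spans = [("Invoice date", 50, 120), ("2019-03-14", 200, 121),
         ("Total", 50, 600), ("EUR 1,250.00", 180, 601)]
print(value_right_of(spans, "Total"))   # -> ('EUR 1,250.00', 180, 601)
```

A learned system generalizes the same idea: instead of a hand-written rule, it infers the spatial relationship between labels and values from examples.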

Here are a few more instances of how enterprises are applying Intelligent Document X-ray tools to extract more data, more accurately from their content:

  1. PowerPoint slides
    By effectively rendering individual slides as thumbnail images and storing important semantic and presentational data, Intelligent Document X-ray makes it possible to search for relevant individual slides, giving users the ability to sift through vast numbers of documents at speed (a simple sketch of this per-slide indexing appears after this list of examples).

Where past search engines would have read the PowerPoint as one long list of words, Intelligent Document X-ray makes it possible to leverage individual pieces of content or information from a deck – as opposed to the deck as a whole. By giving users the ability to search for information contained within individual slides, search results turn up only very relevant slides within relevant documents.

At a glance, this is what one of Accenture’s slide sorter overviews looks like:

A view of one of Accenture’s slide sorter overviews. Copyright © 2019 Accenture. All rights reserved.

In business terms, PowerPoint slides are a uniquely powerful content type for several reasons: each slide functions as a “semantic unit” containing a single thought or subject; slides are graphical in nature; and presentation is important to communication. Because of this, there is long-term potential for this to change how people work, creating previously unimagined efficiencies within organizations.

This same concept can also be applied to other document formats, such as PDFs and books, to surface specific information within dense or lengthy pieces of content.

  2. Geoscientist reports
    Organizations in the natural resources sector often have millions of data points and no simple way to classify them. For instance, one leading oil and gas company had reports dating back to the 1920s and needed a way to extract images and organize them by location.

    With the application of Intelligent Document X-ray tools to scan the millions of documents, and optical character recognition (OCR) to read text from images and captions, the computers were able to turn the information into machine-friendly data and geographically classify all reports. Images – including logs, seismic cross-sections, maps and magnetic lines – were correlated on a map of the world to assist in geographic searches or research for the placement of oil and gas wells.

  3. Pharmaceutical test procedures and master batch records
    Generally, complex pharmaceutical records need to be copied from written documents (PDF or MS-Word) and filed in electronic management systems, which often requires interpretation of tables, lists and sections, in addition to text.

    For example, one laboratory informatics consultancy needed a way to create electronic test procedures from PDF procedure documents to guide lab technicians through drug testing. As part of the testing process, those same people were having to read through lengthy documents, copying information to complete a form on the right-hand side of the screen.

    With the ability to understand the presentational elements of these documents, Intelligent Document X-ray makes it possible to accurately complete these forms automatically. By showing the computer what they were cutting and pasting and from where, the team was able to “teach” the system to complete these documents.

  4. Safety standards documents and policy and procedures documents
    By nature, policy documents are often lengthy and difficult to navigate. Intelligent Document X-ray makes it possible to segment these content formats into individual policies or procedures, tracking how they are related within the overall document and allowing readers to search for specific pieces of information.

    For example, one customer performs safety checks at lumber mills. When inspectors are on the mill floor, they often have questions like, “What is the safe depth for sawdust in this type of mill?” The ability to navigate high volumes of complex documents quickly and accurately is therefore critical.

  5. Mail
    One of Accenture’s customers is a public governmental organization that receives more than 4.5 million pieces of mail each year. Sorting and processing that volume of mail becomes an incredibly expensive and laborious task.

    Now, with computers helping to read the mail, the organization can automatically route post to the correct department, identify the type of mail and why it was sent, identify the source from logos and designs, and extract key information such as names, dates and amounts. In all of these cases, presentational elements within the mail – position, color, font, graphical elements – are as important as the text itself to understanding and processing the mail.

  6. From engineering drawings to bill of materials
    Understanding presentational elements is also making it possible for computers to create new documents entirely. In the building and construction industry, for instance, Intelligent Document X-ray can be applied to read engineering drawings and extract information to automatically create a bill of materials (BOM) document.

    By searching engineering drawings for parts, where they are used and how they are connected, a BOM can be generated that details which systems need to be redesigned or retrofitted when parts reach end-of-life, helps with troubleshooting, shows where faulty parts may be in use, and supports hundreds of other critical needs for complex systems.
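Returning to the PowerPoint example above, the per-slide indexing idea can be sketched with the open-source python-pptx library: each slide becomes its own searchable document rather than one entry in a long list of words. The file name and query are hypothetical, and a real deployment would use a proper search index rather than a substring match:

```python
# Sketch: treat each slide as its own searchable unit instead of indexing the
# whole deck as one blob of text. Illustrative only.
from pptx import Presentation

def slides_as_documents(path):
    """Yield (slide_number, text) pairs -- one searchable document per slide."""
    prs = Presentation(path)
    for number, slide in enumerate(prs.slides, start=1):
        texts = [shape.text_frame.text
                 for shape in slide.shapes if shape.has_text_frame]
        yield number, " ".join(texts)

def search_deck(path, query):
    """Return the slides whose text mentions the query term."""
    q = query.lower()
    return [(number, text) for number, text in slides_as_documents(path)
            if q in text.lower()]

for number, text in search_deck("strategy_review.pptx", "supply chain"):
    print(f"slide {number}: {text[:80]}...")
```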
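And for the engineering-drawings example, once part references have been extracted from the drawings, rolling them up into a BOM is a straightforward aggregation. The sketch below assumes invented part numbers, drawing IDs and system names as input:

```python
# Sketch: roll part references extracted from engineering drawings up into a
# bill-of-materials table. All identifiers are invented for illustration.
from collections import defaultdict

# Each occurrence: (part_number, drawing_id, system) as an extractor might emit.
occurrences = [
    ("VLV-2041", "P&ID-07", "cooling loop"),
    ("VLV-2041", "P&ID-12", "cooling loop"),
    ("PMP-0110", "P&ID-07", "feedwater"),
    ("VLV-2041", "P&ID-07", "cooling loop"),
]

bom = defaultdict(lambda: {"quantity": 0, "used_in": set()})
for part, drawing, system in occurrences:
    bom[part]["quantity"] += 1
    bom[part]["used_in"].add((drawing, system))

for part, row in sorted(bom.items()):
    print(part, "qty:", row["quantity"], "used in:", sorted(row["used_in"]))
```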

Intelligence combines substance and style

These examples are just a taste of what is becoming possible thanks to the insights derived from combining text and presentational data through Intelligent Document X-ray.

By focusing on both text and presentation, critical contextual cues are improving the accuracy and effectiveness of computers to not only read, but understand, content at scale. It’s the tip of the iceberg in a brave new world where we no longer have to choose between substance and style. Now we can have both.

Paul Nelson

Innovation Lead – Content Analytics, Applied Intelligence
