In business terms, PowerPoint slides are a uniquely powerful content type for several reasons: each slide functions as a “semantic unit” containing a single thought or subject; slides are graphical in nature; and their presentation is central to what they communicate. Because of this, the approach has long-term potential to change how people work, creating efficiencies within organizations that were previously unimagined.
This same concept can also be applied to other document formats, such as PDFs and books, to surface specific information within dense or lengthy pieces of content.
- Geoscientist reports
Organizations in the natural resources sector often have millions of data points and no simple way to classify them. For instance, one leading oil and gas company had reports dating back to the 1920s and needed a way to extract images and organize them by location.
By applying Intelligent Document X-ray tools to scan the millions of documents, and optical character recognition (OCR) to read text from images and captions, the company was able to turn the information into machine-friendly data and classify every report geographically. Images – including logs, seismic cross-sections, maps and magnetic lines – were plotted on a map of the world to support geographic searches and research into the placement of oil and gas wells.
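To make the geographic classification step concrete, here is a minimal sketch of one way OCR output could be matched to locations. It is not the method used in the project described above: the `GAZETTEER` lookup, its approximate coordinates, and the `classify_report` function are all hypothetical names introduced for illustration.

```python
import re

# Hypothetical gazetteer: place name -> approximate (latitude, longitude).
# The entries and coordinates are illustrative assumptions only.
GAZETTEER = {
    "permian basin": (31.9, -102.3),
    "north sea": (56.0, 3.0),
}

def classify_report(ocr_text):
    """Return the gazetteer places mentioned in a report's OCR text."""
    # OCR output often breaks names across lines, so collapse whitespace first.
    text = re.sub(r"\s+", " ", ocr_text.lower())
    return {name: coords for name, coords in GAZETTEER.items() if name in text}

report = "Seismic cross-section,  NORTH\nSEA block 22. Survey of 1927."
print(classify_report(report))
```

A production system would need far more than substring matching (misspellings, historical place names, coordinates printed on the maps themselves), but the sketch shows how text read from images and captions can become data that places a report on a map.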
- Pharmaceutical test procedures and master batch records
Generally, complex pharmaceutical records need to be copied from written documents (PDF or Microsoft Word) and filed in electronic management systems, which often requires interpreting tables, lists and sections in addition to text.
For example, one laboratory informatics consultancy needed a way to create electronic test procedures from PDF procedure documents to guide lab technicians through drug testing. As part of the testing process, those technicians had to read through lengthy documents and copy information into a form on the right-hand side of the screen.
With the ability to understand the presentational elements of these documents, Intelligent Document X-ray makes it possible to accurately complete these forms automatically. By showing the computer what they were cutting and pasting and from where, the team was able to “teach” the system to complete these documents.
- Safety standards documents and policy and procedures documents
By nature, policy documents are often lengthy and difficult to navigate. Intelligent Document X-ray makes it possible to segment these content formats into individual policies or procedures, tracking how they are related within the overall document and allowing readers to search for specific pieces of information.
For example, one customer performs safety checks at lumber mills. On the mill floor, inspectors often have questions such as, “What is the safe depth for sawdust in this type of mill?” The ability to navigate high volumes of complex documents quickly and accurately is therefore critical.
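As a rough illustration of segmenting a policy document and searching the pieces, the sketch below splits text on numbered headings, tracks each section's parent, and returns the sections that mention a search term. The heading style ("2.1 Title"), the sample text, and the function names are assumptions made for this example; it does not describe how Intelligent Document X-ray is implemented.

```python
import re

# Assumed heading style: a dotted section number followed by a title, e.g. "2.1 Walkways".
# Body lines that happen to start with a number would be misread as headings.
HEADING = re.compile(r"^(\d+(?:\.\d+)*)\s+(.+)$")

def segment(text):
    """Split a document into sections keyed by their heading number."""
    sections = []
    for line in text.splitlines():
        line = line.strip()
        match = HEADING.match(line)
        if match:
            sections.append({"number": match.group(1), "title": match.group(2), "body": []})
        elif sections and line:
            sections[-1]["body"].append(line)
    return sections

def parent_of(number):
    """Return the enclosing section number, or None for a top-level section."""
    return number.rsplit(".", 1)[0] if "." in number else None

def search(sections, term):
    """Return the numbers of sections whose title or body mentions the term."""
    term = term.lower()
    return [s["number"] for s in sections
            if term in s["title"].lower() or any(term in b.lower() for b in s["body"])]

# Placeholder policy text, invented for the example.
doc = """
2 Housekeeping
2.1 Sawdust accumulation
Check sawdust depth at the start of each shift.
2.2 Walkways
Keep walkways clear of offcuts.
"""

sections = segment(doc)
print(search(sections, "sawdust"), parent_of("2.1"))
```

Keeping the parent relationship alongside each segment is what lets a reader jump straight to one procedure while still seeing where it sits in the overall document.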
- Mail sorting and routing
One of Accenture’s customers is a government organization that receives more than 4.5 million pieces of mail each year. Sorting and processing that volume of mail is an expensive and laborious task.
Now, with computers reading the mail, the organization can automatically route post to the correct department, identify the type of mail and why it was sent, identify the source from logos and designs, and extract key information such as names, dates and amounts. In all of these cases, presentational elements within the mail – position, color, font, graphical elements – are as important as the text itself to understanding and processing the mail.
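The idea that presentation matters as much as text can be sketched with a simple scoring rule: a routing keyword counts for more when it appears in large type near the top of the page. Everything in this example is an assumption made for illustration, including the `TextBlock` fields, the `ROUTES` table, the department names, and the weighting itself.

```python
from dataclasses import dataclass

@dataclass
class TextBlock:
    text: str
    y: float          # vertical position: 0.0 = top of page, 1.0 = bottom
    font_size: float  # in points

# Hypothetical keyword -> department table.
ROUTES = {
    "invoice": "Accounts Payable",
    "appeal": "Appeals",
    "address change": "Records",
}

def route(blocks):
    """Pick a department, weighting keywords by font size and page position."""
    scores = {}
    for block in blocks:
        # Large text in the top quarter of the page is treated as a likely title.
        weight = block.font_size * (2.0 if block.y < 0.25 else 1.0)
        for keyword, department in ROUTES.items():
            if keyword in block.text.lower():
                scores[department] = scores.get(department, 0.0) + weight
    return max(scores, key=scores.get) if scores else "Manual Review"

letter = [
    TextBlock("NOTICE OF APPEAL", y=0.10, font_size=18),
    TextBlock("This concerns the invoice you sent last month.", y=0.60, font_size=10),
]
print(route(letter))
```

On text alone, this letter mentions both an appeal and an invoice. The position and size of the heading are what tip it toward the appeals department.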
- From engineering drawings to bill of materials
Understanding presentational elements is also making it possible for computers to create new documents entirely. In the building and construction industry, for instance, Intelligent Document X-ray can be applied to read engineering drawings and extract information to automatically create a bill of materials (BOM) document.
By searching engineering drawings for parts, where they are used and how they are connected, the system can generate a BOM that supports tasks such as identifying which systems need to be redesigned or retrofitted when parts reach end-of-life, troubleshooting, locating where faulty parts may be in use, and hundreds of other critical needs for complex systems.
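Once parts, their locations and their connections have been read from the drawings, assembling the BOM is largely an aggregation step. The sketch below assumes a drawing reader has already produced simple records; the record layout, part numbers and `build_bom` function are hypothetical and stand in for whatever the real extraction produces.

```python
from collections import Counter, defaultdict

# Hypothetical extraction output: (drawing id, part number, part it connects to).
extracted = [
    ("DWG-001", "VALVE-A", "PUMP-1"),
    ("DWG-001", "PUMP-1", "TANK-3"),
    ("DWG-002", "VALVE-A", "PUMP-2"),
]

def build_bom(records):
    """Aggregate part records into BOM lines with quantity, where-used and connections."""
    quantities = Counter(part for _, part, _ in records)
    where_used = defaultdict(set)
    connections = defaultdict(set)
    for drawing, part, connected_to in records:
        where_used[part].add(drawing)
        connections[part].add(connected_to)
    return [
        {
            "part": part,
            "qty": qty,
            "drawings": sorted(where_used[part]),
            "connects_to": sorted(connections[part]),
        }
        for part, qty in sorted(quantities.items())
    ]

bom = build_bom(extracted)
for line in bom:
    print(line)
```

The `drawings` field is the where-used lookup: if `VALVE-A` reaches end-of-life or turns out to be faulty, the BOM line shows which drawings, and therefore which systems, are affected.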
These examples are just a taste of what is becoming possible thanks to the insights derived from combining text and presentational data through Intelligent Document X-ray.
By focusing on both text and presentation, computers gain critical contextual cues that improve their ability not only to read content at scale, but to understand it. It’s the tip of the iceberg in a brave new world where we no longer have to choose between substance and style. Now we can have both.