Digitizing archives using artificial intelligence & OCR technology
Accenture is using AI to expand the world’s collective memory of the Holocaust and preserve the names and experiences of those who were persecuted.
The horrific events of the Holocaust impacted millions of people during the Second World War and for generations to come. The systematic genocide of people because of their race, ethnicity and political and ideological beliefs must never be forgotten, even as we look toward the future.
That’s why the work of the Arolsen Archives, an international research effort, is so vital. Founded to clarify the fates of individuals and look for missing persons, the Archives has spent decades compiling the world’s largest archives related to Nazi persecution, in the hope not only of providing documents to victims and their families but also of serving as a warning for future generations.
There are more than 110 million digital objects that make up the Archives, a portion of which are part of UNESCO’s Memory of the World, a designation for society’s most significant documents. At a time when the Arolsen Archives was hoping to make its documents universally accessible online, it was facing a timeline of decades—if not longer—to digitize everything.
The obstacle was the long, manual process of reading, translating, transcribing, cataloging and validating these documents. For example, a single document required four people to review it (three crowdsourced volunteers and one member of the Archives) before it could be certified.
Ian Lever, an Accenture employee, began organizing volunteering events through his leadership in Accenture’s Jewish Employee Resource Group. The purpose of these events was to bring communities together to preserve the names and stories within the Archives. However, Lever and his colleagues quickly realized how tedious the process was and saw that there had to be a more efficient way of processing the information.
In addition to the innovative crowdsourcing efforts, Lever and his colleagues saw an opportunity to further automate the tedious document-cataloging process using artificial intelligence (AI). Specifically, AI could be leveraged to analyze a wide range of files—from prisoner and death camp transfers to tracing requests and beyond. Lever brought his idea to the Solutions.AI team with the goal of accelerating the document-indexing process, which ushered in the next stage of evolution for this organization.
The Archives was already working on the #everynamecounts initiative that aims to build a digital memorial to the victims of Nazi persecution. However, as Lever and team realized, some documents in the Archives had become too difficult to read due to weathering, illegible entries, inaccuracies—the list goes on. And so, the team turned its attention to two distinct subsets of documents.
The first were original documents from the Nazis themselves—including prisoner lists, transfer lists and concentration camp registrations. The second subset was from the Arolsen Archives (formerly named the International Tracing Service), where for the past 80 years people have submitted inquiries about the locations and fates of family members and loved ones. There are an estimated 2.7 million inquiries in the Arolsen Archives alone.
An AI solution was the ideal tool to index these documents. Bolstered by Accenture’s AI-powered automation solution, a cutting-edge use case was created that leverages cloud-based technologies, optical character recognition (OCR) solutions and the latest AI and machine learning (ML) techniques.
Here’s how it works: The AI solution is shown documents from the Archives and assigns a “confidence” level to each field (e.g., last name, religion or region). Documents that can be read easily receive a high level of confidence. Human feedback on these documents is then used to help the AI better interpret documents with lower levels of confidence. The result is a process of continuous innovation in which the AI learns from volunteer and historian feedback, improving the accuracy and speed with which documents are preserved.
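To make the confidence idea concrete, here is a minimal sketch of how per-field confidence could decide which fields go to a human reviewer. It is an illustration only, not the actual solution: the 0.90 threshold, the `FieldReading` structure, the function names and the sample values are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical cut-off: fields below this confidence go to a human reviewer.
REVIEW_THRESHOLD = 0.90


@dataclass
class FieldReading:
    name: str          # field label, e.g. "last_name" or "religion"
    value: str         # text the OCR/AI step extracted
    confidence: float  # 0.0 to 1.0, assigned by the model


def route_fields(fields, threshold=REVIEW_THRESHOLD):
    """Split extracted fields into auto-accepted and needs-review lists."""
    accepted, needs_review = [], []
    for field in fields:
        if field.confidence >= threshold:
            accepted.append(field)
        else:
            needs_review.append(field)
    return accepted, needs_review


def record_correction(field, corrected_value):
    """Keep a reviewer's correction as a labeled example for later retraining."""
    return {
        "field": field.name,
        "predicted": field.value,
        "label": corrected_value,
        "confidence": field.confidence,
    }


# Made-up readings for illustration; not taken from any real document.
sample = [
    FieldReading("last_name", "Example", 0.97),
    FieldReading("religion", "kath.", 0.99),
    FieldReading("mother_last_name", "Sch?idt", 0.52),
]

accepted, needs_review = route_fields(sample)
training_example = record_correction(needs_review[0], "Schmidt")
```

In this sketch only the low-confidence field reaches a volunteer, and the correction is stored alongside the original prediction so that it could later be used to improve the model.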
As word of the project spread at Accenture, it quickly became a multidisciplinary, global push—a feat made possible thanks to Accenture’s history of developing AI and ML solutions for its clients. A community of Accenture volunteers came together and worked with the AI solution as a communal engine, feeding it new inputs and insights in a truly human + machine way.
Accenture employees are enabling the Arolsen Archives to rapidly digitize and preserve the memories of their relatives and others to create a living memorial.
Prior to the new solution, an Arolsen Archives volunteer needed roughly 15 minutes to extract and upload each document. With the new AI-based approach, it takes less than 20 seconds. Since the implementation of the solution, more than 160,000 names have been indexed, more than 18,000 documents have been extracted, and more than 63,000 documents have been clustered, meaning that similar documents have been grouped together for easier and more accurate readings. On average, it takes less than one second per document to cluster, and it will only get faster over time as the AI continues to learn.
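The source does not say how documents are clustered. As one simple way to picture grouping similar documents, the sketch below compares the set of field labels found on each form using Jaccard similarity and places a document in the first cluster it resembles. The similarity measure, the 0.6 threshold and the sample field labels are assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Share of labels two documents have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_documents(docs, threshold=0.6):
    """Greedy one-pass clustering.

    docs maps a document id to the set of field labels found on it.
    Each document joins the first cluster whose representative is
    similar enough; otherwise it starts a new cluster.
    """
    clusters = []  # list of (representative label set, [document ids])
    for doc_id, labels in docs.items():
        for representative, members in clusters:
            if jaccard(representative, labels) >= threshold:
                members.append(doc_id)
                break
        else:
            clusters.append((labels, [doc_id]))
    return [members for _, members in clusters]


# Hypothetical field labels for three forms.
docs = {
    "doc_a": {"name", "birth_date", "prisoner_number", "camp"},
    "doc_b": {"name", "birth_date", "prisoner_number", "camp", "nationality"},
    "doc_c": {"inquirer", "relationship", "person_sought", "last_known_address"},
}

groups = cluster_documents(docs)
```

Here the two registration-style forms end up together and the inquiry form sits in its own group, so each group can then be read with a template suited to its layout.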
Although AI can do roughly 95% of the work, there still needs to be a human element in the validation of documents. By bringing humans and machines together, a single volunteer (instead of 10) can now get through roughly 41 documents each hour. With the other volunteers freed up to do the same, Arolsen has seen a 40-fold increase in productivity.
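The figures quoted above can be checked with a little arithmetic. The extraction speed-up follows directly from the stated times. The productivity calculation is only one possible reading: it assumes that each of the four reviewers mentioned earlier spent about 15 minutes on a document, which the source does not state.

```python
# Figures quoted in the text.
MANUAL_MINUTES_PER_DOC = 15        # "roughly 15 minutes" per document
AI_SECONDS_PER_DOC = 20            # "less than 20 seconds", used as an upper bound
DOCS_PER_VOLUNTEER_HOUR_NOW = 41   # "roughly 41 documents each hour"

# Assumption, not stated in the source: each of the four reviewers
# spent about 15 minutes on every document before the AI solution.
REVIEWERS_PER_DOC_BEFORE = 4

# Extraction alone: 900 seconds versus at most 20 seconds.
extraction_speedup = MANUAL_MINUTES_PER_DOC * 60 / AI_SECONDS_PER_DOC

# Under the assumption above, one document cost 60 person-minutes,
# which is one document per person-hour.
person_minutes_before = REVIEWERS_PER_DOC_BEFORE * MANUAL_MINUTES_PER_DOC
docs_per_person_hour_before = 60 / person_minutes_before

productivity_gain = DOCS_PER_VOLUNTEER_HOUR_NOW / docs_per_person_hour_before

print(f"Extraction speed-up: at least {extraction_speedup:.0f}x")
print(f"Productivity gain under the stated assumption: about {productivity_gain:.0f}x")
```

Under that assumption the result comes out close to the 40-fold increase reported, but the source may have measured productivity differently.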
As for the AI’s confidence, it’s steadily rising. For instance, within the field of “mother’s last name,” the AI has gradually improved its confidence by 10% thanks to inputs from volunteers. When it comes to “religion,” the AI is operating at 99% confidence.
There have been other welcome surprises. For instance, despite there only being three documented Holocaust survivors in Ireland, Accenture saw 36 people at one of its volunteering events to update the archives. Today, there are 950+ Accenture volunteers participating across 70+ cities and six continents. This level of willing human participation, bolstered by cloud and AI technologies, will add even more momentum to Arolsen's mission over the long term. The hope is that by learning how people on the edge of society struggled in the past, future generations will be more open to the harsh realities many still face today.
However, the numbers only tell part of the story. They don’t fully capture what it feels like to discover where a long-lost loved one is buried, or to learn that you have a distant cousin who was just born in Belgium. Only a name provides that feeling of connection—and represents the next step on the Arolsen-Accenture journey.
Accenture will continue to work with the Archives to make information more easily accessible and available to the public, keeping alive the memory of those who died in the Holocaust and standing against hate to ensure that their names are never forgotten.