AI Turns Obscure Handwriting from Japan’s Wartime Documents into Readable Text

The Yomiuri Shimbun
Naoki Kanno, chief of the Center for Military History, explains a copy of a letter written by Isoroku Yamamoto, the World War II commander-in-chief of the navy’s Combined Fleet, in Shinjuku Ward, Tokyo, in April.

The National Institute for Defense Studies (NIDS) has decided to convert its vast collection of Japanese military records into text data with the help of artificial intelligence, after which the records will be made available online.

Many prewar and wartime records are handwritten in cursive, often requiring an expert to decipher. Once the documents are transcribed, it should be possible for anyone to easily trace the movements of individual units during the war and see how decisions were made. The project could contribute to new historical discoveries.

In a letter addressed to senior naval officials after the attack on Pearl Harbor in December 1941, Isoroku Yamamoto, commander-in-chief of the navy’s Combined Fleet, expressed his frustration over the “mood of victory” prevailing at the time.

The letter reads: “It seems that the United States is finally ready to launch a serious operation against Japan, and the frivolity at home is truly degrading. If things continue this way, I fear that a single strike on Tokyo will instantly cow them.”

The NIDS’ Center for Military History in Tokyo’s Ichigaya district holds about 100,000 historical documents related to Japan’s former Imperial military. Some have been digitized as images and posted on the center’s website, where they can be searched by document title. However, their contents have not been transcribed, so users cannot search them by keyword. Moreover, the cursive script is difficult for the average reader to decipher.

For the transcription project, the NIDS will use AI-OCR, which combines optical character recognition (OCR) with artificial intelligence. OCR recognizes the text in documents that have been scanned into image files and converts it into machine-readable text. Here, it will be paired with an AI model trained to read cursive characters.

Sample documents will be fed into the AI-OCR system, and humans will correct any errors in its output. This training cycle will be repeated until accuracy is high enough, at which point the institute will begin transcribing the entire collection.
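The institute has not said how accuracy will be measured. As a rough illustration of the correction cycle, the Python sketch below scores one round by comparing the AI-OCR output with the human-corrected text, using character accuracy (1 minus the character error rate, based on Levenshtein edit distance). The metric, the function names and the sample strings are all assumptions made for illustration; only the goal of accuracy above 90% comes from the institute.

```python
# Hypothetical sketch: scoring one round of the AI-OCR correction cycle.
# Assumption: accuracy = 1 - (character edit distance / reference length).
# The NIDS has not published its metric; all names here are illustrative.

TARGET = 0.90  # the institute's stated goal is accuracy above 90%


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca from a
                curr[j - 1] + 1,           # insert cb into a
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def character_accuracy(ocr_output: str, corrected: str) -> float:
    """1 minus the character error rate for one page, floored at 0."""
    if not corrected:
        return 1.0 if not ocr_output else 0.0
    return max(0.0, 1.0 - edit_distance(ocr_output, corrected) / len(corrected))


def evaluate_round(pairs: list[tuple[str, str]]) -> float:
    """Pooled accuracy over (ocr_output, human_corrected) pairs in one round."""
    total_chars = sum(len(ref) for _, ref in pairs)
    if total_chars == 0:
        return 1.0
    total_errors = sum(edit_distance(hyp, ref) for hyp, ref in pairs)
    return max(0.0, 1.0 - total_errors / total_chars)


def ready_for_full_run(accuracy: float) -> bool:
    """Hypothetical stopping rule: move on once accuracy exceeds the target."""
    return accuracy > TARGET


if __name__ == "__main__":
    # Invented sample: the OCR misreads 令 as the similar-looking 今.
    batch = [
        ("連合艦隊司今長官", "連合艦隊司令長官"),
        ("玉砕", "玉砕"),
        ("大本営", "大本営"),
    ]
    acc = evaluate_round(batch)
    print(f"accuracy = {acc:.3f}, ready = {ready_for_full_run(acc)}")
```

Pooling errors across the whole batch, rather than averaging per-page scores, keeps short pages from counting as much as long ones; a real project might also weight rare characters or track accuracy per document type.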

The Defense Ministry has allocated ¥70 million in its initial budget for fiscal 2025, the first year of the project, and will outsource the work.

The NIDS is aiming for accuracy above 90%. It also plans to eventually release the data used for machine learning, which could help advance AI technology.

Once the documents are transcribed and posted online, people will be able to search them by keyword, such as gyokusai (heroic death), and will no longer have to struggle with hard-to-read handwriting.

It will also be possible to search the entire collection at once, increasing the chances that researchers will uncover new historical facts or develop new methods of analysis.
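How the online search will be built has not been announced. One common approach for Japanese text, which has no spaces between words, is to index overlapping two-character sequences (bigrams) so that a keyword such as gyokusai (玉砕) can be found anywhere in a document. The Python sketch below is a minimal, hypothetical version: the class name, document IDs and sample sentences are invented, and a production system would more likely rely on a full-text search engine than on this in-memory index.

```python
# Hypothetical sketch of keyword search over transcribed documents using a
# character-bigram inverted index (an assumed design, not the NIDS system).
from collections import defaultdict


def bigrams(text: str) -> set[str]:
    """All overlapping two-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


class BigramIndex:
    def __init__(self) -> None:
        self.postings: dict[str, set[str]] = defaultdict(set)  # bigram -> doc IDs
        self.docs: dict[str, str] = {}                          # doc ID -> full text

    def add(self, doc_id: str, text: str) -> None:
        self.docs[doc_id] = text
        for bg in bigrams(text):
            self.postings[bg].add(doc_id)

    def search(self, keyword: str) -> list[str]:
        """Sorted IDs of documents containing keyword as a contiguous string."""
        if not keyword:
            return []
        if len(keyword) == 1:
            candidates = set(self.docs)  # no bigram to look up: scan everything
        else:
            # A match must contain every bigram of the keyword.
            candidates = set.intersection(
                *(self.postings.get(bg, set()) for bg in bigrams(keyword))
            )
        # Bigrams may appear in scattered positions, so confirm the exact string.
        return sorted(d for d in candidates if keyword in self.docs[d])


if __name__ == "__main__":
    idx = BigramIndex()
    idx.add("doc-001", "守備隊ハ玉砕セリ")    # invented sample text
    idx.add("doc-002", "敵機動部隊ヲ発見ス")
    idx.add("doc-003", "部隊ハ玉砕ヲ覚悟ス")
    print(idx.search("玉砕"))  # ['doc-001', 'doc-003']
```

Because every occurrence of a keyword contains all of its bigrams, intersecting the posting lists never misses a match; the final substring check removes documents that contain those bigrams only in scattered positions.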

Many people visit the NIDS to research their relatives’ wartime experiences.

“Transcription has been a long-standing goal, but doing it manually would have required an astronomical amount of time,” said Naoki Kanno, chief of the Center for Military History. “We will create an environment where people can access documents that allow us to reflect on the war.”