AI-based, real-time multilingual translations available

The Yomiuri Shimbun

A decade ago, communicating with people from other countries without worrying about language differences was difficult to imagine. That has changed thanks to advances in artificial intelligence-based translation technologies. While overseas companies, led by U.S. firm Google, are ahead in development, Japanese companies are catching up.

Highly accurate free app

“Saikin no Ukuraina josei o taihen shinpai shiteimasu.”

“I am very concerned about the recent situation in Ukraine.”

I spoke the first sentence in Japanese into VoiceTra, a speech translation application on my smartphone. About three seconds later, the app played back the second sentence in English.

VoiceTra was developed by the National Institute of Information and Communications Technology (NICT), which is under the supervision of the Internal Affairs and Communications Ministry.

When you speak clear, easy-to-understand Japanese in a relatively quiet environment, the system can accurately perform “consecutive interpretation,” in which speech is translated sentence by sentence. The app is free to use, as is the companion text translation software TexTra.

As Japan receives more and more foreign visitors and workers, NICT has been developing multilingual translation technologies in earnest since 2014. Translation accuracy has risen rapidly since 2017, when AI was introduced into the system.

Google’s innovation

Technology for machine translation emerged in the 1950s. During the U.S.-Soviet Cold War, the United States conducted research to translate Russian into English. However, the translations remained poor even after the machines were taught grammar rules, and the project failed. Around the 1990s, a technology called “statistical machine translation” emerged. Such systems try to find the most statistically probable translation based on sequences of words, but their uses were limited.
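The statistical idea can be illustrated with a toy sketch. The Python below is not how any real product works: the three-sentence corpus, the candidate sentences and all function names are made up for illustration. It only shows one ingredient, scoring candidate word sequences by how often neighboring word pairs appear in example text and picking the most probable one. Real statistical systems also relied on models learned from large collections of paired translations.

```python
import math
from collections import Counter

# Tiny made-up English corpus (an assumption for illustration); a real
# system would learn from a very large collection of text.
CORPUS = [
    "i am very concerned about the situation",
    "i am very happy about the news",
    "the situation is very serious",
]


def train_bigrams(sentences):
    """Count single words and adjacent word pairs, with sentence boundaries."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(words[:-1])
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams


def log_prob(sentence, unigrams, bigrams, vocab_size):
    """Log-probability of a word sequence under an add-one-smoothed bigram model."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, cur in zip(words, words[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total


def best_candidate(candidates):
    """Return the candidate sentence with the highest probability."""
    unigrams, bigrams = train_bigrams(CORPUS)
    vocab = {w for s in CORPUS for w in s.split()} | {"</s>"}
    return max(candidates, key=lambda c: log_prob(c, unigrams, bigrams, len(vocab)))


# Two hypothetical candidate outputs containing the same words in different order.
CANDIDATES = [
    "i am very concerned about the situation",
    "situation the about concerned very am i",
]

if __name__ == "__main__":
    print(best_candidate(CANDIDATES))
```

The natural word order wins because its word pairs appear in the example text, while the scrambled version is made up of pairs the model has never seen.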

Google made a breakthrough in 2016 by announcing the introduction of an AI-based technology called “neural machine translation” (NMT). The technology uses artificial neural networks that mimic the human brain and nerve cells to select necessary words from a large amount of data and compose them appropriately. Until then, translations between Japanese and other languages that have different grammatical structures felt unnatural. NMT solved the problem and dramatically improved the quality of translations. The system continues to improve day by day as data accumulates.

An initial NMT model scored more than 900 out of 990 on the TOEIC English test, a score that indicates the ability to communicate appropriately, said Prof. Masaru Yamada at Rikkyo University, a specialist in translation studies. He added, “Now, it has reached a level too high for TOEIC to evaluate.”

Focus on Asian languages

To differentiate them from products by Google and other foreign companies, NICT’s VoiceTra and TexTra put special emphasis on languages used in Asian countries and on specialized fields such as finance and patents. They support 31 languages, including Khmer, Nepali, and Mongolian.

The government aims to achieve “simultaneous interpretation” by the Osaka-Kansai Expo in 2025. The technology enables real-time interpretation of speech based on context, a step up from sentence-by-sentence consecutive interpretation.
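The difference in timing between the two modes can be shown with a simplified sketch. Everything in the Python below is an assumption for illustration: `translate_stub` is a hypothetical stand-in that only tags its input, and the fixed three-word chunk is an arbitrary choice. A real simultaneous system would also have to use context and anticipate what comes next, which this sketch ignores.

```python
def translate_stub(words):
    # Hypothetical stand-in for a real translation engine: it only tags the input.
    return "[EN] " + " ".join(words)


def consecutive(stream):
    """Emit one translation per finished sentence (a sentence ends with '.')."""
    buffer, out, position = [], [], 0
    for position, word in enumerate(stream, start=1):
        buffer.append(word)
        if word.endswith("."):
            out.append((position, translate_stub(buffer)))
            buffer = []
    if buffer:  # flush any unfinished sentence at the end of the speech
        out.append((position, translate_stub(buffer)))
    return out


def simultaneous(stream, chunk=3):
    """Emit a partial translation every `chunk` words, without waiting for '.'."""
    buffer, out, position = [], [], 0
    for position, word in enumerate(stream, start=1):
        buffer.append(word)
        if len(buffer) == chunk or word.endswith("."):
            out.append((position, translate_stub(buffer)))
            buffer = []
    if buffer:
        out.append((position, translate_stub(buffer)))
    return out


# A made-up eight-word utterance, arriving one word at a time.
STREAM = "the meeting will start at nine tomorrow morning.".split()

if __name__ == "__main__":
    for label, mode in (("consecutive", consecutive), ("simultaneous", simultaneous)):
        for position, text in mode(STREAM):
            print(f"{label}: after word {position}: {text}")
```

In this example the consecutive mode produces nothing until the eighth word, while the chunked mode begins after the third. The trade-off is that each chunk is translated with less of the sentence available.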

Kiyotaka Uchimoto, director of the Universal Communication Research Institute at NICT, says that the current ultimate goal is to realize simultaneous interpretation between multiple languages for business conferences.

There are still many challenges even with consecutive interpretation. A ministry official said that improving the accuracy of simultaneous interpretation will require technologies that can infer the subjects often omitted in Japanese sentences and anticipate the context of speech.

Pocketalk, a best-selling translation device in Japan made by Pocketalk Corp. in Tokyo, combines translation engines from NICT, Google, and other companies to support 82 languages. The product is increasingly used in the medical field in addition to travel and language learning.

Earphones, glasses

Portable devices are the most common type of translator on the market. Pocketalk, for example, is a palm-sized terminal about 10 centimeters long and six centimeters wide. Such devices can be made smaller as technology improves. Google and Chinese information technology companies have also been developing and releasing earphone-type and glasses-type “wearable” translators. Consumers’ options are expanding and convenience is increasing.

The market for machine translation is expected to grow, intensifying competition in development.

“Understanding someone who speaks a different language … can be a real challenge. Let’s see what happens when we take our advancements in translation and transcription and deliver them in your line of sight,” Google CEO Sundar Pichai told the audience at an event in May when introducing a prototype of a glasses-type translator.

Tobishima Corp., a construction company in Tokyo, developed a glasses-type translator with a display screen for one eye, and has already put it into use at construction sites. The company said that the device has proved very helpful in communicating with foreign employees who do not understand Japanese well.

A Tobishima employee said, “The device can translate technical terms in the construction field, too. In addition, as translations are displayed on the screen, there is no problem even when it is used in a noisy environment.”