
The Cultural Affairs Agency in Kamigyo Ward, Kyoto.
20:00 JST, April 9, 2026
The Cultural Affairs Agency will provide Japanese text data needed to train large language models (LLMs), the fundamental technology of generative artificial intelligence.
Amid the rapid spread of LLMs worldwide, the agency will support the development of highly accurate domestic AI by providing reliable linguistic data.
LLMs train on massive text datasets and predict the most likely next word in a sequence to generate human-like text, write summaries or answer questions. Wrong or biased data can harm the quality of LLMs.
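The idea of predicting the most likely next word can be illustrated with a toy sketch. The code below is not how a real LLM works internally, since actual models use neural networks trained on far larger data; it only counts which word follows which in a tiny made-up corpus. The function names and the sample sentences are assumptions for illustration.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count, for each word, which words follow it in the corpus."""
    follows = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        # Pair each word with the word that comes right after it.
        for current, nxt in zip(words, words[1:]):
            follows[current][nxt] += 1
    return follows


def predict_next(follows, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    if word not in follows:
        return None
    return follows[word].most_common(1)[0][0]


# Hypothetical miniature "training data".
corpus = [
    "the agency will provide data",
    "the agency will support development",
    "the agency will provide text",
]
model = train_bigram(corpus)
print(predict_next(model, "will"))  # provide
```

Because the prediction depends entirely on the counts, changing the training sentences changes the answer, which is a simple way to see why wrong or biased data can affect output quality.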
The agency will use a database of written language operated by the National Institute for Japanese Language and Linguistics (NINJAL). The database will be expanded from the current 100 million words to 200 million by fiscal 2028.
The agency will then establish a system to provide the data in stages to domestic generative AI operators.
Texts in the database are picked statistically from books, textbooks and internet message boards, among other materials, to serve as a microcosm of modern Japanese. These texts have been checked by NINJAL staff and are said to be free from copyright issues.
The Cultural Affairs Agency deems it important for domestic operators to develop LLMs from the perspective of international competitiveness and hopes that “reliable language resources” will improve the accuracy of the technology.
The agency also will create a database of spoken and written dialects, as well as standard Japanese translations, to promote the development of voice recognition AI technology specializing in dialects. Such a database could facilitate smooth communications with elderly people when they are receiving medical care or during the recovery process from disasters, as older people often speak in a dialect.

