Head Shots of Victims Included in Data for Generative AI Training Without Permission

The Yomiuri Shimbun

Many head shots of victims of incidents and disasters have been included in data used to train image-generating artificial intelligence, The Yomiuri Shimbun has found.

Such images are believed to have been collected from news websites and other online sources and used without permission. As the possibility of AI tools generating images resembling the victims cannot be ruled out, the practice is likely to spark debate over whether it should be allowed.

Outcry from bereaved families

“I can’t believe my daughter’s photo has been used for such things,” said Yukimi Takahashi, 61, whose daughter, Matsuri, then a 24-year-old new employee of advertising giant Dentsu Inc., committed suicide due to overwork in 2015.

After Matsuri’s death, Takahashi provided a photo of her daughter’s face to news organizations and also posted it on social media. She did so to raise public awareness of the reality of overwork, hoping to prevent similar tragedies from happening again.

However, the photo was included in a dataset used to train Stable Diffusion, one of the most popular image-generating AI models in the world.

The inclusion of such photos came to light when The Yomiuri Shimbun examined the dataset, which had been released online, last December.

“I want [such photos] not to be used for irrelevant AI training,” Takahashi said.

Quake victims also included

Generative AI can create elaborate images, indistinguishable from illustrations or photographs, simply from text instructions. Training such AI to improve the precision of its images requires vast amounts of data.

According to Stability AI Ltd., the British startup that developed Stable Diffusion, the AI model was trained on data provided by LAION, a German nonprofit organization.

The dataset contains about 5.8 billion images. Besides the photo of Matsuri, it includes many other images of victims of incidents and accidents, among them photos of the children who were victims of a serial killer in Kobe in 1997 and of the four victims of the so-called Setagaya family murder in Tokyo in 2000. There are also images of victims of disasters, such as the 2011 Great East Japan Earthquake and the Sept. 11, 2001, terrorist attacks in the United States.

The images in the dataset were collected by a program that automatically crawls the internet, gathering data from news sites as well as from online bulletin boards and other sites to which victims’ photos had been copied from news sites.

Response is ‘insufficient’

News reports sometimes carry head shots of victims in order to convey the reality of incidents and disasters.

Datasets used to train image-generating AI contain images collected mechanically and indiscriminately, regardless of their content. For the same reason, unauthorized AI training on illustrations and other copyrighted materials has also been considered problematic.

The Yomiuri Shimbun previously found sexual images of real children in the dataset. Such images could violate the law banning child prostitution and child pornography.

According to experts, AI can generate images similar to those it has learned from. It is therefore possible that images defaming the victims could be generated, or that such images could be misused to spread false information.

In response to an email interview, Stability AI said a mechanism exists to exclude specific data from AI training upon request. However, the company did not answer the question of whether it was aware that the dataset contained head shots of victims.

“For the bereaved families who made the victims’ head shots public to convey the lessons of those incidents and disasters, it is unthinkable that such images have been used for AI training. This is a matter of the dignity of the deceased,” said Akiko Orita, a professor of information sociology at Kanto Gakuin University who specializes in digital data of the dead.

“This is different from news reporting, which serves the public interest. It is not sufficient for AI developers to respond simply by saying, ‘If there is a request, we will exclude the image,’” she added.

On the other hand, apart from the issue of unauthorized AI training, demand for AI-generated images of deceased family members may grow in the future.

“With the use of AI spreading, it is necessary for society as a whole to discuss how to protect the feelings of bereaved families and respect the dignity of the dead,” Orita said.