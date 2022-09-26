An artist found photos from her medical file in the LAION image dataset, which is used to train AI models. Much about the case remains unclear.

An AI artist from California, known as Lapine, discovered that private images from her 2013 medical record appear in an AI training dataset compiled by LAION. Text-to-image generators such as Stable Diffusion and DALL-E are trained on data of this kind. She had uploaded a picture of herself to a website created for checking whether one's images are in the dataset, and found pictures there that her doctor had stored in her medical file in 2013. The pictures were not linked to her name. After the doctor died in 2018, they found their way onto the internet without her consent. In 2013 she had stipulated in writing that no one else may see the pictures.

Nevertheless, an entry in one of LAION's data collections (open, multimodal datasets used to train AI for research purposes) pointed to the image. LAION datasets are best understood as indexes of the internet: they list the URLs of the original images together with the associated ALT texts.
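What "index" means here can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the record layout, the field names `url` and `alt_text`, the helper functions, and the sample entries are all hypothetical and do not reproduce LAION's actual schema or tooling. The point is that such an index stores references and captions, not the image files themselves.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class IndexEntry:
    """One hypothetical index record: a pointer to an image plus its caption."""
    url: str       # where the original image is hosted (the index holds no pixels)
    alt_text: str  # the ALT text that accompanied the image on the web page


# Made-up sample entries for illustration only.
INDEX = [
    IndexEntry("https://example.org/images/cat.jpg", "a cat sleeping on a sofa"),
    IndexEntry("https://example.com/photos/portrait.png", "portrait of a woman"),
]


def find_by_host(index, host):
    """Return all entries whose URL points at the given host name."""
    return [e for e in index if urlparse(e.url).hostname == host]


def remove_url(index, url):
    """Return a copy of the index without the entry for the given URL."""
    return [e for e in index if e.url != url]
```

Removing an entry from such an index deletes only the reference. The image itself stays online until the operator of the linked page takes it down.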

Researcher Zack Marshall says that he and his colleagues have already studied such cases: “In more than 70 percent of the case reports published in medical journals, at least one image from the paper can be found in Google Images.” Marshall suspects that her doctor published the picture in a medical journal. Lapine, by contrast, told the US outlet Ars Technica that the picture was probably stolen from the files of her deceased doctor and then ended up on the internet. Ars Technica also found other, similar patient images in the dataset.

Legal situation still unclear

Since LAION does not host the images itself but only links to them, legal responsibility lies with the operators of the linked pages. It is unclear which country's law applies to LAION, because its team members are located in different countries. In Germany, there is probably an obligation to remove links to illegal content once those responsible have become aware of it.