Text-to-image synthesis models, such as DALL-E and similar systems, have proven their ability to convert an input caption into a coherent image.
Expanding on these capabilities, a new project demonstrates that such systems can process longer narratives, even ones containing figurative language, and generate several consecutive images as part of a single sequence.
Artificial intelligence that generates consecutive, mutually consistent images
The researchers behind this project set out to adapt a pretrained text-to-image synthesis model, adding a new capability: continuing the story presented in an initial image.
In this task, an initial scene is provided as a reference that the system can follow flexibly, with the subsequent images generated automatically by the pretrained model.
Image generation systems to date have not been trained for specialized tasks such as visualizing stories. The capability added here addresses precisely that gap: the model interprets the input prompts as a narrative sequence in which certain characters interact.
To maintain consistency, the AI-generated visual story is always conditioned on a source image, which also allows better generalization to narratives featuring new characters.
In this process, the research team started from other pretrained text-to-image synthesis models, integrating a new module on top of them that, working from the same inputs, retains the story's main elements to maintain continuity across successive generations.
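The conditioning scheme described above can be sketched in pseudocode: each frame is generated from the current caption, the fixed source image, and the frames produced so far. This is a minimal illustrative sketch, not the project's actual API; `generate_frame`, `generate_story`, and the dictionary-based frame representation are hypothetical stand-ins for a real image generator.

```python
# Hypothetical sketch: source-conditioned story generation.
# A real system would call a pretrained text-to-image model here;
# this stand-in only records what each frame is conditioned on.

def generate_frame(source_image, caption, history):
    """Produce one frame conditioned on the source image, the current
    caption, and the frames generated so far."""
    return {
        "caption": caption,
        "conditioned_on": source_image,
        "context_frames": len(history),
    }

def generate_story(source_image, captions):
    """Generate one frame per caption. Every frame sees the same source
    image, which is what keeps characters consistent across the sequence."""
    frames = []
    for caption in captions:
        frames.append(generate_frame(source_image, caption, frames))
    return frames

story = generate_story(
    "initial_scene.png",
    ["A fox meets a crow.", "The crow drops the cheese.", "The fox runs off."],
)
print([frame["context_frames"] for frame in story])  # → [0, 1, 2]
```

The key design point this sketch highlights is that the source image is threaded through every generation step, rather than only seeding the first frame, which is how the approach stays consistent even with characters the model has not seen before.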
According to the analysis shared by the researchers, understanding narratives involving multiple characters was one of the main challenges to address. That challenge was met successfully, with good results showing that the adaptation lets these systems take on new, complex tasks with limited resources.
The software developed within this project is available on GitHub, so those with the necessary experience can test it on their own infrastructure. A working web demo is planned; for now, the team offers a preview screenshot illustrating how it will look.