The possibilities of artificial intelligence (AI) never cease to amaze us. In the creative industry, it is being used to automate tedious processes while inspiring artists and facilitating their creative process. Fashion designers are no exception.
Researchers from the University of Florence, the University of Modena and Reggio Emilia, and the University of Pisa have created a new computer vision framework that can help fashion designers visualize their designs, showing them how they would look on the human body.
A new approach to fashion
Unlike other works that focused on virtual fitting of garments, the Italian researchers developed a framework that can support the work of designers, showing them how their garments would look like in real life. Using an approach called “conditional multimodal fashion image editing,” designers can generate fashion images from a variety of stimuli, such as text, sketches, and key points on the human body. The team proposed a new architecture based on latent diffusion models, an approach that had never been used in the fashion world before.
Instead of using generative adversarial networks (GANs), an artificial neural network architecture often used to generate text or images, the researchers decided to create a framework based on latent diffusion models (LDMs). Because these models are trained on a compressed, lower-dimensional latent space, they can create high-quality synthetic images.
Creating new data sets
Most of the existing data sets for training AI models for fashion design tasks only include low-resolution images of garments and do not include the information needed to create fashion images based on text stimuli and sketches. In order to train their model effectively, the researchers had to update these existing data sets or create new ones.
Since the conditional multimodal fashion image editing task was something new, the researchers created two new datasets: Dress Code and VITON-HD, and extended them with semiautomatically collected multimodal annotations. The experimental results on these new data sets demonstrate the effectiveness of their proposal, both in terms of realism and coherence with the provided multimodal stimuli.
The results
In early evaluations, the model created by the research team achieved very promising results, creating realistic images of clothing on human bodies inspired by sketches and specific text stimuli. The source code for their model and the multimodal annotations they added to the datasets are already on GitHub.
This new model could be integrated into existing or new software tools for fashion designers. It could also inform the development of other BOM-based AI architectures for creative real-world applications. Surely it will not take long to see software of this type on the market.