Artificial intelligence (AI) has recently started making its way into many creative industries, for instance in the form of tools for digital artists, architects, interior designers and image editors. In these contexts, AI can automate processes that are tedious or time-consuming, while also potentially inspiring artists and facilitating their creative process.
Researchers at the University of Florence, the University of Modena and Reggio Emilia, and the University of Pisa recently set out to explore the potential of AI models in fashion design. In a paper pre-published on arXiv, they introduced a new computer vision framework that could help fashion designers visualize their designs by showing them how they might look on the human body.
Most past studies exploring the use of AI in the fashion industry focused on computational tools that recommend garments similar to those selected by a user, or on models that show online customers how garments would look on their body (i.e., virtual try-on systems). This team of Italian researchers, on the other hand, set out to develop a framework that supports the work of designers, showing them how the garments they designed might look in real life, so that they can find new inspiration, identify potential issues and alter their designs if needed.
“Differently from previous works that mainly focused on the virtual try-on of garments, we propose the task of multimodal conditioned fashion image editing, guiding the generation of human-centric fashion images by following multimodal prompts, such as text, human body poses, and garment sketches,” Alberto Baldrati, Davide Morelli and their colleagues wrote in their paper.
“We tackle this problem by proposing a new architecture based on latent diffusion models, an approach that has not been used before in the fashion domain.”
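To give a concrete sense of what "multimodal prompts" means in practice, the sketch below uses an off-the-shelf latent diffusion pipeline from the Hugging Face diffusers library together with a pose-conditioned ControlNet. This is not the authors' architecture, and the checkpoint names and the pose image file are illustrative assumptions; the snippet only shows how a textual garment description and a human body pose can jointly guide image generation.

```python
# Illustrative only: a generic latent diffusion pipeline conditioned on both
# a text prompt and a body-pose map, NOT the architecture proposed in the paper.
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

# Pose-conditioning module and base latent diffusion model (public checkpoints).
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# Multimodal prompts: a garment description in text plus a human-pose image.
prompt = "a model wearing a long red evening dress with floral embroidery"
pose_map = load_image("pose_openpose.png")  # hypothetical file containing an OpenPose skeleton

image = pipe(prompt, image=pose_map, num_inference_steps=30).images[0]
image.save("generated_garment.png")
```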
Instead of using generative adversarial networks (GANs), artificial neural network architectures often used to generate new text or images, the researchers decided to build their framework on latent diffusion models (LDMs). Because they are trained in a compressed, lower-dimensional latent space rather than directly on pixels, LDMs can create high-quality synthetic images while keeping computational costs manageable.
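The compression at the heart of LDMs can be seen with a pretrained autoencoder such as the Stable Diffusion VAE available in diffusers. The minimal sketch below (checkpoint name and image sizes are assumptions, not taken from the paper) encodes a 512x512 RGB image into a far smaller latent tensor, which is the space where the diffusion process would actually run, and then decodes it back to pixels.

```python
# Minimal sketch of the LDM "latent space" idea: compress an image with a
# pretrained VAE, work in the small latent tensor, then decode back to pixels.
import torch
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

image = torch.randn(1, 3, 512, 512)  # stand-in for a normalized RGB image
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()  # shape: (1, 4, 64, 64)
    reconstruction = vae.decode(latents).sample        # back to (1, 3, 512, 512)

print(image.numel() / latents.numel())  # ~48x fewer values in the latent space
```

Running the diffusion process on a tensor roughly 48 times smaller than the original image is what makes generating high-resolution results tractable.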
While these promising models have been applied to many tasks that require the generation of artificial images or videos, they have rarely been used in the context of fashion image editing. Most previous works in this area introduced GAN-based architectures, which typically produce lower-quality images than LDMs.