Generative models, particularly GANs, have been shown to encode meaningful visual concepts linearly in their latent space, which allows controlled image edits such as changing facial attributes like age or gender. For multi-step generative models such as diffusion models, however, an equivalent linear latent space has yet to be identified. Recent personalization methods, such as Dreambooth and Custom Diffusion, suggest a potential direction for finding such an interpretable latent space. These methods personalize a diffusion model by fine-tuning it on images of a specific subject, so the identity ends up encoded in the model weights rather than in a latent code within the noise space.
Researchers from UC Berkeley, Snap Inc., and Stanford University explore the weight space of customized diffusion models by creating a dataset of over 60,000 models, each fine-tuned to represent different visual identities. They term this weight space “weights2weights” (w2w) and model it as a subspace. By analyzing this space, they demonstrate its utility for sampling new identities, making semantic edits (like adding a beard), and inverting images to generate realistic identities, even from out-of-distribution inputs. Their findings suggest that this w2w space is an interpretable latent space for identities, enabling various creative applications.
Image-based generative models such as VAEs, flow-based models, GANs, and diffusion models have been widely used to create high-quality, photorealistic images. GANs and diffusion models are particularly noted for their controllability and customization abilities. Research has focused on fine-tuning these models to incorporate user-defined concepts, often while reducing the number of trainable parameters through techniques such as low-rank updates, restricting updates to specific layers, or using hypernetworks. The latent space of GANs, especially the StyleGAN series, has been extensively studied for its editing capabilities, and recent efforts explore similar latent spaces within diffusion models. Additionally, studies have examined the structure of weight spaces in deep networks for model ensembling, editing, and other applications.
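To make the idea of a low-rank update concrete, here is a minimal NumPy sketch of a LoRA-style adapter on a single weight matrix. The layer dimensions, the rank, and the function name lora_update are illustrative assumptions, not values or code from the paper.

```python
import numpy as np

def lora_update(w, a, b, scale=1.0):
    """Return the adapted weight W + scale * (B @ A), where B @ A has rank <= r."""
    return w + scale * (b @ a)

# Hypothetical layer shape and rank, chosen only for illustration.
d_out, d_in, r = 320, 768, 4
rng = np.random.default_rng(0)

w = rng.normal(size=(d_out, d_in))        # frozen base weight
a = rng.normal(size=(r, d_in)) * 0.01     # trainable down-projection
b = np.zeros((d_out, r))                  # trainable up-projection, zero-initialized

w_adapted = lora_update(w, a, b)          # equals w until b is trained

full_params = d_out * d_in                # parameters in a full fine-tune of this layer
lora_params = r * (d_in + d_out)          # parameters in the low-rank adapter
print(full_params, lora_params)
```

Only a and b would be trained in this setup, so each customized model is described by far fewer numbers than the full layer. That is what makes it practical to collect tens of thousands of fine-tuned models into one dataset.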
The method begins by creating a manifold of model weights to represent individual identities. This is done by fine-tuning latent diffusion models using Dreambooth and reducing the dimensionality of the resulting weights through LoRA. The fine-tuned weights form a dataset projected into a lower-dimensional space, termed w2w. Linear directions within this space are identified to correspond to semantic attributes, allowing for identity editing. Additionally, this manifold is used to constrain the inversion of a single image into its identity by optimizing weights within the w2w space, ensuring realistic identity reconstruction.
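The subspace and editing steps described above can be sketched on toy data. This sketch assumes PCA (via SVD) as the projection and uses the difference of class means as the attribute direction. Both are simple stand-ins chosen for illustration, since the summary above does not specify either choice. The random vectors stand in for flattened LoRA weights, and all function names are hypothetical.

```python
import numpy as np

def fit_subspace(weights, k):
    """PCA via SVD: return the dataset mean and the top-k principal directions (k x d)."""
    mean = weights.mean(axis=0)
    _, _, vt = np.linalg.svd(weights - mean, full_matrices=False)
    return mean, vt[:k]

def project(theta, mean, basis):
    """Map flattened weights to coefficients in the subspace."""
    return (theta - mean) @ basis.T

def reconstruct(coeffs, mean, basis):
    """Map subspace coefficients back to flattened weights."""
    return mean + coeffs @ basis

def attribute_direction(coeffs, labels):
    """Unit vector from the mean of class 0 to the mean of class 1 (a stand-in choice)."""
    diff = coeffs[labels == 1].mean(axis=0) - coeffs[labels == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)

# Toy data: 200 "models" with 64 flattened weights each and one binary attribute.
rng = np.random.default_rng(0)
n_models, d, k = 200, 64, 8
labels = rng.integers(0, 2, size=n_models)
latent = rng.normal(size=(n_models, k))
latent[:, 0] += 3.0 * labels                      # the attribute shifts one latent factor
weights = latent @ rng.normal(size=(k, d)) + 0.01 * rng.normal(size=(n_models, d))

mean, basis = fit_subspace(weights, k)
coeffs = project(weights, mean, basis)
direction = attribute_direction(coeffs, labels)

# Edit one model by moving its coefficients along the attribute direction.
alpha = 2.0
edited_coeffs = coeffs[0] + alpha * direction
edited_weights = reconstruct(edited_coeffs, mean, basis)
```

In a real pipeline the edited weights would be loaded back into the diffusion model as adapter weights. Inversion would optimize the coefficients against an image-based loss instead of shifting them along a fixed direction.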
The experiments demonstrate the utility of the w2w space for manipulating human identities across several tasks. Using fine-tuning techniques, a synthetic dataset of ~65,000 identities was generated and encoded into model weights. These weights were used to sample new identities, edit identity attributes, and invert out-of-distribution identities into realistic ones. The w2w space allowed consistent and disentangled edits, preserving identity better than baseline methods. The study also found that increasing the number of models in the w2w space improves the disentanglement of attributes and the preservation of identities.
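Sampling a new identity amounts to drawing a new point in the subspace and mapping it back to weights. The sketch below assumes a PCA basis and an independent Gaussian per coefficient. These are modeling assumptions made for illustration, the toy weights are random, and the helper names are hypothetical.

```python
import numpy as np

def fit_gaussians(coeffs):
    """Per-dimension mean and standard deviation of the subspace coefficients."""
    return coeffs.mean(axis=0), coeffs.std(axis=0)

def sample_identities(mu, sigma, mean, basis, n, rng):
    """Draw n coefficient vectors and map them back to flattened weights."""
    z = rng.normal(loc=mu, scale=sigma, size=(n, mu.shape[0]))
    return mean + z @ basis

# Toy dataset of flattened weights with low-rank structure.
rng = np.random.default_rng(1)
n_models, d, k = 300, 32, 4
weights = rng.normal(size=(n_models, k)) @ rng.normal(size=(k, d))

# Fit a k-dimensional PCA basis (assumed projection) and project the dataset.
mean = weights.mean(axis=0)
_, _, vt = np.linalg.svd(weights - mean, full_matrices=False)
basis = vt[:k]
coeffs = (weights - mean) @ basis.T

mu, sigma = fit_gaussians(coeffs)
new_weights = sample_identities(mu, sigma, mean, basis, n=5, rng=rng)
print(new_weights.shape)
```

Each sampled row lies in the span of the fitted basis by construction, which keeps new samples close to the distribution of the fine-tuned models.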
The study introduces the concept of w2w space, in which diffusion model weights are treated as points in a space defined by other customized models. This space supports sampling, editing, and inversion applied to model weights rather than to images, with a focus on human identities. While acknowledging the potential for misuse in malicious identity manipulation, the authors hope the framework will be used to explore visual creativity and enhance model safety. They also suggest that w2w space could be generalized to concepts beyond identities, which they leave to future research.
Sana Hassan, a consulting intern at Marktechpost and dual-degree student at IIT Madras, is passionate about applying technology and AI to address real-world challenges. With a keen interest in solving practical problems, he brings a fresh perspective to the intersection of AI and real-life solutions.