How Parallel-UNet Transforms Virtual Try-On with Implicit Warping and Unified Operations

Written by backpropagation | Published 2024/10/06
Tech Story Tags: ai-in-fashion | deep-learning | tryondiffusion | parallel-unet | photorealistic-fashion | fashion-technology | body-pose-adaptation | image-based-virtual-try-on

TL;DR: The 128×128 Parallel-UNet employs implicit warping, implemented with a cross attention mechanism, to handle complex transformations such as garment warping. By combining warping and blending into a single pass, the architecture uses two UNets to process the person and garment images efficiently. Pose embeddings guide these operations, strengthening the correspondence between the target person and the garment.

Authors:

(1) Luyang Zhu, University of Washington and Google Research (work done while the author was an intern at Google);

(2) Dawei Yang, Google Research;

(3) Tyler Zhu, Google Research;

(4) Fitsum Reda, Google Research;

(5) William Chan, Google Research;

(6) Chitwan Saharia, Google Research;

(7) Mohammad Norouzi, Google Research;

(8) Ira Kemelmacher-Shlizerman, University of Washington and Google Research.

Table of Links

Abstract and 1. Introduction

2. Related Work

3. Method

3.1. Cascaded Diffusion Models for Try-On

3.2. Parallel-UNet

4. Experiments

5. Summary and Future Work and References

Appendix

A. Implementation Details

B. Additional Results

3.2. Parallel-UNet

The 128×128 Parallel-UNet can be represented as a conditional noise prediction network over the try-on inputs.
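
As a sketch only, assuming the standard epsilon-prediction diffusion parameterization together with the try-on conditioning tuple defined in Section 3.1 (the paper's exact formula may differ), the denoiser takes the form

\hat{\epsilon} = \epsilon_\theta\left(\mathbf{z}_t,\; t,\; \mathbf{c}_{\mathrm{tryon}}\right), \qquad \mathbf{c}_{\mathrm{tryon}} = \left(I_a,\; J_p,\; J_g,\; I_c\right)

where \mathbf{z}_t is the noisy target image at diffusion timestep t, I_a is the clothing-agnostic person image, J_p and J_g are the pose keypoints of the person and the garment, and I_c is the segmented garment image.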

Combining warp and blend in a single pass. Instead of warping the garment to the target body and then blending it with the target person, as done in prior work, we combine the two operations into a single pass. As shown in Fig. 2, we achieve this via two UNets that handle the garment and the person respectively.
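
To make this concrete, below is a minimal PyTorch sketch of the idea, not the authors' architecture: the names CrossAttnFuse and TwoBranchDenoiser, the layer sizes, and the additive pose conditioning are illustrative assumptions (the paper's UNets are far deeper and fuse pose embeddings via attention).

```python
import torch
import torch.nn as nn

class CrossAttnFuse(nn.Module):
    """Implicit warping: person features attend to garment features."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, person_feat, garment_feat):
        # Flatten spatial maps (B, C, H, W) -> token sequences (B, H*W, C).
        b, c, h, w = person_feat.shape
        q = person_feat.flatten(2).transpose(1, 2)
        kv = garment_feat.flatten(2).transpose(1, 2)
        # Queries come from the person branch, keys/values from the garment
        # branch: the attention map acts as a learned correspondence (the
        # implicit warp), and the weighted sum performs the blend.
        out, _ = self.attn(self.norm(q), kv, kv)
        return (q + out).transpose(1, 2).reshape(b, c, h, w)

class TwoBranchDenoiser(nn.Module):
    """Toy stand-in for the Parallel-UNet: two branches + cross attention."""

    def __init__(self, dim: int = 64, num_keypoints: int = 17):
        super().__init__()
        self.person_enc = nn.Conv2d(3, dim, 3, padding=1)   # person branch (stub)
        self.garment_enc = nn.Conv2d(3, dim, 3, padding=1)  # garment branch (stub)
        self.pose_mlp = nn.Sequential(                      # keypoints -> embedding
            nn.Flatten(1), nn.Linear(num_keypoints * 2, dim), nn.SiLU())
        self.fuse = CrossAttnFuse(dim)
        self.decode = nn.Conv2d(dim, 3, 3, padding=1)       # predicts the noise

    def forward(self, noisy_person, garment, person_pose, garment_pose):
        # Pose embeddings guide both branches (added channel-wise here for
        # brevity; the paper fuses them through attention instead).
        p = self.person_enc(noisy_person) + self.pose_mlp(person_pose)[..., None, None]
        g = self.garment_enc(garment) + self.pose_mlp(garment_pose)[..., None, None]
        # Warping and blending happen together inside the fused features.
        return self.decode(self.fuse(p, g))

# Smoke test with random tensors: shapes only, no pretrained weights.
model = TwoBranchDenoiser()
eps = model(torch.randn(1, 3, 32, 32), torch.randn(1, 3, 32, 32),
            torch.rand(1, 17, 2), torch.rand(1, 17, 2))
print(eps.shape)  # torch.Size([1, 3, 32, 32])
```

Because the queries come from the person stream and the keys/values from the garment stream, the network can route garment appearance to the correct body locations without an explicit flow field, which is why warping and blending no longer need to be separate stages.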

This paper is available on arXiv under the CC BY-NC-ND 4.0 license.

