AI Learns to Perfect Lighting in Photos Using Smart Masks and Creative Training

by @autoencoder

Too Long; Didn't Read

Researchers at Beeble AI have developed a method for improving how light and shadows can be applied to human portraits in digital images.

Authors:

(1) Hoon Kim, Beeble AI (contributed equally to this work);

(2) Minje Jang, Beeble AI (contributed equally to this work);

(3) Wonjun Yoon, Beeble AI (contributed equally to this work);

(4) Jisoo Lee, Beeble AI (contributed equally to this work);

(5) Donghyun Na, Beeble AI (contributed equally to this work);

(6) Sanghyun Woo, New York University (contributed equally to this work).

Editor's Note: This is Part 7 of 14 of a study introducing a method for improving how light and shadows can be applied to human portraits in digital images. Read the rest below.



4. Multi-Masked Autoencoder Pre-training

We introduce the Multi-Masked Autoencoder (MMAE), a self-supervised pre-training framework designed to enhance feature representations in relighting models. It aims to improve output quality without relying on additional, costly light stage data. Building upon the MAE framework [19], MMAE capitalizes on the inherent learning of crucial image features such as structure, color, and texture, which are essential for relighting tasks. However, adapting MAE to our specific needs poses several non-trivial challenges. First, MAE is designed primarily for vision transformers [15], whereas our focus is on a UNet, a convolution-based architecture. This convolutional structure, with its hierarchical nature and aggressive pooling, is known to simplify the MAE task and therefore requires careful adaptation [50]. Further, the hyperparameters of MAE, particularly the fixed mask size and ratio, are also specific to vision transformers. These factors could introduce biases during training and hinder the model from recognizing image features at various scales. Moreover, MAE relies solely on a masked-region reconstruction loss, which limits the model's ability to understand the global coherence of the reconstructed region in relation to its visible context.


To address these challenges effectively, we have developed two key strategies within the MMAE framework:


Dynamic Masking. MMAE eliminates two key hyperparameters, mask size and ratio, by introducing a variety of mask types that generalize the MAE objective. These types, which include overlapping patches of various sizes, outpainting masks [46], and free-form masks [29] (see Fig. 5), each contribute to the model’s versatility. By learning to handle these more challenging masked regions, MMAE achieves a more comprehensive understanding of image properties.
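As a rough illustration of how such dynamic mask sampling might work, the sketch below draws one of three mask families at random per training image. The size ranges, patch counts, and stroke parameters are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

def sample_mask(h, w, rng=None):
    """Return a binary mask (1 = hidden, 0 = visible) of shape (h, w).

    Illustrative sketch: the mask families follow the paper (patch,
    outpainting, free-form), but all size/count ranges are assumed values.
    """
    if rng is None:
        rng = np.random.default_rng()
    mask = np.zeros((h, w), dtype=np.float32)
    mask_type = rng.choice(["patch", "outpaint", "freeform"])

    if mask_type == "patch":
        # Overlapping square patches of a randomly chosen size and count.
        patch = int(rng.integers(8, 65))
        for _ in range(int(rng.integers(10, 40))):
            y = int(rng.integers(0, max(1, h - patch)))
            x = int(rng.integers(0, max(1, w - patch)))
            mask[y:y + patch, x:x + patch] = 1.0

    elif mask_type == "outpaint":
        # Keep a central crop visible and hide everything around it.
        keep = rng.uniform(0.4, 0.8)
        kh, kw = int(h * keep), int(w * keep)
        y0, x0 = (h - kh) // 2, (w - kw) // 2
        mask[:] = 1.0
        mask[y0:y0 + kh, x0:x0 + kw] = 0.0

    else:
        # Free-form mask: random thick strokes wandering across the image.
        for _ in range(int(rng.integers(3, 10))):
            y, x = int(rng.integers(0, h)), int(rng.integers(0, w))
            thickness = int(rng.integers(5, 20))
            for _ in range(int(rng.integers(10, 40))):
                y = int(np.clip(y + rng.integers(-15, 16), 0, h - 1))
                x = int(np.clip(x + rng.integers(-15, 16), 0, w - 1))
                mask[max(0, y - thickness):y + thickness,
                     max(0, x - thickness):x + thickness] = 1.0
    return mask
```

Because the mask family, size, and coverage all vary per sample, the encoder cannot overfit to any single fixed mask size or ratio.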


Generative Target. In addition to its structural advancements, MMAE incorporates a new loss function strategy. We adopt perceptual [24] and adversarial losses [22] alongside the original reconstruction loss. As a result, MMAE is equipped not only to reconstruct missing image parts but also to synthesize plausible content that integrates seamlessly with the original context. In practice, the three losses are weighted equally.
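A minimal sketch of how these three terms could be combined with equal weights is shown below, in PyTorch. The `vgg_features` and `discriminator` callables are placeholders for a pretrained perceptual backbone and a GAN critic; they, along with the choice of L1 distances and a non-saturating generator loss, are assumptions rather than the paper's exact formulation, and the discriminator's own update step is omitted for brevity.

```python
import torch
import torch.nn.functional as F

def mmae_loss(pred, target, mask, vgg_features, discriminator):
    """Equal-weight sum of reconstruction, perceptual, and adversarial terms.

    `vgg_features` and `discriminator` are placeholder callables standing in
    for a pretrained feature extractor and a GAN critic (assumed components).
    """
    # Reconstruction on the masked (hidden) region, as in MAE.
    recon = F.l1_loss(pred * mask, target * mask)

    # Perceptual loss: distance in a pretrained feature space (e.g., VGG).
    perceptual = F.l1_loss(vgg_features(pred), vgg_features(target))

    # Non-saturating adversarial loss for the generator side.
    adversarial = F.softplus(-discriminator(pred)).mean()

    # The paper states the three losses are weighted equally.
    return recon + perceptual + adversarial
```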


We pre-train the entire UNet architecture using MMAE, and, unlike MAE, we retain the decoder and fine-tune the entire model on relighting ground truths.
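Putting the pieces together, a schematic of this two-stage procedure might look like the following, reusing the `sample_mask` and `mmae_loss` sketches above. The model, data loaders, learning rate, and the simplified fine-tuning loss are all assumed placeholders, not the authors' actual training recipe.

```python
import torch
import torch.nn.functional as F

def train_mmae_then_finetune(unet, pretrain_loader, relight_loader,
                             vgg_features, discriminator, device="cpu"):
    """Two-stage schematic: MMAE pre-training, then relighting fine-tuning."""
    optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-4)

    # Stage 1: self-supervised MMAE pre-training on unlabeled portraits.
    for image in pretrain_loader:
        image = image.to(device)
        h, w = image.shape[-2:]
        mask = torch.from_numpy(sample_mask(h, w)).to(device)[None, None]
        pred = unet(image * (1 - mask))  # reconstruct the hidden regions
        loss = mmae_loss(pred, image, mask, vgg_features, discriminator)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

    # Stage 2: unlike MAE, the decoder is retained and the whole UNet is
    # fine-tuned end-to-end on relighting ground truths.
    for src, relit in relight_loader:
        src, relit = src.to(device), relit.to(device)
        pred = unet(src)  # simplified: relighting also conditions on target lighting
        loss = F.l1_loss(pred, relit)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```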


This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.