
Beeble Researchers Use Physics-Based Models to Achieve Realistic Lighting Effects in Images


Too Long; Didn't Read

Researchers at Beeble AI have developed a method for improving how light and shadows can be applied to human portraits in digital images.

Authors:

(1) Hoon Kim, Beeble AI (contributed equally to this work);

(2) Minje Jang, Beeble AI (contributed equally to this work);

(3) Wonjun Yoon, Beeble AI (contributed equally to this work);

(4) Jisoo Lee, Beeble AI (contributed equally to this work);

(5) Donghyun Na, Beeble AI (contributed equally to this work);

(6) Sanghyun Woo, New York University (contributed equally to this work).

Editor's Note: This is Part 2 of 14 of a study introducing a method for improving how light and shadows can be applied to human portraits in digital images. Read the rest below.



Human Portrait Relighting is an ill-posed problem due to its under-constrained nature. To tackle this, earlier methods incorporated 3D facial priors [44], exploited image intrinsics [3, 40], or framed the task as style transfer [43]. Light stage techniques [49] offer a more powerful solution by recording a subject's reflectance fields under varying lighting conditions [10, 14], though they are labor-intensive and require specialized equipment. A promising alternative has emerged with deep learning, which trains neural networks on light stage data. Sun et al. [45] pioneered this approach, but their method was limited in representing non-Lambertian effects. Nestmeyer et al. [32] improved on this by integrating rendering physics into the network design, albeit only for directional light. Building on this, Pandey et al. [34] incorporated the Phong reflection model and a high dynamic range (HDR) lighting map [9] into their network, enabling a more accurate representation of global illumination. In parallel, several works have explored portrait relighting without light stage data [20, 21, 42, 47, 55], and the introduction of NeRF [7] and diffusion-based [38] models has opened new avenues in the field. However, networks trained with light stage data remain superior in accuracy and realism, thanks to physically based compositing of relit training pairs and precise ground-truth image intrinsics [56].
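For intuition, the empirical Phong specular term referenced above amounts to raising the alignment between the mirror-reflected light direction and the view direction to a shininess exponent. The sketch below is only illustrative; the shininess value is an assumed example, not a parameter from any of the cited works.

```python
import numpy as np

def phong_specular(normal, light_dir, view_dir, shininess=32.0):
    """Empirical Phong specular term for a single directional light.

    All inputs are unit 3-vectors; `shininess` is an assumed example
    exponent, not a value taken from the cited papers.
    """
    # Mirror-reflect the light direction about the surface normal.
    reflect_dir = 2.0 * np.dot(normal, light_dir) * normal - light_dir
    # Specular intensity falls off as the reflection and view directions diverge.
    return max(np.dot(reflect_dir, view_dir), 0.0) ** shininess
```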


Our work advances this direction by integrating the Cook-Torrance model into our network design, shifting from the empirical Phong model to a more physically grounded approach and thereby enhancing the realism and detail of relit images.
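As a rough contrast with the Phong term above, a minimal Cook-Torrance specular term combines a microfacet normal distribution, a Fresnel term, and a geometric shadowing term. The sketch below uses the common GGX/Schlick/Smith choices with illustrative roughness and F0 defaults; it is a hedged example of the general model, not the exact formulation used in the paper.

```python
import numpy as np

def cook_torrance_specular(n, l, v, roughness=0.4, f0=0.04):
    """Cook-Torrance specular term with a GGX distribution, Schlick Fresnel,
    and a Smith-style geometry term.

    Inputs are unit 3-vectors; `roughness` and `f0` are illustrative
    defaults, not values from the paper.
    """
    h = (l + v) / np.linalg.norm(l + v)              # half vector
    n_dot_l = max(np.dot(n, l), 1e-6)
    n_dot_v = max(np.dot(n, v), 1e-6)
    n_dot_h = max(np.dot(n, h), 0.0)
    v_dot_h = max(np.dot(v, h), 0.0)

    a2 = roughness ** 4                              # GGX convention: alpha = roughness^2, used squared
    d = a2 / (np.pi * ((n_dot_h ** 2) * (a2 - 1.0) + 1.0) ** 2)   # normal distribution (D)
    f = f0 + (1.0 - f0) * (1.0 - v_dot_h) ** 5                    # Fresnel, Schlick approximation (F)
    k = (roughness + 1.0) ** 2 / 8.0
    g = (n_dot_v / (n_dot_v * (1.0 - k) + k)) * \
        (n_dot_l / (n_dot_l * (1.0 - k) + k))                     # geometric shadowing (G)

    return (d * f * g) / (4.0 * n_dot_v * n_dot_l)
```

Unlike a single shininess exponent, the microfacet terms tie highlight shape and strength to physically meaningful roughness and Fresnel parameters, which is what makes this family of models a better fit for realistic skin and specular detail.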


Self-supervised Pre-training has become the standard training scheme behind large language models like BERT [11] and GPT [39], and is increasingly influential in vision models, where the community aims to replicate the 'BERT moment'. This approach typically involves pre-training on extensive unlabeled data, followed by fine-tuning on specific tasks. While early efforts in vision models focused on simple pretext tasks [13, 17, 33, 36, 53], the field has evolved through stages such as contrastive learning [5, 18] and masked image modeling [2, 19, 50]. However, the primary focus has remained on visual recognition, with less attention to other domains. Exceptions include work on low-level image processing tasks [4, 6, 27, 30] built on the vision transformer [15].


Our research takes a different route, focusing on human portrait relighting, the complex challenge of manipulating illumination in an image. This direction is crucial because acquiring accurate ground-truth data, especially from a light stage, is both expensive and difficult. We adapt the MAE framework [19], previously shown to learn robust image representations and to develop locality biases [35], to suit the unique requirements of effective relighting.
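As a hedged illustration of the masking idea borrowed from MAE, the snippet below keeps only a random subset of patch tokens for the encoder; the 75% mask ratio follows the original MAE paper and is not necessarily the configuration used in this work.

```python
import torch

def random_mask_patches(patches, mask_ratio=0.75):
    """MAE-style random masking: keep a random subset of patch tokens.

    `patches` has shape (batch, num_patches, dim). The 0.75 ratio follows
    the original MAE paper, not necessarily this work's setting.
    """
    b, n, d = patches.shape
    num_keep = int(n * (1.0 - mask_ratio))
    # Shuffle patch indices independently per sample and keep the first few.
    noise = torch.rand(b, n, device=patches.device)
    ids_keep = torch.argsort(noise, dim=1)[:, :num_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))
    return visible, ids_keep  # the encoder only ever sees the visible tokens
```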


This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.