Depth estimation and stereo image super-resolution are well-known tasks in the field of computer vision. To help researchers get high-quality training data for these tasks, industry-leading lightfield hardware provider Leia Inc. used their social media app, Holopix™, to create Holopix50k, the world’s largest “in-the-wild” stereo image dataset.
To learn more about the dataset, we spoke with Puneet Kohli, one of the lead computer vision researchers involved in Holopix50k’s development.
Kohli has worked for many large companies in the past, such as Amazon and Barclays. He led the Computer Vision Engineering and R&D efforts at Leia Inc., developing core algorithms and tools for understanding 3D scenes.
Holopix50k is a large-scale in-the-wild stereo image dataset that contains 49,368 image pairs. These image pairs were all contributed by users of the Holopix™ mobile platform. Leia Inc. is using their computer vision research to enhance the experience of viewing content on their Leia-powered lightfield devices such as phones, tablets, and automotive displays.
“The most exciting use cases of Holopix50k are for stereo computer vision tasks like self-supervised depth estimation and stereo super-resolution,” says Kohli. Super resolution is the upscaling of lower-resolution images to a higher resolution, without updating or changing the hardware of your device.
Researchers are exploring neural networks that can estimate how a low-resolution image would look at a higher resolution and display that estimate in real time. The end goal is improved graphical rendering on a range of devices at lower GPU cost. “These are well-known tasks in the computer vision community, but to date there has been no academic dataset as large as Holopix50k focusing on in-the-wild scenarios,” says Kohli. “Holopix50k will enable researchers to train deep learning models that are trained on mobile photography and the diverse scenarios found in real-life.”
PASSRNet [56] trained on Holopix50k, rendering input from the KITTI dataset
As you can see from the image above, networks trained on Holopix50k are able to upscale low-res images strikingly close to the ground truth image data. Aside from lightfield devices, this technology can also be used for improving graphics in virtual reality and mixed reality headsets. In the security sector, it can be used to provide HD video surveillance at a larger scale with lower hardware costs.
Super resolution can improve virtually any device that displays images or videos for entertainment, educational, or business purposes.
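To make the super-resolution task concrete, here is a minimal sketch of how such a problem is typically set up and scored. It is not PASSRNet or any Leia model: the image is a synthetic gradient rather than Holopix50k data, the "model" is plain nearest-neighbour upscaling, and the function names are illustrative. A learned network would replace `upscale_nearest` and aim for a higher PSNR against the same ground truth.

```python
import math


def downscale(img, factor):
    """Average-pool a grayscale image (list of rows) by an integer factor."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def upscale_nearest(img, factor):
    """Naive baseline: repeat every pixel factor x factor times."""
    out = []
    for row in img:
        wide = [p for p in row for _ in range(factor)]
        out.extend(list(wide) for _ in range(factor))
    return out


def psnr(reference, estimate, max_value=255.0):
    """Peak signal-to-noise ratio in dB; higher means closer to the reference."""
    n = len(reference) * len(reference[0])
    mse = sum((r - e) ** 2
              for ref_row, est_row in zip(reference, estimate)
              for r, e in zip(ref_row, est_row)) / n
    return math.inf if mse == 0 else 10 * math.log10(max_value ** 2 / mse)


# Synthetic 8x8 "ground truth": a horizontal gradient (not real dataset content).
high_res = [[x * 32 for x in range(8)] for _ in range(8)]
low_res = downscale(high_res, 2)        # simulated low-resolution input
restored = upscale_nearest(low_res, 2)  # baseline reconstruction
score = psnr(high_res, restored)
print(f"baseline PSNR: {score:.2f} dB")
```

PSNR is only one common way to compare a reconstruction with the ground truth; the score reported here says nothing about the results in the Holopix50k paper.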
Holopix is Leia Inc.’s flagship social networking app. It resembles image-sharing apps like Instagram but focuses on lightfield content rather than ‘flat’ images.
As on Instagram, users can browse a feed of posts from other users and upload their own images. The difference is that Holopix users upload multi-view images with two or more views, which Leia’s proprietary computer vision technology converts into a lightfield image.
Lightfields (or light fields) add a new layer of immersion to images and videos by capturing and displaying the same scene from many angles. They are a new visual medium based on an old idea: integral photography, proposed by Gabriel Lippmann in 1908.
Image via Cubicle Ninjas
Lightfields give images an illusion of depth, and have various applications in AR, VR, and MR technologies.
“Lightfields transform existing device displays with lighting effects, texture and 3D depth,” says Kohli. “It creates a richer experience, making content more beautiful and engaging. When viewed on a Leia device, lightfield images on Holopix stand out with depth and an immersive head-tilt based parallax effect.”
Pictures from Holopix are also available on the web, where they are displayed as a parallax animation. Below are some examples of lightfield images hosted on the Holopix website:
“In a laboratory setting, there is a limited amount of data you can collect,” says Kohli. “For example, if we wanted to create an image dataset of water bottles, we would have to first import and collect as many different types of water bottles as we can and take photos of them in our lab.”
Mosaic of images from the Holopix Platform
“Although we might have collected a large amount of water bottle photos, I may not have covered the large diversity of water bottles available due to a variety of reasons. Maybe we are not able to procure them past what’s available in our nearby stores, or perhaps there are many custom bottles that I do not have access to. Furthermore, I may not have the time, budget, or physical storage resources to extensively collect all these water bottles! Of course, this is just one example, but the general problem is quite similar.”
"We are constrained by our own capabilities."
“On the contrary, if we went for an ‘in-the-wild’ approach, where we collected images of water bottles from the internet, or social media, we might be able to collect a larger amount of images, with more unique types of bottles. Of course, we still need to put in the effort involved in collecting and curating such a dataset, but given the scale we can reach through such crowd-sourced means, we could say confidently that we’d have a much larger set of images.”
Examples of diverse content in Holopix50k
“What we are most proud of is the sheer size of our dataset, which is five times larger than the second-largest comparable dataset,” says Kohli. “Going into a bit more detail, our HD split, which contains images at 720p (0.92 Mpx resolution), contains 36k images, and is not only the largest HD dataset, but also has the highest resolution after the Middlebury benchmark dataset.”
“What surprised us the most is that Holopix50k was quite consistent in the results across various metrics, and performed relatively well despite the huge size. For example, our complete dataset has the highest score in SR Metric. Our SD set, which contains only the 360p images, has an even higher score in this metric.”
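The 0.92 Mpx figure quoted for the HD split can be checked with simple arithmetic. The sketch below assumes 720p means a 16:9 frame of 1280 × 720 pixels, which matches the quoted value. The 640 × 360 size used for the 360p images is likewise an assumption, since the article does not state exact pixel dimensions.

```python
def megapixels(width, height):
    """Pixel count of a frame expressed in millions of pixels."""
    return width * height / 1_000_000


# Assumed 16:9 frame sizes; the article gives only "720p" and "360p".
hd_mpx = megapixels(1280, 720)
sd_mpx = megapixels(640, 360)

print(f"720p: {hd_mpx:.2f} Mpx")  # 720p: 0.92 Mpx
print(f"360p: {sd_mpx:.2f} Mpx")  # 360p: 0.23 Mpx
```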
“In-house we have already built some neural networks based off of the Holopix50k dataset which are shown in our paper,” Kohli explains. “These networks are currently used in various projects at Leia, including Holopix™. We also built a pipeline for GPU-Accelerated Mobile Multiview Style Transfer which used some of the Holopix50k data for qualitative results.”
The above image displays disparity maps produced by Monodepth2 models on samples from the Middlebury (Left) and MPI Sintel (Right) datasets, respectively. The (a) column shows results of the model trained on KITTI, whereas the (b) column shows results of the model trained on Holopix50k.
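For readers new to disparity maps, the sketch below shows the standard relation between disparity and depth for a rectified stereo pair, Z = f · B / d. It is not taken from Monodepth2 or the Holopix50k paper. The focal length and baseline are hypothetical values chosen for illustration, not properties of the dataset or of any Leia device.

```python
import math


def disparity_to_depth(disparity_px, focal_length_px, baseline_m,
                       min_disparity=1e-6):
    """Apply Z = f * B / d per pixel; near-zero disparity maps to infinity."""
    return [
        [focal_length_px * baseline_m / d if d > min_disparity else math.inf
         for d in row]
        for row in disparity_px
    ]


# Toy 2x2 disparity map in pixels; larger disparity means a closer surface.
disparity = [[64.0, 32.0],
             [16.0, 0.0]]

# Hypothetical camera parameters (assumptions for this example only).
depth = disparity_to_depth(disparity, focal_length_px=720.0, baseline_m=0.02)

for row in depth:
    print(["inf" if math.isinf(z) else f"{z:.3f} m" for z in row])
```

Halving the disparity doubles the estimated depth, which is why small disparity errors matter most for distant objects.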
“Given that we’ve released Holopix50k for the research community to use and build models with, we are excited to see new neural networks trained using the Holopix50k dataset for a variety of computer vision tasks,” says Kohli. “We envision that it will be used to fine-tune existing methods. Furthermore, we hope it will be used to train novel methods to improve the generalization of networks to ‘in-the-wild’ scenarios.”
The team at Leia also plans to update the dataset in coming years with additional data they collect from the Holopix platform.
The full Holopix50k stereo image dataset can be downloaded from GitHub. Please note that the dataset is only available for non-commercial research purposes. The full license can be found on the project page.
Previously published on: https://lionbridge.ai/articles/holopix50k-a-new-benchmark-for-stereo-image-super-resolution-and-depth-estimation/