I recently explored Facebook's new, insanely realistic 3D digitization model. They've outdone themselves with PIFuHD, which reconstructs a high-res 3D model of a person from a single 2D image. This is state-of-the-art, as previous algorithms couldn't capture details like fingers, facial features, and clothing folds.
Now, this can all be done from a single, smartphone-quality photo.
PIFuHD stands for Pixel-Aligned Implicit Function for High-Resolution 3D Human Digitization.
That's a mouthful.
To simplify it, PIFuHD works in two steps:

1. A coarse level estimates the overall, global 3D structure from a downsampled version of the image.
2. A fine level adds high-resolution surface detail on top of that coarse base.
The paper offers this graphic showing the two steps:
In the top half, we can see the input being downsampled so the coarse level can capture the "global 3D structure," while the bottom half shows the fine level layering high-res detail on top of that coarse representation.
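To make the coarse-to-fine idea concrete, here's a toy sketch in plain NumPy. This is not the real PIFuHD code: the "features" are just raw pixel values, and the "implicit function" is a stand-in for the learned MLP. It only illustrates the shape of the pipeline, where a 3D query point is classified as inside/outside the surface using image features sampled at its 2D projection, first from a downsampled image, then refined at full resolution.

```python
import numpy as np

rng = np.random.default_rng(0)

def downsample(image, factor):
    """Average-pool an HxW image by an integer factor (the coarse-level input)."""
    h, w = image.shape
    return (image[:h - h % factor, :w - w % factor]
            .reshape(h // factor, factor, w // factor, factor)
            .mean(axis=(1, 3)))

def pixel_aligned_feature(image, x, y):
    """Sample the image "feature" at a 3D point's 2D projection (x, y in [0, 1])."""
    h, w = image.shape
    return image[int(y * (h - 1)), int(x * (w - 1))]

def implicit_fn(feature, z, coarse_score=0.0):
    """Toy occupancy score: a stand-in for an MLP over (feature, depth)."""
    return feature + coarse_score - z

# Coarse level: global structure from a downsampled image.
image = rng.random((512, 512))          # pretend this is the input photo
coarse = downsample(image, 4)           # 128x128 low-res backbone input
x, y, z = 0.5, 0.5, 0.3                 # one 3D query point and its projection
coarse_score = implicit_fn(pixel_aligned_feature(coarse, x, y), z)

# Fine level: full-res features refine the coarse prediction.
fine_score = implicit_fn(pixel_aligned_feature(image, x, y), z,
                         coarse_score=coarse_score)

print(coarse.shape)       # (128, 128)
print(fine_score > 0)     # inside/outside decision for this query point
```

In the real model, evaluating this function over a dense grid of query points and extracting the zero-level surface (e.g. with marching cubes) yields the final mesh; the key trick the sketch preserves is that the fine level conditions on the coarse level's prediction rather than starting from scratch.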
So far, you might just be thinking "that's neat," but there are many important applications for this technology.
"Human digitization" is key to many applications in medical imaging and virtual reality. While a clothed, full-body 3D model is just the start, this technology could be expanded to enable 3D MRI, CT, or ultrasound imaging.
As one paper explains:
"Anatomical models are important training and teaching tools in the clinical environment and are routinely used in medical imaging research."
While an anatomical model can be used in education and research, the paper also explains the usefulness of 3D models of the ribs, liver, and lungs.
3D imaging in general is huge for healthcare. As Dr. Frank Rybicki describes at HealthTech Magazine:
“Modern radiology is completely dependent on 3D visualization.”
The possibilities are endless!
Easy-to-use, open-source AI models with publicly available code are a great way to make the field more accessible to everyone.
Hopefully, PIFuHD will eventually be wrapped into a no-code solution, so artists, designers, and healthcare professionals can create models in their own software.
I've explored in depth how no-code is the future of AI, and how no-code analytics tools like Apteo benefit the industry as a whole. As it stands, for PIFuHD to be implemented in the real world, by everyone, it would need to be further simplified and abstracted.
I'm using a Chromebook with only a low-powered, integrated GPU, so there's no chance I could run the code on my own system. That's fine!
You can spin up their Google Colab demo to get a model in moments, as I did. First, turn on GPU acceleration (Edit > Notebook settings > Hardware Accelerator > Select "GPU"). Then, run the cells (SHIFT + ENTER), uploading your own PNG or JPEG file as requested.
The results won't always be perfect (e.g. check out my floating foot in the example up top), so the authors offer a few guidelines to improve your results, including using a front-facing, high-res image of a single person, with decent lighting and taken at a normal height.