While the world is still recovering, research hasn't slowed its frenetic pace, especially in the field of artificial intelligence. What's more, many important facets of AI were highlighted this year, such as ethics, bias, governance, and transparency.
Artificial intelligence, along with our understanding of the human brain and its link to AI, is constantly evolving, with promising applications poised to improve our quality of life in the near future. Still, we ought to be careful about which technologies we choose to apply.
"Science cannot tell us what we ought to do, only what we can do."
- Jean-Paul Sartre, Being and Nothingness
Here are the 10 most interesting computer vision research papers of the year that I shared here on HackerNoon, in case you missed any of them. In short, it is a curated list of the latest breakthroughs in AI and data science, ordered by release date, each with a clear video explanation, a link to the HN article, and code (where available). Enjoy the read!
The full list is also in a GitHub repository if you prefer it this way.
With DALL·E, OpenAI successfully trained a network able to generate images from text captions. It is very similar to GPT-3 and Image GPT and produces amazing results.
Code and additional explanation in the HN article!
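To make the "similar to GPT-3" point concrete, here is a minimal toy sketch of the underlying recipe (my own illustration, not OpenAI's code): text tokens and discrete image tokens share one sequence, and a causal transformer learns to predict the next token, so generating an image is just continuing the caption.

```python
import torch
import torch.nn as nn

TEXT_VOCAB, IMAGE_VOCAB, SEQ_LEN, DIM = 1000, 8192, 64, 128  # toy sizes

class TinyTextToImage(nn.Module):
    """Toy decoder-only transformer over a shared text+image token vocabulary."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(TEXT_VOCAB + IMAGE_VOCAB, DIM)
        self.pos = nn.Parameter(torch.zeros(SEQ_LEN, DIM))
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(DIM, TEXT_VOCAB + IMAGE_VOCAB)

    def forward(self, tokens):
        x = self.embed(tokens) + self.pos[: tokens.size(1)]
        # causal mask: each position may only attend to earlier tokens
        causal = torch.triu(torch.full((tokens.size(1),) * 2, float("-inf")), 1)
        return self.head(self.blocks(x, mask=causal))

model = TinyTextToImage()
caption = torch.randint(0, TEXT_VOCAB, (1, 16))  # stand-in caption tokens
next_token = model(caption)[:, -1].argmax(-1)    # first predicted image code
```

In the real DALL·E, the image tokens come from a discrete VAE and the model is vastly larger, but the autoregressive recipe is the same.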
TL;DR: They combined the efficiency of GANs and convolutional approaches with the expressivity of transformers to produce a powerful and time-efficient method for semantically guided, high-quality image synthesis.
Code and additional explanation in the HN article!
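For readers who want to see what "combining GANs with transformers" means mechanically, here is a toy sketch of the two-stage idea (my own illustration, not the authors' implementation): a convolutional encoder compresses the image into a short sequence of discrete codebook indices, which is compact enough for a transformer to model, while a GAN-trained decoder maps codes back to pixels.

```python
import torch
import torch.nn as nn

codebook = nn.Embedding(512, 64)                     # 512 learned visual "words"
encoder = nn.Conv2d(3, 64, kernel_size=8, stride=8)  # toy convolutional encoder

image = torch.randn(1, 3, 64, 64)
feats = encoder(image)                     # (1, 64, 8, 8) feature grid
tokens = feats.flatten(2).transpose(1, 2)  # (1, 64, 64): 64 tokens of 64 dims
# vector quantization: snap each feature to its nearest codebook entry
dists = torch.cdist(tokens, codebook.weight.unsqueeze(0))
indices = dists.argmin(-1)                 # (1, 64) discrete code indices
# this short index sequence is what the transformer models autoregressively
```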
Will Transformers Replace CNNs in Computer Vision? In less than 5 minutes, you will know how the transformer architecture can be applied to computer vision with a new paper called the Swin Transformer.
Code and additional explanation in the HN article!
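The key trick is easy to show in code. Below is a toy sketch (assumed shapes, not the official implementation) of window-based self-attention: the feature map is split into small non-overlapping windows and attention runs inside each one, keeping the cost linear in image size instead of quadratic.

```python
import torch
import torch.nn as nn

B, H, W, C, WIN = 1, 8, 8, 32, 4  # tiny feature map, 4x4 windows
x = torch.randn(B, H, W, C)

# partition the HxW grid into (H/WIN)*(W/WIN) windows of WIN*WIN tokens each
windows = (
    x.view(B, H // WIN, WIN, W // WIN, WIN, C)
     .permute(0, 1, 3, 2, 4, 5)
     .reshape(-1, WIN * WIN, C)
)

attn = nn.MultiheadAttention(embed_dim=C, num_heads=4, batch_first=True)
out, _ = attn(windows, windows, windows)  # self-attention within each window
print(out.shape)  # (4, 16, 32): 4 windows of 16 tokens each
```

In the next layer, Swin shifts the window grid so that information flows between neighbouring windows.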
"I will openly share everything about deep nets for vision applications, their successes, and the limitations we have to address."
Code and additional explanation in the HN article!
The next step for view synthesis: Perpetual View Generation, where the goal is to take an image, fly into it, and explore the landscape!
Code and additional explanation in the HN article!
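At its core, the approach is a render-refine-repeat loop. This sketch shows only that structure, using invented placeholder functions (`render_forward`, `refine`); the real system warps the image using estimated disparity and trains a network to inpaint the newly revealed regions.

```python
import numpy as np

def render_forward(image, disparity, step=0.05):
    """Hypothetical placeholder: warp the view to a camera moved forward."""
    shift = max(1, int(step * image.shape[0]))
    return np.roll(image, shift, axis=0), disparity

def refine(image, disparity):
    """Hypothetical placeholder: a network would inpaint disoccluded pixels."""
    return image, disparity

frame = np.random.rand(128, 128, 3)  # the single input photo
disp = np.random.rand(128, 128)      # its estimated disparity (inverse depth)
for _ in range(10):                  # "fly" ten steps into the scene
    frame, disp = render_forward(frame, disp)
    frame, disp = refine(frame, disp)
```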
Properly relight any portrait based on the lighting of the new background you add. Have you ever wanted to change the background of a picture but have it look realistic? If you’ve already tried, you know it isn’t simple. You can’t just take a picture of yourself at home and swap the background for a beach. It just looks bad and unrealistic. Anyone will say “that’s photoshopped” in a second. For movies and professional videos, you need perfect lighting and artists to reproduce a high-quality image, and that’s super expensive. There’s no way you can do that with your own pictures. Or can you?
Code and additional explanation in the HN article!
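Stripped of the learned networks, the final step is classic alpha compositing. Here is a rough sketch, with hypothetical stand-ins for the matting and relighting models the paper actually trains:

```python
import numpy as np

def estimate_matte(image):
    """Hypothetical stand-in for a learned alpha-matting network."""
    return np.ones(image.shape[:2] + (1,))  # 1.0 = foreground everywhere

def relight(foreground, target_lighting):
    """Hypothetical stand-in for the learned portrait-relighting network."""
    return np.clip(foreground * target_lighting, 0.0, 1.0)

portrait = np.random.rand(256, 256, 3)
beach = np.random.rand(256, 256, 3)
lighting = beach.mean()  # crude proxy for the new scene's illumination

alpha = estimate_matte(portrait)
composite = alpha * relight(portrait, lighting) + (1 - alpha) * beach
```

The realism hinges entirely on the relighting step: without it, the pasted foreground keeps its original illumination and immediately looks "photoshopped".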
This model takes a picture, understands which particles are supposed to be moving, and realistically animates them in an infinite loop while keeping the rest of the picture entirely still, creating amazing-looking videos like this one...
Code and additional explanation in the HN article!
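The trick is that the network predicts a single static motion field, and every frame is generated by advecting pixels along it. Here is a toy NumPy sketch of that looping integration (my illustration, with a hand-made motion field in place of the network's prediction):

```python
import numpy as np

H, W = 64, 64
image = np.random.rand(H, W, 3)
motion = np.zeros((H, W, 2))
motion[:, :, 1] = 1.0  # pretend the whole "water" region drifts 1 px right

ys, xs = np.mgrid[0:H, 0:W].astype(float)
frames = []
for t in range(30):  # a 30-frame looping clip
    # Euler integration: follow the (static) motion field for t steps
    sy = np.clip(ys - t * motion[..., 0], 0, H - 1).astype(int)
    sx = (xs - t * motion[..., 1]).astype(int) % W  # wrap for a seamless loop
    frames.append(image[sy, sx])
# pixels where the motion is zero never move, so the rest of the scene stays still
```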
Using a modified GAN architecture, they can move objects in the image without affecting the background or the other objects!
Code and additional explanation in the HN article!
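A toy sketch of one way such control can work, assuming a compositional generator in which each object gets its own latent code and pose and the scene is a composite of their feature fields (`object_field` is a hand-made stand-in, not the paper's network):

```python
import torch

def object_field(latent, position):
    """Hypothetical stand-in for a per-object neural feature field."""
    grid = torch.linspace(-1, 1, 32)
    yy, xx = torch.meshgrid(grid, grid, indexing="ij")
    # a soft blob whose location is set by this object's pose
    d2 = (xx - position[0]) ** 2 + (yy - position[1]) ** 2
    return latent.norm() * torch.exp(-d2 / 0.1)

background = torch.full((32, 32), 0.2)  # stand-in background field
car_latent = torch.randn(16)            # the object's identity stays fixed...
scene_a = background + object_field(car_latent, torch.tensor([0.5, 0.0]))
scene_b = background + object_field(car_latent, torch.tensor([-0.5, 0.0]))
# ...only its pose changed: the "car" moved; the background is untouched
```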
TimeLens can understand the movement of the particles in between the frames of a video to reconstruct what really happened at a speed even our eyes cannot see. In fact, it achieves results that neither our smartphones nor any other model could reach before!
Code and additional explanation in the HN article!
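A toy sketch of why the event stream helps (invented numbers, not the TimeLens architecture): an event camera reports per-pixel brightness changes with microsecond timing, so accumulating the events recorded between two frames gives real evidence of what happened there, instead of guessing by blending.

```python
import numpy as np

frame0 = np.random.rand(64, 64)
frame1 = np.random.rand(64, 64)
# events: (y, x, polarity) brightness changes recorded between the two frames
events = [(10, 20, +1), (10, 21, -1), (30, 40, +1)]

change = np.zeros_like(frame0)
for y, x, polarity in events:
    change[y, x] += 0.05 * polarity  # accumulate each event's contribution

# crude in-between frame: blend the frames, then correct with event evidence
midpoint = 0.5 * (frame0 + frame1) + 0.5 * change
```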
Have you ever dreamed of taking the style of a picture, like this cool TikTok drawing style on the left, and applying it to a new picture of your choice? Well, I did, and it has never been easier to do. In fact, you can even achieve that from text alone, and you can try it right now with this new method and their Google Colab notebook, available for everyone (see references). Simply take a picture of the style you want to copy, enter the text you want to generate, and this algorithm will generate a new picture out of it! Just look back at the results above: such a big step forward! The results are extremely impressive, especially when you consider that they were made from a single line of text!
Code and additional explanation in the HN article!
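Under the hood, methods like this optimize an image (or a small styling network) so that CLIP scores it as matching the text. Here is a bare-bones sketch of that loop, assuming OpenAI's `clip` package is installed (`pip install git+https://github.com/openai/CLIP.git`); the real method adds a styling network, patch-based losses, and content preservation:

```python
import torch
import clip

device = "cpu"  # clip.load uses fp16 on GPU; CPU keeps this toy sketch simple
model, _ = clip.load("ViT-B/32", device=device)
tokens = clip.tokenize(["a watercolor painting"]).to(device)
text_feat = model.encode_text(tokens).detach()

# start from noise and nudge the pixels toward the text description
image = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([image], lr=0.05)

for step in range(100):
    img_feat = model.encode_image(image)
    loss = -torch.cosine_similarity(img_feat, text_feat).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```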
The model is called CityNeRF and builds on NeRF, which I previously covered on my channel. NeRF is one of the first models to use radiance fields and machine learning to construct 3D models out of images. But NeRF is not that efficient and works at a single scale. Here, CityNeRF is applied to satellite and ground-level images at the same time to produce various 3D model scales for any viewpoint. In simple words, they bring NeRF to city scale. But how?
Code and additional explanation in the HN article!
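Since CityNeRF inherits NeRF's machinery, here is NeRF's core in a few lines (a toy sketch; CityNeRF's multi-scale training scheme is not shown): an MLP maps a 3D point to colour and density, and a pixel's colour is the density-weighted composite of colours sampled along its camera ray.

```python
import torch
import torch.nn as nn

# toy radiance field: 3D point -> (R, G, B, density); the real NeRF also
# takes the view direction and uses positional encoding of the inputs
mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 4))

def render_ray(origin, direction, n_samples=32, far=5.0):
    t = torch.linspace(0.1, far, n_samples).unsqueeze(-1)  # sample depths
    points = origin + t * direction                        # (n_samples, 3)
    rgb_sigma = mlp(points)
    rgb = torch.sigmoid(rgb_sigma[:, :3])
    sigma = torch.relu(rgb_sigma[:, 3])
    alpha = 1 - torch.exp(-sigma * (far / n_samples))      # per-sample opacity
    trans = torch.cumprod(torch.cat([torch.ones(1), 1 - alpha[:-1]]), dim=0)
    weights = alpha * trans                                # volume rendering
    return (weights.unsqueeze(-1) * rgb).sum(dim=0)        # composited colour

pixel = render_ray(torch.zeros(3), torch.tensor([0.0, 0.0, 1.0]))
```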
We’ve seen AI generate images from other images using GANs. Then, there were models able to generate questionable images using text. In early 2021, DALL·E was published, beating all previous attempts to generate images from text input, using CLIP, a model that links images with text, as a guide. A very similar task, called image captioning, may sound really simple but is, in fact, just as complex. It is the ability of a machine to generate a natural description of an image. It’s easy to simply tag the objects you see in the image, but it is quite another challenge to understand what’s happening in a single two-dimensional picture, and this new model does it extremely well...
Code and additional explanation in the HN article!
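The recipe, as I read it, maps a CLIP image embedding into a short "prefix" of pseudo-word embeddings that a frozen language model then continues into a sentence. Here is a toy sketch of just that mapping step, with assumed dimensions:

```python
import torch
import torch.nn as nn

clip_dim, lm_dim, prefix_len = 512, 768, 4  # assumed sizes (CLIP / GPT-2-like)

# learned mapping network: one image embedding -> prefix_len pseudo-word embeddings
mapper = nn.Linear(clip_dim, lm_dim * prefix_len)

image_embedding = torch.randn(1, clip_dim)  # would come from a CLIP encoder
prefix = mapper(image_embedding).view(1, prefix_len, lm_dim)
print(prefix.shape)  # (1, 4, 768): a visual "prompt" for the language model
# a frozen language model would now continue this prefix into a caption
```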
Please tag me on Twitter @Whats_AI or LinkedIn @Louis (What's AI) Bouchard if you share the list!