When you turn on your smartphone camera at night, the video preview is often darker than the scene in front of you: the camera captures even less than your eyes can see.
With the surge of real-time video apps, we have seen various video enhancement technologies (such as beautification or AR stickers) that make a video look better than it really is. You may wonder whether there is a technology that can make video look clearer in low-light conditions.
The answer is a definite yes. Beyond that, there are several scenarios with strong demand for low-light image enhancement technology, as described below.
Video quality and visibility are critical in some nighttime scenarios, such as footage from surveillance cameras or car security cameras.
Surveillance cameras are installed in public places to monitor and record. However, nighttime recordings made in low light are often dark and unclear, and cannot serve as strong clues or evidence in criminal cases.
Recordings from car security cameras are crucial after traffic accidents. You would want your nighttime footage to be clear enough to convince a traffic police officer.
When it comes to online entertainment and social scenarios, you want to see the other participants' faces clearly in the video even when it is dark.
Let me share a true story from one of our clients, a sizable online dating platform. One of its users doesn't like to turn the lights on in her bedroom, so her online friends cannot see her face in the near-dark room, and her darker skin tone makes this even harder. The client respects the user's preference but still wanted to improve the experience.
We have an in-house AI team dedicated to video effects and avatar technology. Both AI-powered technologies require face recognition on video input, and the quality of their output depends heavily on the quality of that input.
It is hard to recognize faces and body outlines when video frames are dark and blurry. As a result, video effects and avatar fidelity may be compromised.
To resolve the issues, we need to brighten low-light videos through intelligent algorithms.
The primary requirement is to increase the lightness of pixels identified as low-light. Meanwhile, there are other requirements:
a) Bright pixels shouldn't be over-brightened; b) Image noise shouldn't be amplified (much like background noise in audio noise suppression); c) Valid information should be retained as much as possible.
Traditional algorithms brighten all the pixels regardless of conditions.
Deep machine learning can be used for low-light image enhancement too, but it brings some overhead:
a) The framework and models are fairly large; b) The computational load is heavy; c) The power consumption is significant.
If we brighten all pixels regardless of conditions, bright pixels become dazzling and image noise is amplified as well.
Because of this overhead, deep machine learning models cannot run on low-end mobile devices or IoT devices. As a result, only a small proportion of devices can meet the requirements, and the application scenarios are very limited.
To cope with these issues, we took the following measures:
a) Not all video pixels should be brightened, since some are already light enough. We scan the video frame by frame and classify pixels into categories: some pixels are fairly light and need no brightening, while others are low-light and need brightening by different amounts. This way we avoid over-brightening pixels that are already light enough.
b) We don't load deep machine learning frameworks and models at runtime. Instead, we load only the 2D/3D LUT (Look-Up Table). We train the deep machine learning models offline, and once a model is ready, we extract its 2D/3D LUTs and embed them into our algorithm. As a result, we achieve the same effect as a deep machine learning model without the burden of fully loading it. A rough sketch of both measures is shown below.
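To make these two measures concrete, here is a minimal sketch in Python (using OpenCV and NumPy). It assumes a 1D luma lookup table standing in for a table distilled from an offline-trained model; the curve shape, threshold, and function names are illustrative assumptions, not our actual parameters or production code.

```python
# Minimal sketch of the two measures above (illustrative, not the production code):
# a precomputed lookup table lifts dark pixels while bright pixels pass through.
import cv2
import numpy as np

def build_brightening_lut(dark_threshold: int = 128, gamma: float = 0.6) -> np.ndarray:
    """Hypothetical 1D luma LUT; in practice the curve would be distilled
    offline from a trained model rather than hand-crafted like this."""
    x = np.arange(256, dtype=np.float32)
    lifted = 255.0 * (x / 255.0) ** gamma                               # brightened value for dark pixels
    weight = np.clip((dark_threshold - x) / dark_threshold, 0.0, 1.0)   # 1 = dark, 0 = bright
    return np.clip(weight * lifted + (1.0 - weight) * x, 0, 255).astype(np.uint8)

LUT = build_brightening_lut()

def enhance_frame(frame_bgr: np.ndarray) -> np.ndarray:
    # Work in YCrCb so only luminance is changed, not color.
    y, cr, cb = cv2.split(cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb))
    y = cv2.LUT(y, LUT)                                                 # one table lookup per pixel
    return cv2.cvtColor(cv2.merge([y, cr, cb]), cv2.COLOR_YCrCb2BGR)
```

Because each pixel is handled by a single table lookup, the per-frame cost stays low enough for real-time processing even on modest hardware.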
These measures bring the following advantages:
a) FHD Real-time Video
The solution supports FHD-quality video communication in real time thanks to the deep machine learning 2D/3D LUT. FHD quality here means Full HD (1080p) video at a high frame rate, which lays the foundation for a superior user experience.
b) Comfortable and Natural Visual Experience
We believe in the philosophy that less is more. We apply different levels of brightening as needed and avoid over-brightening, so the video looks natural and comfortable. If we detect that a frame is already light enough, we take no action at all.
c) Full Device Coverage, Including Low-end Devices
Thanks to the offline deep machine learning approach and other optimizations, the solution performs well even on low-end or older smartphones such as the Mi 2S or iPhone 4S, as well as on IoT devices.
d) Robustness in Extreme Lighting Conditions
The solution has been tested against a large number of extreme lighting cases and proven to work well in extremely dark or bright conditions. We also ran comparison tests against competing solutions and found that, under extreme lighting, they showed artifacts such as glare or flickering.
ZEGOCLOUD has an AI team dedicated to video enhancement. They have systematically built a set of sophisticated AI algorithms, and they have summarized the key techniques they applied in those algorithms to achieve the advantages above.
Deep machine learning is certainly an advanced way to enhance low-light video. However, its overhead increases latency and slows down processing, and low-end smartphones cannot run the model at all.
Instead, we extract the 2D/3D LUT for runtime table lookups. The overhead of the lookup table is very small, and the approach has demonstrated great performance.
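For the 3D case, the runtime lookup could look like the sketch below. The table shape, value range, and nearest-neighbor indexing are our own assumptions for illustration, and the offline step that exports the LUT from a trained model is not shown.

```python
# Rough sketch of applying a small 3D LUT at runtime (assumptions, not production code).
# lut3d is an N x N x N x 3 table of output colors in [0, 1], exported offline.
import numpy as np

def apply_3d_lut(frame_rgb: np.ndarray, lut3d: np.ndarray) -> np.ndarray:
    n = lut3d.shape[0]                                        # LUT resolution, e.g. 17 or 33
    # Map each 8-bit channel value to the nearest LUT grid index.
    idx = np.round(frame_rgb.astype(np.float32) / 255.0 * (n - 1)).astype(np.int32)
    # One table lookup per pixel; a production version would typically use
    # trilinear interpolation between neighboring entries for smoother results.
    out = lut3d[idx[..., 0], idx[..., 1], idx[..., 2]]
    return np.clip(out * 255.0, 0, 255).astype(np.uint8)

# Identity LUT for testing: the frame comes back approximately unchanged
# (quantized to the LUT grid).
n = 17
grid = np.linspace(0.0, 1.0, n, dtype=np.float32)
identity_lut = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)
```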
The algorithm scans video frames and evaluates the lightness level of different areas. We brighten areas that are low-light but contain valid information, and leave areas that are already light enough untouched.
You may say that the picture is dark and you can see nothing. That is because the light reflected by the objects is so low that your naked eyes cannot perceive them. However, the information about those objects is still there.
We increase the light level of each area to an appropriate degree, so you can see those objects comfortably and naturally.
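As a sketch of this area-based idea (again with assumed block sizes, target brightness, and gain limits rather than our real parameters), one could estimate the brightness of local regions and derive a gain map that only lifts the dark ones:

```python
# Illustrative sketch of area-based lightness evaluation (not the production algorithm).
import cv2
import numpy as np

def region_gain_map(luma: np.ndarray, block: int = 32,
                    target: float = 110.0, max_gain: float = 3.0) -> np.ndarray:
    h, w = luma.shape
    # Coarse "lightness map": mean brightness of each block of the frame.
    small = cv2.resize(luma, (max(1, w // block), max(1, h // block)),
                       interpolation=cv2.INTER_AREA).astype(np.float32)
    # Dark regions get a gain above 1; already-bright regions stay near 1.
    gain = np.clip(target / np.maximum(small, 1.0), 1.0, max_gain)
    # Upsample smoothly so block borders are not visible in the output.
    return cv2.resize(gain, (w, h), interpolation=cv2.INTER_LINEAR)

def brighten_by_region(frame_bgr: np.ndarray) -> np.ndarray:
    ycrcb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2YCrCb).astype(np.float32)
    gain = region_gain_map(ycrcb[:, :, 0].astype(np.uint8))
    ycrcb[:, :, 0] = np.clip(ycrcb[:, :, 0] * gain, 0, 255)   # lift only the dark areas
    return cv2.cvtColor(ycrcb.astype(np.uint8), cv2.COLOR_YCrCb2BGR)
```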
We don't want to make the change too quickly, because a sharp jump in brightness looks like a flash. Instead, we spread the brightening over several frames, bit by bit, so the light level increases gradually. This also feels natural and comfortable. The philosophy behind all these efforts is people-oriented.
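The gradual ramp can be illustrated with a simple exponential smoother; the class name and smoothing factor below are illustrative assumptions, not values from our implementation.

```python
# Sketch of spreading a brightness change over several frames to avoid a visible flash.
class TemporalGainSmoother:
    def __init__(self, smoothing: float = 0.25):
        # smoothing in (0, 1]: smaller values give a slower, smoother ramp.
        self.smoothing = smoothing
        self.current_gain = 1.0

    def update(self, target_gain: float) -> float:
        # Move only a fraction of the way toward the target each frame.
        self.current_gain += self.smoothing * (target_gain - self.current_gain)
        return self.current_gain

# A sudden jump from gain 1.0 to 2.5 is spread across frames: 1.38, 1.66, 1.87, ...
smoother = TemporalGainSmoother()
ramp = [round(smoother.update(2.5), 2) for _ in range(8)]
```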
In real situations, the lighting itself may be flickering, and the light level can differ greatly between two consecutive frames. We don't try to correct such light changes, nor do we even try to detect inter-frame changes. These are real conditions, and all we can do is be adaptive.
Detecting such changes is computationally heavy, which slows down processing and introduces latency. Detection algorithms are also complicated, and their judgments are often wrong.
Given these constraints, we decided not to detect inter-frame light changes, and not to make any light adjustments based on them.
Image noise is certainly an issue, and there are various methods on the market for handling it. After thorough experiments and comparisons, we arrived at a few principles:
a) Refrain from amplifying image noise while increasing the light level of pixels that carry valid information. We have built a sophisticated algorithm to detect image noise and treat it differently.
b) Avoid full image noise suppression, because it is computationally heavy and slows down processing; in addition, the results are often unsatisfactory.
c) If you have to suppress image noise, take conservative measures. For example, to suppress a noisy pixel, you can replace its value with the average light level of its surrounding pixels, as sketched below.
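Here is a minimal sketch of that conservative approach; the threshold and window size are assumptions of ours. A pixel that deviates strongly from its local average is treated as likely noise and replaced by that average, while everything else is left alone.

```python
# Conservative noise suppression sketch: replace only pixels that look like isolated noise.
import cv2
import numpy as np

def suppress_isolated_noise(luma: np.ndarray, threshold: float = 40.0) -> np.ndarray:
    luma_f = luma.astype(np.float32)
    # Average of the 3x3 neighborhood (including the pixel itself, as a simple approximation).
    local_mean = cv2.blur(luma_f, (3, 3))
    # Pixels far from their local average are treated as likely noise ...
    noisy = np.abs(luma_f - local_mean) > threshold
    # ... and only those are replaced by the neighborhood average.
    out = np.where(noisy, local_mean, luma_f)
    return np.clip(out, 0, 255).astype(np.uint8)
```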
If your application scenarios involve low-light situations and you want to improve the user experience, please contact us and ask our experts for specific advice.