Being a passionate follower of technology, I have been captivated by numerous technological advancements. However, the one that has particularly piqued my interest is the assistive technology Dr. Stephen Hawking used during the advanced stages of ALS (Amyotrophic Lateral Sclerosis), which enabled him to communicate despite being unable to speak. Driven by my inquisitive nature, I researched the underlying technology, and only recently did I feel motivated enough to build a very simple prototype that emulates it.
Through a combination of twitching his cheek muscle and blinking his eyes, Dr. Hawking utilized a scanning method to select characters on a screen. Despite its slow pace, this technology proved to be a crucial tool in facilitating his communication and sharing his invaluable ideas and knowledge with the world.
While Dr. Hawking’s system was notably more intricate, incorporating an adaptive word predictor following the selection of each letter, the prototype we intend to develop will operate on a simpler premise. Specifically, it will detect eye blinks and utilize them as inputs to select letters displayed on a screen. Subsequently, we can readily expand upon this by integrating external APIs to facilitate next-word prediction, as well as incorporating a text-to-speech feature. You can read more about Intel’s technology that Dr. Hawking used here.
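As an illustration of how that text-to-speech step could eventually be bolted on, here is a minimal sketch using the browser's built-in Web Speech API; the speak helper and the rate value are placeholders for illustration, not part of the prototype described in this article.
// A minimal sketch of a possible text-to-speech extension using the
// browser's Web Speech API. The helper name and rate value are illustrative.
function speak(text) {
  const utterance = new SpeechSynthesisUtterance(text)
  utterance.rate = 0.9 // slightly slower than the default; just a guess at a comfortable pace
  window.speechSynthesis.speak(utterance)
}
// e.g. speak('HELLO WORLD') once the user has finished composing a sentence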
If you possess even a modicum of interest in facial recognition or object detection, it is highly likely that you are familiar with OpenCV. Having previously gained some experience working with OpenCV on a project pertaining to lane detection and vehicle collision prevention, I began to delve more deeply into how I could leverage its capabilities for my current undertaking.
I perused numerous pre-existing APIs, such as Kairos and AWS image recognition services, among others. A comprehensive list of these APIs can be found here.
Some of the options I encountered failed to provide the desired functionality or proved cumbersome to implement; my aim was to identify a user-friendly and easily extendable solution.
During the initial stages of my research, I came across face-api.js, a JavaScript library for face detection and facial landmarks that runs directly in the browser.
During the course of my research, I also frequently encountered Python as a potential solution. Given my previous experience working with OpenCV in C#, I was intrigued by the prospect of exploring Python. Ultimately, I determined that Python OpenCV with Dlib held particular promise.
While I initially experienced some challenges installing the requisite dependencies on my Mac, I eventually succeeded in setting up my development environment and began coding. I was quickly able to get blink detection working, albeit with certain limitations: the detection was not especially robust, and building a UI in Python proved cumbersome. Even so, the solution was faster and more responsive than face-api.js, and it never caused my computer to hang or crash.
The difficulty of constructing a user interface in Python remained a real constraint, however, and as I continued my research I stumbled upon an intriguing alternative in MediaPipe.
NOTE: Official Mediapipe documentation is moving to this link starting April 3, 2023.
In light of the aforementioned limitations with Python, I found myself compelled to return to JavaScript. The engaging and accessible demos available through MediaPipe were another factor that drew me towards JS. After some experimentation and tinkering, I became convinced that this was the ideal solution for my project.
Although I initially intended to build a React Native app utilizing MediaPipe face detection, time constraints compelled me to opt for a vanilla JS project instead. Nevertheless, adapting this code to a React or Vue JS app should be relatively straightforward for anyone interested in doing so. Additionally, it is worth noting that by the time you are reading this article, I may have already developed a React app.
MediaPipe has a significant advantage over alternatives such as face-api.js and Python OpenCV: it can estimate 468 3D face landmarks in real time, compared to only 68 landmarks in the models used with Python OpenCV and face-api.js. This denser mesh allows for more precise facial expression detection.
Facial landmarks are crucial features of a human face that enable us to differentiate between different faces. The following are the facial landmarks that I utilized with Python and face-api.js:
Here are the ones used by MediaPipe:
Here is a list of all the landmark points:
And here is an example of a real face with the facemesh:
It is evident that the MediaPipe landmark model is far richer than the other models, and I could immediately notice the difference in detection accuracy. You can read more about landmarks here.
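For context, here is roughly what obtaining those landmarks looks like in vanilla JS with MediaPipe's Face Mesh solution. This is a minimal sketch based on MediaPipe's published JS API (FaceMesh, Camera, results.multiFaceLandmarks); the videoElement and the option values are assumptions, not the project's exact configuration.
// Minimal MediaPipe Face Mesh setup in vanilla JS (a sketch, not the project's
// exact code). Assumes a <video id="video"> element on the page and the
// @mediapipe/face_mesh and @mediapipe/camera_utils scripts loaded.
const videoElement = document.getElementById('video')

const faceMesh = new FaceMesh({
  locateFile: (file) => `https://cdn.jsdelivr.net/npm/@mediapipe/face_mesh/${file}`,
})
faceMesh.setOptions({
  maxNumFaces: 1,
  minDetectionConfidence: 0.5,
  minTrackingConfidence: 0.5,
})

faceMesh.onResults((results) => {
  if (results.multiFaceLandmarks && results.multiFaceLandmarks.length > 0) {
    // One entry per detected face; each entry is the full list of landmarks
    const landmarks = results.multiFaceLandmarks[0]
    // e.g. feed these into the blink detection described below
  }
})

const camera = new Camera(videoElement, {
  onFrame: async () => {
    await faceMesh.send({ image: videoElement })
  },
  width: 640,
  height: 480,
})
camera.start()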
Out of the box, the sample code gives you a face mesh superimposed on your face when you look at the camera, but any facial expression detection has to be done by you. There were two approaches I could follow to detect a blink:
Watch for the iris to disappear when the eye is closed
Detect when the upper eyelid meets the lower eyelid.
I decided to go with the second option. Observe the landmark points on the right eye below. When the eye is closed, the distance between the vertical landmark points 27 and 23 decreases significantly (almost to 0), while the distance between the horizontal points 130 and 243 remains nearly constant.
Here is the function for blink detection:
function blinkRatio(landmarks) {
  // RIGHT EYE
  // horizontal line (eye corners)
  let rh_right = landmarks[33]
  let rh_left = landmarks[133]
  // vertical line (upper and lower eyelid)
  let rv_top = landmarks[159]
  let rv_bottom = landmarks[145]

  // LEFT EYE
  // horizontal line (eye corners)
  let lh_right = landmarks[362]
  let lh_left = landmarks[263]
  // vertical line (upper and lower eyelid)
  let lv_top = landmarks[386]
  let lv_bottom = landmarks[374]

  // Finding distance for the right eye
  let rhDistance = euclideanDistance(rh_right, rh_left)
  let rvDistance = euclideanDistance(rv_top, rv_bottom)

  // Finding distance for the left eye
  let lvDistance = euclideanDistance(lv_top, lv_bottom)
  let lhDistance = euclideanDistance(lh_right, lh_left)

  // Ratio of horizontal to vertical distance for each eye:
  // the more closed the eye, the larger the ratio
  let reRatio = rhDistance / rvDistance
  let leRatio = lhDistance / lvDistance

  let ratio = (reRatio + leRatio) / 2
  return ratio
}
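The euclideanDistance helper isn't shown above; a straightforward way to write it, assuming each landmark is an object with normalized x and y coordinates (as MediaPipe returns them), would be:
// Distance between two landmark points. Assumes each landmark has normalized
// x/y fields, as in MediaPipe's multiFaceLandmarks output.
function euclideanDistance(p1, p2) {
  return Math.hypot(p1.x - p2.x, p1.y - p2.y)
}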
We check both eyes because we want both eyes closed for a blink. Another way to utilize eye detection is by identifying winks, which can be used as a different user input. However, detecting a wink can be challenging since when you close one eye, the other eye also slightly narrows its vertical height. Although I attempted to detect winks using the 68 landmark model when working with Python, the results were hit and miss. With the MediaPipe model having more landmarks on the eye, I anticipate better results for wink detection.
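To give a concrete idea of what such a per-eye check might look like, here is a hypothetical sketch rather than code from the project: it compares the individual eye ratios computed in blinkRatio, and both thresholds are illustrative guesses that would need tuning.
// Hypothetical wink check (not part of the project): treat it as a wink when
// one eye's ratio is clearly above the "closed" threshold while the other
// eye's ratio stays near its open value. Both thresholds are illustrative.
function detectWink(reRatio, leRatio, closedThreshold, openThreshold) {
  if (reRatio > closedThreshold && leRatio < openThreshold) return 'right'
  if (leRatio > closedThreshold && reRatio < openThreshold) return 'left'
  return null
}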
Returning to the topic of blinks, I became aware that treating a natural blink as a user input was resulting in unintended inputs. Consequently, I had to devise an alternative approach: registering an input only if the user kept their eyes closed for slightly longer than a regular blink. Specifically, I classified a blink as a user input only if the eyes remained closed for more than five successive video frames.
let CLOSED_EYES_FRAME = 5
Varying the lengths of eye closure allowed for additional user options such as the ability to delete an input or restart the prompts.
let BACKSPACE_EYES_FRAME = 20
let RESTART_EYES_FRAME = 40
Below is a code snippet that registers a user action and distinguishes between a letter input, a deletion, and a restart based on how long the eyes stayed closed.
// CLOSED_EYE_RATIO is the tuned blinkRatio value above which the eyes are treated as closed
if (finalratio > CLOSED_EYE_RATIO) {
  // Eyes are currently closed: keep counting frames
  CEF_COUNTER += 1
} else {
  // Eyes have reopened: decide what the closure meant
  if (CEF_COUNTER > CLOSED_EYES_FRAME) {
    TOTAL_BLINKS += 1
    blinkDetected = true
    document.getElementById('blinks').innerHTML = TOTAL_BLINKS
    document.getElementById('ratio').innerHTML = finalratio
  }
  if (CEF_COUNTER > RESTART_EYES_FRAME) {
    console.log('restart frames detected')
    restart()
  } else if (CEF_COUNTER > BACKSPACE_EYES_FRAME) {
    console.log('backspace frames detected')
    backspace()
    restart()
  }
  CEF_COUNTER = 0
}
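For completeness, here is a hypothetical sketch of how blinkDetected could drive the on-screen letter prompts; the LETTERS array, the element IDs, and the bodies of backspace() and restart() shown here are assumptions for illustration, not the project's actual implementations.
// Hypothetical scanning-keyboard sketch (not the project's exact code).
// Letters are highlighted one at a time; a registered blink selects the
// currently highlighted letter.
const LETTERS = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ '.split('')
let currentIndex = 0
let message = ''

// Advance the highlighted letter on a fixed timer (the interval is illustrative).
setInterval(() => {
  currentIndex = (currentIndex + 1) % LETTERS.length
  document.getElementById('prompt').innerHTML = LETTERS[currentIndex]
}, 1000)

// Called when blinkDetected is set by the snippet above.
function selectLetter() {
  message += LETTERS[currentIndex]
  document.getElementById('message').innerHTML = message
}

// One possible shape for backspace(): drop the last selected letter.
function backspace() {
  message = message.slice(0, -1)
  document.getElementById('message').innerHTML = message
}

// One possible shape for restart(): begin scanning from the first letter again.
function restart() {
  currentIndex = 0
}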
Project link: https://github.com/karamvirs/face-talk
Lead image source: Tumisu Pixabay