Recently I was given a Myo armband, and this article aims to describe how such a device can be exploited to control a robotic manipulator intuitively. That is, we will move our arm as if it were the actual hand of the robot. The implementation relies on the Robot Operating System (ROS).
The Myo armband is an electronic bracelet produced by Thalmic Labs that allows the wearer to interact remotely with other devices. Have a look at the
The device is equipped with an accelerometer, a gyroscope, and a magnetometer packed together inside a chip called
But the Myo is probably best known for its 8-channel EMG sensor, which gives it the ability to sense the muscular activity of the wearer's arm. One can then process this kind of data to recognize the current gesture of the user's hand, and machine learning is a popular choice here. For instance, people have built applications that let you switch channels on your smart TV by waving your hand in and out, or turn on your smart light by snapping your fingers.
Our goal is to control a robot using the Myo. So it seems like we need a robot, which is quite expensive nowadays :(
Fortunately, simulators exist, and indeed we're gonna control a robot in a simulated environment using Gazebo.
Now that we somehow have a robot, we need a way to talk to it. For this purpose, there is the Robot Operating System (or just ROS), probably the de facto standard when it comes to open-source robotics projects. In particular, my code is based on rospy, which, as you might guess, is a Python package that lets you write code to interact with ROS. There are tons of libraries for working with the Franka through ROS, and you can find good working ones for the Myo armband as well. In particular, I've used
The manipulator moves according to signals computed by a resolved-velocity controller. I don't want to dive deep into robotics theory, but the choice was primarily driven by the fact that it doesn't require the desired task-space position as input: the user should drive the robot to the target pose by moving their own arm, intuitively.
This controller imposes the orientation of the Myo (i.e. of our hand) on the manipulator, whereas hand extension and flexion, when detected, trigger a linear velocity of +0.1 m/s and −0.1 m/s respectively along the approach direction of the end-effector (i.e. the Franka gripper). Naturally, whenever the hand is steady in a neutral pose, the linear velocity imposed on the end-effector is zero.
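To make the idea a bit more concrete, here is a minimal numpy sketch of what one iteration of such a controller could look like. This is not the project's actual code: it assumes you can query the end-effector Jacobian and rotation matrix from your robot model, and the orientation gain is a made-up value.

```python
import numpy as np

def resolved_velocity_step(J, R_current, R_desired, linear_cmd, k_o=1.0):
    """One iteration of a resolved-velocity loop (illustrative sketch only).

    J          : 6xN geometric Jacobian of the end-effector (from your robot model)
    R_current  : 3x3 current end-effector rotation matrix
    R_desired  : 3x3 desired rotation (here, the orientation of the Myo)
    linear_cmd : commanded speed along the approach axis, e.g. +0.1, 0.0 or -0.1 m/s
    """
    # Orientation error as an axis-angle vector (log of the relative rotation)
    R_err = R_desired @ R_current.T
    angle = np.arccos(np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0))
    if np.isclose(angle, 0.0):
        w_err = np.zeros(3)
    else:
        axis = np.array([R_err[2, 1] - R_err[1, 2],
                         R_err[0, 2] - R_err[2, 0],
                         R_err[1, 0] - R_err[0, 1]]) / (2.0 * np.sin(angle))
        w_err = angle * axis

    # Desired twist: linear velocity along the current approach (z) axis,
    # angular velocity proportional to the orientation error
    v = linear_cmd * R_current[:, 2]
    nu = np.concatenate([v, k_o * w_err])

    # Joint velocities via the pseudo-inverse of the Jacobian
    return np.linalg.pinv(J) @ nu
```

The resulting joint velocities are what gets handed over to the inner control loop described next.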
In the above picture, you can see a block marked as 1/s. That's the inner control loop of the manipulator, which directly talks to the robot joints and makes them move. It is provided by the Franka API and is compliant with the
There are several methods to describe the orientation of a body in space with respect to a given reference frame. For instance, a popular choice is to rely on the
Given that the Myo has a built-in magnetometer, it can already give us its current orientation. Unfortunately, I found a big issue with that. Namely, it's not clear which reference frame the provided orientation refers to. Even worse, when recording multiple orientations with the Myo held in the same pose, the results were sometimes different. It could be an issue with the firmware or with the magnetometer itself, since this kind of sensor can suffer greatly from the presence of nearby electronic devices. The point is that I couldn't trust the orientation of the Myo this way.
To address the problem of reliable orientation, we exploit the Madgwick filter, which fuses the gyroscope and accelerometer readings into a stable estimate of the device orientation.
Once we have a reliable source of orientation, we need to describe it with respect to the base link of our manipulator. Here, we assume that we know the orientation of the base link frame with respect to the Earth's global frame. Since we are working in a simulation, we are free to arbitrarily choose that the global frame of the virtual world coincides with the one of the Earth. In particular, given that our Gazebo world also has the base link frame aligned with the world frame, we can directly say that Madgwick's filter gives us the orientation of the Myo relative to the base link of the Franka. Exactly what we need.
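For reference, this is roughly how the update step could look with the open-source ahrs Python package. This is only one possible implementation of Madgwick's filter, not necessarily the one used in the project, and the 50 Hz rate is an assumption about the Myo's IMU stream.

```python
import numpy as np
from ahrs.filters import Madgwick  # pip install ahrs (one possible implementation)

madgwick = Madgwick(frequency=50.0)   # assume the Myo streams IMU data at ~50 Hz
q = np.array([1.0, 0.0, 0.0, 0.0])    # initial orientation quaternion (w, x, y, z)

def on_imu_reading(gyro_rad_s, accel_m_s2):
    """Update the orientation estimate with one gyroscope/accelerometer sample."""
    global q
    q = madgwick.updateIMU(q, gyr=gyro_rad_s, acc=accel_m_s2)
    return q  # orientation of the Myo w.r.t. the world (and thus the base link) frame
```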
Because we aim to control the manipulator as if it were our arm, it's natural to think of our hand as the gripper. It follows that our forearm should be aligned with the approach direction (z-axis) of the Franka end-effector. The accelerometer and gyroscope of the Myo are oriented differently, as they have their z-axis orthogonal to our forearm. So we need a change of frame to account for that, as depicted below.
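With scipy, such a change of frame could be expressed by composing the estimated orientation with a fixed offset rotation. The 90-degree rotation below is only a placeholder: the actual offset depends on the Myo's axis conventions and on how the armband is worn.

```python
from scipy.spatial.transform import Rotation as R

# Fixed rotation from the Myo IMU frame to the "hand/gripper" frame we want to command.
# We assume a 90-degree rotation about y so that the forearm direction becomes the
# z (approach) axis; the real offset depends on how the device is mounted on the arm.
MYO_TO_HAND = R.from_euler("y", 90, degrees=True)

def hand_orientation(q_myo_xyzw):
    """Re-express the Madgwick estimate as the orientation to impose on the end-effector."""
    r_myo = R.from_quat(q_myo_xyzw)         # scipy expects (x, y, z, w)
    return (r_myo * MYO_TO_HAND).as_quat()  # orientation of the 'hand' frame w.r.t. the base
```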
Then, we have to realize that both the gyroscope and the accelerometer are very noisy sensors. To this end, I found it beneficial to filter their data using the Stance Hypothesis Optimal Estimation¹ (SHOE) detector before feeding them to Madgwick's algorithm. This is a simple formula that computes a score out of the angular velocity and linear acceleration readings, weighted by their variances. We say that the stance hypothesis is confirmed (i.e. the device is stationary) or rejected (i.e. the device is moving) whenever the score is below or above a given threshold. The idea is to feed data to Madgwick's filter only if we detect that the device (our arm) is moving. This prevents the algorithm from accumulating useless micro-changes in orientation due to involuntary movements or vibrations of our arm.
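A minimal sketch of what the detector boils down to is shown below; the variance and threshold values are placeholders to be tuned on your own recordings.

```python
import numpy as np

GRAVITY = 9.81
SIGMA_A2 = 0.01   # accelerometer noise variance (assumed, tune on your data)
SIGMA_W2 = 0.1    # gyroscope noise variance (assumed, tune on your data)
THRESHOLD = 1e5   # stance threshold (assumed, tune on your data)

def shoe_is_stationary(acc_window, gyro_window):
    """SHOE detector: return True if the device is considered stationary.

    acc_window  : (W, 3) array of accelerometer readings [m/s^2]
    gyro_window : (W, 3) array of gyroscope readings [rad/s]
    """
    acc_mean = acc_window.mean(axis=0)
    gravity_dir = acc_mean / np.linalg.norm(acc_mean)

    # Acceleration term: deviation from pure gravity, weighted by the accelerometer variance
    acc_term = np.sum(np.linalg.norm(acc_window - GRAVITY * gravity_dir, axis=1) ** 2) / SIGMA_A2
    # Angular-rate term: gyroscope energy, weighted by the gyroscope variance
    gyro_term = np.sum(np.linalg.norm(gyro_window, axis=1) ** 2) / SIGMA_W2

    score = (acc_term + gyro_term) / len(acc_window)
    return score < THRESHOLD
```

In this pipeline, an IMU sample would be forwarded to Madgwick's update only when the function returns False, i.e. when the arm is actually moving.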
In the above video, we are just logging the angular velocity and acceleration of the Myo to the screen. It's not important to read the values. What matters is that, as you can see, on the left side of the screen the data changes continuously, even when my arm is not moving significantly. On the contrary, the right side of the screen shows the same data filtered with the SHOE detector. This time the values are updated much more coherently with the true motion of my arm.
Quick recap: we want to exploit the EMG sensor of the Myo for gesture recognition. We need two different gestures to impose the +0.1 m/s and −0.1 m/s velocities along the z-axis of the end-effector. Additionally, we want to make the gripper close when we squeeze our hand into a fist. Finally, we need to recognize whenever our hand is in a neutral pose, in order not to send any of the previous commands to the robot.
This project uses the scikit-learn implementation of a support vector machine (SVM) classifier.
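As a rough idea of the setup, the classifier could be wired up as below. The hyperparameters and the flattening of each EMG window into a single feature vector are assumptions for illustration, not necessarily what the project or the Kaggle notebook does.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: (n_samples, 30, 8) windows of EMG readings (see the dataset section below)
# y: gesture labels (neutral, fist, flexion, extension)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))

def train(X, y):
    clf.fit(X.reshape(len(X), -1), y)  # flatten each window into a 240-dim vector

def predict_gesture(window_30x8):
    """Predict the gesture for a single (30, 8) window of EMG readings."""
    return clf.predict(window_30x8.reshape(1, -1))[0]
```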
Of course, any machine learning algorithm requires a proper dataset to train on. There are even some good-quality datasets available online for gesture recognition with the Myo armband. Unfortunately, most of them assume EMG readings in the form of an 8-bit unsigned integer (aka a byte), whereas my device provides data as 16-bit signed integers. It may be due to a different firmware version or to the ROS package used to interact with the bracelet. Anyway, I didn't achieve satisfactory results training on those datasets.
I ended up collecting my own small dataset by asking a bunch of friends and relatives to perform hand gestures while wearing the Myo. The protocol was very simple: for each candidate, I recorded 2 sessions for each of the three target gestures (fist, flexion and extension). A single session lasted one minute, during which the candidate was asked to alternate between the neutral and the target gesture every 5 seconds. In the meantime, every EMG reading coming from the Myo was recorded. The final dataset was then chunked into samples in a sliding-window fashion: a single sample comprises 30 consecutive 8-channel EMG readings, and two subsequent samples have an overlap of 10 readings. This choice should enable approximately 2 gesture predictions per second.
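The chunking itself is straightforward. A sketch of it, assuming the recording of a session is available as a (T, 8) numpy array:

```python
import numpy as np

WINDOW = 30   # EMG readings per sample
OVERLAP = 10  # readings shared by two consecutive samples
STRIDE = WINDOW - OVERLAP  # 20 readings, i.e. roughly 0.4 s at ~50 Hz

def make_windows(emg_stream):
    """Chunk a (T, 8) array of EMG readings into overlapping (30, 8) samples."""
    emg_stream = np.asarray(emg_stream)
    samples = [emg_stream[start:start + WINDOW]
               for start in range(0, len(emg_stream) - WINDOW + 1, STRIDE)]
    return np.stack(samples)  # shape: (n_samples, 30, 8)
```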
Once I had the data, knowing the extent (60 seconds) of a session, the target gesture performed in every session, the span of every gesture (5 seconds), and the frequency of the EMG sensor (about 50 Hz), I could annotate every sample with the proper gesture label. This approach is theoretically optimal, but in reality it can't be. The EMG sensor does not always run at exactly 50 Hz, and a candidate is a human being rather than a perfect machine, so sometimes a gesture lasts more than 5 seconds, sometimes less. So you are going to accumulate errors while annotating samples this way within a recording session. It turned out that a better solution was to annotate the data using a clustering algorithm; in particular, I chose the popular K-means.
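A sketch of such a clustering-based annotation, assuming K-means is run over the flattened samples with one cluster per gesture class (neutral, fist, flexion, extension):

```python
from sklearn.cluster import KMeans

def cluster_labels(X, n_gestures=4, seed=0):
    """Assign a cluster label to each (30, 8) sample in X (shape: n_samples, 30, 8)."""
    kmeans = KMeans(n_clusters=n_gestures, n_init=10, random_state=seed)
    return kmeans.fit_predict(X.reshape(len(X), -1))
```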
The dataset has been uploaded and is free to access on Kaggle. There, you will find a quick notebook with which you can test the performance of an SVM on the data annotated with both the "by hand" labels and the labels provided by K-means. The test runs a 5-fold cross-validation.
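Such a comparison could look roughly like this, where X_flat holds the flattened EMG windows and y_hand and y_kmeans are the two label arrays (names and hyperparameters are assumptions, the Kaggle notebook may differ in the details):

```python
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def compare_labelings(X_flat, y_hand, y_kmeans):
    """Mean 5-fold cross-validation accuracy with the two alternative labelings."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    return (cross_val_score(clf, X_flat, y_hand, cv=5).mean(),
            cross_val_score(clf, X_flat, y_kmeans, cv=5).mean())
```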
Now I have to stress one thing: what I've done here, that is, using a clustering algorithm to annotate data for supervised learning, cannot be done most of the time. Machine learning would be a lot easier otherwise. The point is that the cluster labels provided by K-means have no semantic meaning at all! For instance, you know that sample x belongs to cluster "3", but you don't actually know which class "3" is. Fortunately, in this project, given that we have only 4 possible labels and the classifier works pretty well, it was straightforward to experimentally find the associations between the cluster labels and the related gestures. But in general, this cannot be done. Imagine if you had 1000 different clusters to match with 1000 different class labels…
The following is an attempt to visualize the entire dataset. The 2 reddish circles represent class labels, and you may see that there are indeed differences between those provided by me (inner) and those given by K-means (outer). The bluish circle is the EMG readings: the closer it gets to yellow, the stronger the recorded muscular activity.
Here we see only a subset of the dataset. This time you may better recognize that the bluish circle is chunked into many circular sectors, each one being a single sample. Notice also the orthogonal subdivision into 8 different parts: that's because the Myo EMG has 8 different channels.
As we previously said, a single sample comprises 30 consecutive EMG readings. If you are familiar with
There's one last point to cover in our project. Recall that the Franka is a redundant manipulator. This means that it can reach a desired pose of the end-effector in potentially more than one configuration. That's a desirable property in general, but it doesn't come for free. There are indeed some joint configurations that make further motion very difficult for our manipulator. Those are called kinematic singularities.
The resolved-velocity control implemented in this project comprises two different operating modes. The first one is the simplest and basically just ignores singularities. With this setup, the Franka will probably soon get stuck in one of these bad configurations, and we will need a little patience to get it out.
The second mode implements the revised resolved-velocity control loop presented in A purely-reactive manipulability-maximising motion controller². In short, it aims to maximize the measure of manipulability (MoM). You can think of it as a score of how easily the manipulator can move away from its current joint configuration. Thus, if we maximize this quantity during our control loop (see the sketch below), we keep the Franka far from its singularities. The images further below may help you visualize what a kinematic singularity is.
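The QP formulation of the paper is beyond the scope of this post, but the quantity being maximized, Yoshikawa's manipulability index, is essentially one line of numpy:

```python
import numpy as np

def manipulability(J):
    """Yoshikawa's measure of manipulability for a 6xN geometric Jacobian.

    The closer this value gets to zero, the closer the arm is to a singularity;
    the advanced controller tries to keep it as large as possible.
    """
    return np.sqrt(np.linalg.det(J @ J.T))
```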
To drive the robot into those poses, I made the very same motion with my arm, but it resulted in two different behaviors. On the left, we find the Franka in one of its singular configurations, with the 6th joint vertically aligned with the base link. On the right, the singularity has been avoided thanks to the advanced control loop, and the robot is ready to move to any other configuration.
Of course, this project is far from perfect and by no means ready to be used in any real-world environment. It is supposed to be a proof of concept for further developments, although I confess that even now it's quite fun to use the Myo to control the Franka :) One of the most critical points is the difficulty of executing, in real time, all the computations required to keep the code up and running.
To bring in some context, the nominal control frequency of the Franka manipulator is 1 kHz. By downgrading and disabling some features of the Gazebo physics engine, I'm able to run the program at around 100–200 Hz. That's while using the simple control loop, without singularity avoidance. Introducing the advanced resolved-velocity control, the frequency drops to around 5 Hz. That's 200 times less than the frequency the robot is supposed to work at.
The algorithm involves solving a quadratic programming problem at every iteration, and my machine is not able to handle it in real time. I made some attempts to improve this, but with no significant gain. One idea could be to employ the simple controller by default, resorting to the advanced one only as we get close to a singularity, as sketched below. More broadly, I think switching from Python to C++ could make a huge difference. Naturally, if you have a powerful machine you may already get an enjoyable experience without even noticing these problems.
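Such a fallback has not been implemented; a hypothetical version, reusing the manipulability function sketched earlier and a made-up threshold, could look like:

```python
# Hypothetical controller switching (not implemented in the project).
MANIP_THRESHOLD = 0.05  # assumed value, to be tuned for the Franka

def choose_controller(J):
    """Pick the cheap controller unless the arm is getting close to a singularity."""
    if manipulability(J) < MANIP_THRESHOLD:  # manipulability() from the sketch above
        return "advanced"  # QP-based, manipulability-maximising controller
    return "simple"        # plain resolved-velocity controller
```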
Anyway, if you've come this far, let me thank you. I'm aware that I've only reported a brief description of the project, without discussing in detail either the code or the theory behind it. The purpose was just to pique your curiosity and give you an idea of what can be achieved with some robotics and data science background. If you have the time, the will, and a Myo armband at hand, feel free to extend this project!
[1] I. Skog, P. Händel, J.-O. Nilsson and J. Rantakokko, "
[2] Haviland, J. and Corke, P., 2020. A purely-reactive manipulability-maximising motion controller.
This post was originally published here.