
The Biases in Fitness Trackers

by The Markup, July 22nd, 2021

Too Long; Didn't Read

If you have darker skin, shuffle when you walk, or regularly push a stroller, it can throw off the data. Commercial fitness trackers are being used for all kinds of things other than tracking steps. In the lab, when subjects jogged or walked normally, the devices undercounted steps 50 percent of the time, compared with observed steps. If you’re an employer wanting to promote step-counting as fitness, know that not all people’s steps are counted equally.


If you have darker skin, shuffle when you walk, or regularly push a stroller, it can throw off the data

In a 2019 study, 18 senior citizens took a stroll on some treadmills while armed to the hilt with fitness trackers. They had devices strapped to their wrists and ankles, fastened to their belts, and wrapped around their chests. But even with all these trackers, the seniors couldn’t get an accurate step count because their movements were too slow to trigger the sensors in the devices. 

Commercial fitness trackers are being used for all kinds of things other than tracking steps. They measure heart rate, track sleep patterns, and calculate basal metabolic rate and calories burned. They’re used in clinical trials, research labs, and by insurance companies and corporate wellness programs. 

But are they really reliable enough?

There are various ways your fitness tracker could go wrong, especially if you don’t fit into a fairly narrow demographic: light skin tone, in your 20s or 30s, with an average fitness level and “purposeful” gait. (We’ll get to that in a minute.)

Step Counters: It’s Not What They Know, It’s What They Assume

Devices don’t specifically count each step; they approximate using what’s called an “accelerometer.” Accelerometers use electromechanical sensors to pick up on motion, and the fitness trackers interpret that information using an algorithm that trains the devices to recognize what counts as a step. Users can further personalize those calculations by programming the device with their height, weight, and age.

However, the algorithms used are usually based on data from studies that enroll college-age men, said Lynne Feehan, an associate professor in the Department of Physical Therapy at the University of British Columbia who is a co-author of a study on Fitbit accuracy. “They do detect steps well if it’s normal paced steps, normal cadence,” Feehan said. “They were designed to measure purposeful walking.” But, she said, if you’re shuffling around in the kitchen taking small steps, or if you’re pushing a stroller or a walker, the accelerometer isn’t as reliable.

“There’s a definite bias in there,” she said. “How a child moves and how someone who’s 90 moves is very different.” 

In an emailed statement, Shelten Yuen, vice president of research at Fitbit, said the company works continuously to improve its algorithms. “Fitbit uses AI and machine learning, coupled with insights from its large database of biometric information to develop and continually improve its offerings,” he wrote.

If you take a step, for instance, and thrust your arm forward, the accelerometer will sense the force of that movement and record a step. But if you shuffle, limp, don’t swing your arms while walking, or just plain move slowly, the step may not register on your fitness tracker.
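To see why slow or shuffling steps can go missing, here is a minimal sketch of a threshold-based step counter. It is an illustration only, not any manufacturer's algorithm, which is proprietary. The function names, the 1.2 g threshold, and the synthetic signal shapes are all assumptions made for this example.

```python
import math

def count_steps(magnitudes, threshold=1.2, min_gap=3):
    """Count upward crossings of `threshold` in acceleration magnitudes (in g).

    `threshold` and `min_gap` are illustrative values, not taken from any device.
    """
    steps = 0
    last_step = -min_gap
    for i in range(1, len(magnitudes)):
        crossed = magnitudes[i - 1] < threshold <= magnitudes[i]
        # Ignore crossings that follow the previous step too closely.
        if crossed and i - last_step >= min_gap:
            steps += 1
            last_step = i
    return steps

def simulate_walk(n_steps, peak_g, samples_per_step=10):
    """Synthetic signal: a 1 g baseline plus one half-sine bump per step."""
    signal = []
    for _ in range(n_steps):
        for k in range(samples_per_step):
            phase = 2 * math.pi * k / samples_per_step
            bump = (peak_g - 1.0) * max(0.0, math.sin(phase))
            signal.append(1.0 + bump)
    return signal

# Twenty brisk steps with a strong arm swing versus twenty gentle shuffles.
brisk_steps = count_steps(simulate_walk(20, peak_g=1.5))
shuffle_steps = count_steps(simulate_walk(20, peak_g=1.1))
print(brisk_steps, shuffle_steps)
```

In this toy model the brisk walker gets credit for every step, while the shuffler, whose movements never reach the threshold, gets credit for none, even though both took the same number of steps.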

50: Percent of the time that fitness trackers undercounted steps in one laboratory study.

The same is true for older adults who move more slowly, those who walk with a limp, or those who have Parkinson’s disease, a symptom of which is reduced arm motion while walking. Feehan and her co-authors found that Fitbits underestimated steps in older adults by 25 percent compared to other, research-grade devices. 

Feehan’s study also found that the devices counted steps with an acceptable accuracy overall only about a third to one half the time. In the lab, when subjects jogged or walked normally, the devices undercounted steps 50 percent of the time, compared with observed steps. Outside of lab conditions, however, the Fitbits would overestimate steps by as much as 35 percent compared to research-grade pedometers and accelerometers.
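The percentages in these findings are signed errors relative to a reference count. As a rough illustration, the sketch below computes that error for hypothetical counts chosen to mirror a 25 percent undercount and a 35 percent overcount; the function name and the step totals are made up for this example, not data from the study.

```python
def percent_error(counted, reference):
    """Signed error of a device count relative to a reference count, in percent."""
    return 100 * (counted - reference) / reference

# Hypothetical totals for a 1,000-step reference walk.
undercount = percent_error(750, 1000)   # device missed a quarter of the steps
overcount = percent_error(1350, 1000)   # device added steps that never happened
print(undercount, overcount)
```

A negative result means the device undercounted; a positive result means it overcounted.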

Fitbit declined to comment on individual studies.

The devices are also fairly easy to fool, should you want to, say, beef up your step count for workplace-sponsored fitness campaigns. Before joining The Markup, investigative reporter Surya Mattu collaborated with engineer and artist Tega Brain to devise a series of strategies for tricking these trackers, from attaching them to metronomes to spinning them around on drills or bicycle wheels.

So, if you want to compare your activity levels day by day, a step counter can be a good tool. If you’re aiming for those mythical 10,000 steps a day, your count might be off. If you’re an employer wanting to promote step-counting as fitness, know that comparisons among individuals may not be fair, as not all people’s steps are counted equally.

Heart Rate: Skin Tone and Exercise Intensity May Matter

To measure heart rate, most trackers use a technique called photoplethysmography, which measures blood volume by shining a beam of green LED light into the wrist. When the heart beats, more blood flows into the blood vessels and more of the green light is absorbed by that blood. Between beats, the blood ebbs away, absorbing less light. From these measurements, the device then calculates heart rate. 
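The last step, turning a light-absorption signal into a heart rate, can be sketched as simple peak counting. This is a simplified illustration under idealized assumptions: the signal is a clean synthetic sine wave, and the function names and sampling rate are invented for the example. Real devices have to cope with motion, ambient light, and skin differences, and their processing is not public.

```python
import math

def heart_rate_bpm(absorption, sample_rate_hz):
    """Estimate beats per minute by counting local peaks above the mean."""
    mean = sum(absorption) / len(absorption)
    peaks = 0
    for i in range(1, len(absorption) - 1):
        is_peak = absorption[i - 1] < absorption[i] >= absorption[i + 1]
        if is_peak and absorption[i] > mean:
            peaks += 1
    # peaks per sample -> peaks per second -> peaks per minute
    return peaks * 60 * sample_rate_hz / len(absorption)

def synthetic_absorption(bpm, seconds, sample_rate_hz):
    """Idealized signal: absorption rises and falls once per heartbeat."""
    beats_per_second = bpm / 60
    n = int(seconds * sample_rate_hz)
    return [math.sin(2 * math.pi * beats_per_second * t / sample_rate_hz)
            for t in range(n)]

signal = synthetic_absorption(bpm=72, seconds=10, sample_rate_hz=50)
estimate = heart_rate_bpm(signal, sample_rate_hz=50)
print(estimate)
```

On a clean signal like this one the estimate matches the simulated pulse. Anything that weakens or distorts the signal, such as more light being absorbed before it reaches the blood vessels or the wrist moving, makes the peaks harder to pick out.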

But the green LED sensors that track heart rate can be unreliable. Green light has to penetrate the skin in order to measure blood volume, but several studies suggest that green light is more likely to be absorbed by more melanated skin. 

The science on this is still being debated. One study didn’t find a correlation between skin tone and accuracy, though it did report that the error rate during activity was 30 percent higher than at rest. Another study found that heart rate monitors on wrist-worn fitness trackers, especially the Apple Watch, perform pretty well in controlled environments.

Fitbit says the company has worked hard to calibrate the sensors in its devices to work for everyone. “To achieve the optimum, most consistent performance for users of all skin tones, we designed our optical system to emit green light at sufficient strength to sense through darker skin and our detector to be sensitive enough to accurately detect the heart rate signal,” Yuen said in an emailed statement.

But users with dark skin who have tried Fitbits and other trackers have complained that the devices give strange readings or don’t work at all. Smartwatch maker Polar even lists dark skin and tattoos as factors that can limit the accuracy of wrist-worn monitors.

Mikael Mattsson, a senior researcher at Karolinska Institutet, a medical university in Sweden, said that in research settings, scientists usually calibrate seven different wavelengths of light to yield the most accurate readings, “but you can’t fit everything in a small watch,” he said.

The Apple Watch uses infrared sensors in addition to green LEDs to measure resting heart rate. Red light is more accurate and reliable than green, said Mattsson, but it’s also easily thrown off by movements, so many devices still rely on green light for heart rate measurements during exercise. The Apple Watch Series 4 also features electrodes that measure the electrical current from the heart directly, instead of relying on proxy measures like blood volume.

Even so, the accuracy of wrist-worn sensors varies a lot depending on the type of activity. While the devices are pretty good during repetitive, stable, moderately intense activities like riding a stationary bike, studies show they can get heart rate wrong even during other relatively controlled activities, like using an elliptical machine with arm levers, and that no wrist-worn sensors are as accurate as chest-strap monitors.

“The more variation within our exercise, the bigger the difference,” said Mattsson. He hasn’t even attempted to test the devices outdoors yet because, in his view, “If they’re not good enough indoors, they won’t be good enough outdoors.”

All in all, some heart rate monitors can work pretty well under specific conditions and are fine if you’re tracking heart rate for fun or to casually compare the intensity of your workouts—just know that they’re not perfect. And if you’re a competitive athlete or need to monitor your heart rate for health reasons, you may be better off with a more sophisticated device like a chest strap. 

Calories: A Most Elusive Metric

Here’s where things get really murky. From those potentially inaccurate measurements about movement and heart rate, most devices then use proprietary algorithms to calculate energy expenditure. Some trackers allow you to add information about your height, weight, age, and sex, which the device uses to calculate basal metabolic rate—basically how many calories your body burns each day in its normal functions.  
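Tracker makers do not publish how they calculate basal metabolic rate, so the sketch below uses the Mifflin-St Jeor equation, a widely published estimate, purely as a stand-in. Nothing here implies that any device uses this formula; the function name and the example person are assumptions for illustration.

```python
def bmr_mifflin_st_jeor(weight_kg, height_cm, age_years, sex):
    """Estimate basal metabolic rate in kcal/day with the Mifflin-St Jeor equation.

    Used here only as an example of a population-level formula; it is not
    taken from any fitness tracker.
    """
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    if sex == "male":
        return base + 5
    if sex == "female":
        return base - 161
    raise ValueError("sex must be 'male' or 'female' for this equation")

# Hypothetical 30-year-old, 70 kg, 175 cm.
bmr_male = bmr_mifflin_st_jeor(70, 175, 30, "male")
bmr_female = bmr_mifflin_st_jeor(70, 175, 30, "female")
print(bmr_male, bmr_female)
```

A formula like this is fitted to averages across a study population, so two people with the same four inputs get the same answer no matter how their bodies actually differ.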

But how many calories you burn during an activity is the most unreliable metric that fitness trackers calculate. 

Mattsson is a co-author of one of the few studies that look at how well fitness trackers work on a diverse group of people of different ages, weights, heights, skin tones, and fitness levels. His work found that, across the board, no devices had an error rate less than 20 percent for calculating calories burned. “The engineers create algorithms for machines. But humans are not machines,” he said.

Another recent study tested four trackers—the Apple Watch Series 4, Polar Vantage V, Garmin Fenix 5, and Fitbit Versa—and concluded that while the Apple Watch and the Polar Vantage V did pretty well at measuring heart rate, none of the four should be used to monitor energy expenditure at the levels tested, which ranged from sitting to sprinting.

As with step counting, Mattsson said the problem lies in the algorithm. “Since they are using an algorithm and proxy measures, it’s never going to be perfect,” he said. “The biggest problem is that they’ve done the algorithms for a subset of people. In most studies you talk about white males in their 30s at average fitness level. The farther away you get, the bigger the risk of a problem.”

Mattsson said the situation lands fitness trackers in an awkward Catch-22. To improve, companies need a more diverse group of people to buy and use their product. That way, companies would have a more diverse set of data to train the algorithms on and get algorithms that take into account the full diversity of bodies, fitness levels, and ages. “But then you need a lot of people to use them even if they’re not perfect,” he said.

Bottom line? If you’re looking to calorie count, don’t rely on your fitness tracker. Overall, trackers can be fun and useful, but the data they provide has to be taken in context. “It’s more important for everybody to recognize that there’s fallibility to every technology and take that into account when you use the technology,” said Helena Mentis, director of the Bodies in Motion Lab at the University of Maryland, Baltimore County. 

Fitness trackers can help users learn about themselves, recognize patterns, and reflect on their behavior, but the data these devices provide shouldn’t be taken as gospel truth. And considering the devices’ uncertain accuracy across demographics, anyone considering using commercial fitness trackers to monitor health or compare individuals’ performances should think twice.

Originally published as "How Accurate Is Your Commercial Fitness Tracker?" under the Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license.