Why Are Vision Transformers Focusing on Boring Backgrounds?

by Mike Young, October 2nd, 2023 · 5 min read

Too Long; Didn't Read

Vision Transformers (ViTs) have become the go-to architecture for many image tasks, yet they exhibit a strange behavior: their attention maps spike on uninformative background patches rather than on the main subjects of an image. Researchers traced these spikes to a small fraction of patch tokens with abnormally high L2 norms. Their hypothesis is that ViTs recycle these low-information patches as scratch space for storing global image information. To fix this, they propose appending dedicated "register" tokens that provide that storage explicitly, which yields smoother attention maps, better downstream performance, and improved object-discovery abilities. The study highlights the need for ongoing research into model artifacts as transformer capabilities advance.
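The fix is conceptually simple: alongside the usual [CLS] and patch tokens, the model is given a few extra learnable tokens to use as scratch space, and those tokens are discarded at the output. Below is a minimal PyTorch-style sketch of the idea; the class and parameter names (ViTWithRegisters, num_registers, and so on) are my own illustration and not the authors' reference code.

```python
import torch
import torch.nn as nn

class ViTWithRegisters(nn.Module):
    """Minimal sketch of a ViT encoder with learnable 'register' tokens.

    Illustrative assumption only; names and defaults are not from the
    paper's reference implementation.
    """

    def __init__(self, embed_dim=768, depth=12, num_heads=12,
                 num_patches=196, num_registers=4):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embed_dim))
        # Extra learnable tokens that act as dedicated scratch space,
        # so the model stops hijacking low-information patch tokens.
        self.registers = nn.Parameter(torch.zeros(1, num_registers, embed_dim))
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches + 1, embed_dim))
        layer = nn.TransformerEncoderLayer(embed_dim, num_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, depth)
        self.num_registers = num_registers

    def forward(self, patch_tokens):  # patch_tokens: (B, num_patches, embed_dim)
        B = patch_tokens.size(0)
        cls = self.cls_token.expand(B, -1, -1)
        x = torch.cat([cls, patch_tokens], dim=1) + self.pos_embed
        # Registers carry no positional embedding; they are pure storage.
        regs = self.registers.expand(B, -1, -1)
        x = self.encoder(torch.cat([x, regs], dim=1))
        # Registers are dropped at the output; only [CLS] + patches are used.
        return x[:, 0], x[:, 1:-self.num_registers]

model = ViTWithRegisters()
cls_out, patches = model(torch.randn(2, 196, 768))
```

Spotting the artifact itself is just as simple: compute per-token L2 norms with patches.norm(dim=-1) and look for the small fraction of tokens whose norms sit far above the rest.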

About Author

Mike Young (@mikeyoung44)
Among other things, launching AIModels.fyi. Find the right AI model for your project: https://aimodels.fyi
