Deep Learning for Modeling Audio-Visual Correspondences


by Rishab Sharma · September 16th, 2020
Too Long; Didn't Read

Humans can ‘hear faces’ and ‘see voices’ by forming a mental picture or an acoustic memory of a person. The natural synchronization between sound and vision provides a rich self-supervisory signal for grounding auditory signals in visual ones. Inspired by our ability to infer sound sources from how objects move visually, we can build learning models that discover these correspondences on their own. We will use a simple architecture that relies on static visual information to learn the cross-modal context, though motion signals are also of crucial importance for learning audio-visual correspondences.
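The two-stream idea above can be sketched in a few lines. This is a minimal illustration, not the article's actual model: the hypothetical `W_vision` and `W_audio` random projections stand in for the vision and audio subnetworks, and cosine similarity between the two embeddings serves as the correspondence score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the vision and audio subnetworks: in a real model these
# would be CNNs mapping a video frame / audio spectrogram to an embedding.
W_vision = rng.standard_normal((128, 2048))  # 2048-dim frame feature -> 128-dim embedding
W_audio = rng.standard_normal((128, 512))    # 512-dim audio feature  -> 128-dim embedding

def embed(x, W):
    """Project a feature vector and L2-normalize the result."""
    z = W @ x
    return z / np.linalg.norm(z)

def correspondence_score(frame_feat, audio_feat):
    """Cosine similarity between the two embeddings; a higher score means
    the sound is more likely to belong to the frame."""
    return float(embed(frame_feat, W_vision) @ embed(audio_feat, W_audio))

# A frame paired with an audio clip from the same video, and a mismatched clip.
frame = rng.standard_normal(2048)
audio_pos = rng.standard_normal(512)
audio_neg = rng.standard_normal(512)

print(correspondence_score(frame, audio_pos))
print(correspondence_score(frame, audio_neg))
```

During self-supervised training, matching (frame, audio) pairs drawn from the same video would be pushed toward high scores and mismatched pairs toward low scores, which is what lets the model learn correspondences without labels.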

Rishab Sharma (@kraken), Data Scientist and Visual Computing Researcher
