
Towards an ImageNet Moment for Speech-to-Text: A Deep Dive

by Alexander Veysov | 9m | August 23rd, 2020

Too Long; Didn't Read

The ImageNet moment in a given ML sub-field arrives when the architectures and model building blocks required to solve 95% of standard “useful” tasks are widely available as standard, tested open-source framework modules; the models are available with pre-trained weights; and the compute required to train models for everyday tasks is minimal (e.g. 1–10 GPU days in STT) compared to the compute requirements previously reported in papers. We have chosen the following stack of technologies: feed-forward neural networks for acoustic modelling (mostly grouped 1D convolutions with squeeze and excitation blocks).
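To make the building blocks named above a little more concrete, here is a minimal sketch of a grouped 1D convolution followed by a squeeze and excitation step. It is written in plain Python rather than a deep learning framework, and it is not the author's implementation: the function names (`grouped_conv1d`, `squeeze_excite`, `conv_param_count`), the layer sizes and the random weights are all assumptions chosen for illustration.

```python
import math
import random


def grouped_conv1d(x, weight, groups):
    """Grouped 1D convolution (cross-correlation, stride 1, zero 'same' padding).

    x:      input as [C_in][T]
    weight: kernels as [C_out][C_in // groups][K]
    Each output channel only sees the input channels of its own group.
    """
    c_in, t = len(x), len(x[0])
    c_out, k = len(weight), len(weight[0][0])
    in_per_group, out_per_group = c_in // groups, c_out // groups
    pad = k // 2
    out = [[0.0] * t for _ in range(c_out)]
    for o in range(c_out):
        g = o // out_per_group  # group this output channel belongs to
        for i in range(in_per_group):
            channel = x[g * in_per_group + i]
            kernel = weight[o][i]
            for pos in range(t):
                for j in range(k):
                    src = pos + j - pad
                    if 0 <= src < t:  # positions outside the signal count as zeros
                        out[o][pos] += kernel[j] * channel[src]
    return out


def squeeze_excite(x, w_reduce, w_expand):
    """Squeeze and excitation over channels of x ([C][T]).

    Squeeze: average each channel over time.
    Excite:  dense layer + ReLU, dense layer + sigmoid, giving one gate per channel.
    Returns the rescaled feature map and the gates.
    """
    squeezed = [sum(ch) / len(ch) for ch in x]
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w_reduce]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w_expand]
    scaled = [[g * v for v in ch] for g, ch in zip(gates, x)]
    return scaled, gates


def conv_param_count(c_in, c_out, kernel, groups):
    """Number of convolution weights (bias excluded)."""
    return c_out * (c_in // groups) * kernel


# Illustrative sizes only (hypothetical, not taken from the article).
random.seed(0)
C, T, K, G, R = 8, 16, 3, 4, 2  # channels, time steps, kernel, groups, SE reduction
x = [[random.uniform(-1, 1) for _ in range(T)] for _ in range(C)]
weight = [[[random.uniform(-0.5, 0.5) for _ in range(K)]
           for _ in range(C // G)] for _ in range(C)]
w_reduce = [[random.uniform(-0.5, 0.5) for _ in range(C)] for _ in range(C // R)]
w_expand = [[random.uniform(-0.5, 0.5) for _ in range(C // R)] for _ in range(C)]

features, gates = squeeze_excite(grouped_conv1d(x, weight, G), w_reduce, w_expand)
print(len(features), len(features[0]))  # channels and time steps are preserved
```

With these sizes, `conv_param_count(8, 8, 3, 1)` gives 192 weights for a regular convolution and `conv_param_count(8, 8, 3, 4)` gives 48 for the grouped one, which illustrates why grouping is a common way to cut model size and compute. In practice a framework layer would replace the nested loops.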

Alexander Veysov


@snakers41

Data Scientist
