Towards an ImageNet Moment for Speech-to-Text: A Deep Dive

Written by snakers41 | Published 2020/08/23

TLDR: The ImageNet moment in a given ML sub-field arrives when the architectures and model building blocks required to solve 95% of standard “useful” tasks are widely available as standard, tested open-source framework modules; when the models are available with pre-trained weights; and when the compute required to train models for everyday tasks is minimal (e.g. 1–10 GPU days in STT) compared to the compute requirements previously reported in papers. We have chosen the following technology stack. Acoustic modelling: feed-forward neural networks (mostly grouped 1D convolutions with squeeze and excitation blocks).
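The phrase "grouped 1D convolutions with squeeze and excitation blocks" packs two ideas together. The sketch below shows them in plain Python with no deep learning framework. A grouped convolution splits the channels into independent groups, and each output channel only sees the input channels of its own group. A squeeze and excitation block averages each channel over time, passes that vector through a small bottleneck (ReLU, then sigmoid), and rescales every channel by the resulting gate. It assumes stride 1, zero "same" padding, odd kernel sizes and no biases. The function names and weight layouts are hypothetical and chosen for illustration; this is not the actual model code behind the article.

```python
import math
from typing import List

Signal = List[List[float]]  # indexed as [channel][time]


def grouped_conv1d(x: Signal, weight: List[List[List[float]]], groups: int) -> Signal:
    """Grouped 1D cross-correlation, stride 1, zero 'same' padding (odd k only).

    x:      [c_in][t]
    weight: [c_out][c_in // groups][k]
    """
    c_in, t = len(x), len(x[0])
    c_out, k = len(weight), len(weight[0][0])
    assert c_in % groups == 0 and c_out % groups == 0 and k % 2 == 1
    in_per_group, out_per_group = c_in // groups, c_out // groups
    pad = k // 2
    out = []
    for o in range(c_out):
        g = o // out_per_group  # group this output channel belongs to
        row = []
        for i in range(t):
            acc = 0.0
            # Only the input channels of group g contribute to output channel o.
            for ci in range(in_per_group):
                channel = x[g * in_per_group + ci]
                for j in range(k):
                    pos = i + j - pad
                    if 0 <= pos < t:  # positions outside the signal are zero padding
                        acc += weight[o][ci][j] * channel[pos]
            row.append(acc)
        out.append(row)
    return out


def squeeze_excite(x: Signal, w1: List[List[float]], w2: List[List[float]]) -> Signal:
    """Squeeze and excitation: rescale each channel by a gate in (0, 1).

    w1: [c // r][c] (bottleneck), w2: [c][c // r] (expansion); biases omitted.
    """
    squeezed = [sum(ch) / len(ch) for ch in x]  # global average pool over time
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]  # ReLU
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]  # sigmoid
    return [[g * v for v in ch] for g, ch in zip(gates, x)]


# Usage: 2 channels, 4 time steps; groups=2 means each channel is filtered on its own.
x = [[1.0, 2.0, 3.0, 4.0], [0.5, 0.5, 0.5, 0.5]]
w = [[[0.0, 1.0, 0.0]], [[1.0, 1.0, 1.0]]]  # identity kernel, then a box filter
y = grouped_conv1d(x, w, groups=2)  # [[1.0, 2.0, 3.0, 4.0], [1.0, 1.5, 1.5, 1.0]]
z = squeeze_excite(y, w1=[[1.0, 0.0]], w2=[[10.0], [-10.0]])  # keeps channel 0, mutes channel 1
```

Setting `groups` equal to the number of channels gives a depthwise convolution, which is where most of the parameter and compute savings of grouped convolutions come from; `groups=1` recovers an ordinary convolution.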


Written by snakers41 | Data Scientist
Published by HackerNoon on 2020/08/23