Towards an ImageNet Moment for Speech-to-Text: A Deep Dive

TLDR

The ImageNet moment in a given ML sub-field arrives when the architectures and model building blocks required to solve 95% of standard “useful” tasks are widely available as standard and tested open-source framework modules are available. The models are available with pre-trained weights; the compute required to train models for everyday tasks is minimal (e.g. 1–10 GPU days in STT) compared to the compute requirements previously reported in papers. We have chosen the following stack of technologies: acoustic modellingFeed neural networks for acoustic modelling (mostly grouped 1D convolutions with squeeze and excitation blocks)via the TL;DR App

no story

Written by snakers41 | Data Scientist

Published by HackerNoon on 2020/08/23