How to Use an ASR System for Accurate Transcription in Your Digital Product

Written by zilunpeng | Published 2021/04/07
Tech Story Tags: speech-to-text-recognition | speech-recognition | machine-learning | artificial-intelligence | python | pytorch | speech-recognition-in-python | hackernoon-top-story

TLDR: Facebook’s wav2vec 2.0 lets you pre-train a speech recognition model on audio alone, with no corresponding transcriptions, and then fine-tune it with only a small transcribed dataset. The LibriSpeech dataset is the most commonly used audio processing dataset in speech research. In this blog, we share how we worked with wav2vec 2.0 and got great results. We show the transcription for one audio sample from the dev-clean dataset; in this example, the ASR system inserted an “a”, recognized “John” as “Jones”, and deleted the word “are” relative to the ground truth.
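As a quick illustration of the workflow summarized above, here is a minimal sketch of transcribing a single LibriSpeech dev-clean sample with a pre-trained wav2vec 2.0 model. It uses torchaudio's built-in WAV2VEC2_ASR_BASE_960H pipeline rather than the article's own setup, and the file path is hypothetical.

```python
# Minimal sketch (assumptions: torchaudio >= 0.10 with the built-in
# WAV2VEC2_ASR_BASE_960H pipeline; the audio file path below is hypothetical).
import torch
import torchaudio

bundle = torchaudio.pipelines.WAV2VEC2_ASR_BASE_960H  # wav2vec 2.0 fine-tuned on LibriSpeech
model = bundle.get_model().eval()

# Load one dev-clean utterance and resample to the model's expected rate (16 kHz).
waveform, sample_rate = torchaudio.load("dev-clean/84/121123/84-121123-0001.flac")
if sample_rate != bundle.sample_rate:
    waveform = torchaudio.functional.resample(waveform, sample_rate, bundle.sample_rate)

with torch.inference_mode():
    emissions, _ = model(waveform)  # frame-level scores over the character vocabulary

# Greedy CTC decoding: pick the best label per frame, collapse repeats, drop blanks.
labels = bundle.get_labels()          # index 0 is the CTC blank "-", "|" marks word boundaries
indices = torch.argmax(emissions[0], dim=-1)
indices = torch.unique_consecutive(indices)
transcript = "".join(labels[i] for i in indices if labels[i] != "-").replace("|", " ")
print(transcript)
```

Comparing the printed transcript against the reference text word by word is what surfaces the insertion, substitution, and deletion errors mentioned in the TLDR.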

