AccentFold: Enhancing Accent Recognition - Related Work

by Phonology Technology

August 28th, 2024

Too Long; Didn't Read

Using existing state-of-the-art pre-trained models to probe for linguistic information, and using that information to improve model performance, has recently gained interest in the community. We take a different approach and use the accent embeddings extracted from a pre-trained model to decide what subset of data to use to build an ASR model. We do this at a much larger scale of 41 African English accents.

STORY’S CREDIBILITY

Academic Research Paper

Part of HackerNoon's growing list of open-source research papers, promoting free access to academic material.

Authors:

(1) Abraham Owodunni, Intron Health, Masakhane (equal contribution);

(2) Aditya Yadavalli, Karya, Masakhane (equal contribution);

(3) Chris Emezue, Mila Quebec AI Institute, Lanfrica, Masakhane (equal contribution);

(4) Tobi Olatunji, Intron Health, Masakhane (equal contribution);

(5) Clinton Mbataku, AI Saturdays Lagos.

Abstract and 1 Introduction

2 Related Work

3 AccentFold

4 What information does AccentFold capture?

5 Empirical study of AccentFold

6 Conclusion, Limitations, and References

Using existing state-of-the-art pre-trained models to probe for linguistic information, and using that information to improve model performance, has recently gained interest in the community. Prasad and Jyothi (2020) apply various probing techniques to the DeepSpeech 2 model (Amodei et al., 2015) and find that the first few layers encode most of the accent-related information. Bartelds and Wieling (2022) quantify language variation in Dutch using a combination of XLSR-53 (Conneau et al., 2020) embeddings and Dynamic Time Warping (Sakoe and Chiba, 1978). They show that this leads to a Dutch dialect identification system that, with just six seconds of speech, outperforms a system that depends on phonetic transcriptions. This shows that pre-trained models such as the one proposed by Conneau et al. (2020) do capture rich linguistic information in their representations. Jain et al. (2018) and Li et al. (2021a) extract accent embeddings learnt by a separate network and feed those embeddings to the ASR model along with other features. They show that this leads to a superior accented ASR model.

Our work is most closely related to Kothawade et al. (2023), where the authors explore various statistical methods, such as Submodular Mutual Information, in combination with hand-crafted features to select a subset of data that improves accented ASR. Our work differs from previous work in two important ways: (1) we use the accent embeddings extracted from a pre-trained model to decide what subset of data to use to build an ASR model that performs best on a target accent in a cost-effective manner; and (2) we do this at a much larger scale of 41 African English accents. The previous highest was 21 English accents, by Li et al. (2021a).
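
The idea of choosing training data from accent embeddings can be illustrated with a minimal sketch. This is not the authors' implementation: the use of one embedding vector per accent, the cosine distance, the function names `cosine_distance` and `nearest_accents`, and the toy data are all assumptions made for illustration. Given a target accent, the sketch ranks the remaining accents by distance to it and keeps the closest ones as candidate sources of training data.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def nearest_accents(target, accent_embeddings, k):
    """Rank every accent except `target` by distance to it; keep the k closest."""
    target_vec = accent_embeddings[target]
    distances = [
        (cosine_distance(target_vec, vec), name)
        for name, vec in accent_embeddings.items()
        if name != target
    ]
    distances.sort()  # smallest distance first
    return [name for _, name in distances[:k]]


# Made-up 3-dimensional embeddings (assumption); real accent embeddings
# would be extracted from a pre-trained speech model.
toy_embeddings = {
    "accent_a": [1.0, 0.0, 0.0],
    "accent_b": [0.9, 0.1, 0.0],
    "accent_c": [0.0, 1.0, 0.0],
    "accent_d": [0.7, 0.7, 0.0],
}

selected = nearest_accents("accent_a", toy_embeddings, k=2)
print(selected)  # ['accent_b', 'accent_d']
```

In this toy setup, speech from `accent_b` and `accent_d` would be preferred when building a model for `accent_a`, while `accent_c`, whose embedding is orthogonal to the target's, would be left out.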


This paper is available on arXiv under the CC BY-SA 4.0 DEED license.

