Hyperparameters and Baseline Experiments in Dialog Systems

by The FeedbackLoop: #1 in PM Education
January 17th, 2024
Too Long; Didn't Read

Baseline experiments in dialog systems hinge on a few key hyperparameter settings. Models were trained for five epochs (ten for the experiment with erroneous dialogs) with a batch size of 32, a learning rate of 5e-5, and the AdamW optimizer. LLAMA was finetuned with LoRA-specific parameters. The results in Table 17 elucidate the interplay of data quality, system errors, and model performance through F1-Score and BLEU metrics.
Authors:

(1) Dominic Petrak, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany;

(2) Nafise Sadat Moosavi, Department of Computer Science, The University of Sheffield, United Kingdom;

(3) Ye Tian, Wluper, London, United Kingdom;

(4) Nikolai Rozanov, Wluper, London, United Kingdom;

(5) Iryna Gurevych, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany.

Table of Links

Abstract & Introduction

Related Work

Datasets Examined

Manual Error Type Analysis and Taxonomies

Automatic Filtering for Potentially Relevant Dialogs

Statistical Analysis

Evaluation and Experiments

Discussion

Conclusion, Limitation, Acknowledgments, and References

A Integrated Error Taxonomy – Details

B Error-Indicating Sentences And Phrases

C Automatic Filtering – Implementation

D Automatic Filtering – Sentence-Level Analysis

E Task-Oriented Dialogs – Examples

F Effectiveness Of Automatic Filtering – A Detailed Analysis

G Inter-Annotator Agreement – Detailed Analysis

H Annotation Guidelines

I Hyperparameters and Baseline Experiments

J Human-Human Dialogs – Examples

I Hyperparameters and Baseline Experiments

Hyperparameters All baseline models were trained for five epochs. For the experiment using erroneous dialogs, we trained the models for ten epochs. We used a batch size of 32 and a learning rate of 5e-5 with no warmup steps. As optimizer, we used the PyTorch implementation of AdamW (Loshchilov and Hutter, 2019). Except for LLAMA (Touvron et al., 2023), we fully finetuned all models. For LLAMA, we only finetuned the LoRA (Hu et al., 2022) weights, using a rank of 8, an alpha of 16, and a dropout rate of 0.05.
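The settings above can be collected into a single configuration sketch. The variable names below are illustrative, not taken from the authors' code; only the values come from the paper, and the LoRA block applies to LLAMA alone.

```python
# Baseline hyperparameters as reported in Appendix I.
baseline_hparams = {
    "epochs": 5,               # 10 for the erroneous-dialogs experiment
    "batch_size": 32,
    "learning_rate": 5e-5,
    "warmup_steps": 0,
    "optimizer": "AdamW",      # PyTorch implementation (Loshchilov and Hutter, 2019)
}

# LoRA settings (Hu et al., 2022), used only when finetuning LLAMA;
# all other baselines were fully finetuned.
llama_lora = {
    "rank": 8,
    "alpha": 16,
    "dropout": 0.05,
}
```

In a PyTorch training loop, these values would feed directly into `torch.optim.AdamW` and, for LLAMA, a LoRA adapter configuration.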


Results Table 17 shows the results of our baseline experiments using word-overlapping F1-Score and BLEU (Papineni et al., 2002).
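Word-overlap F1 between a generated response and its reference can be computed as in the following sketch. This is an illustrative implementation, not necessarily the authors' exact scorer; BLEU (Papineni et al., 2002) would typically come from an existing library rather than be reimplemented.

```python
from collections import Counter

def word_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference response."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        return 0.0
    # Count tokens shared by prediction and reference (with multiplicity).
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `word_f1("book a table", "book a table")` yields 1.0, while a response sharing no tokens with the reference scores 0.0.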


Table 17: Results of our baseline experiments.

Compared to the results of our experiment using erroneous dialogs (Table 11), both F1-Score and BLEU are higher in the case of GPT-2 and LLAMA. We attribute this to the comparatively poor quality of the erroneous dialogs. The majority of these dialogs come from the SFC (Hancock et al., 2019) dataset, where most errors are due to out-of-context system utterances (see Section 6.1) and were ignored by users (see Section 6.2).


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.


About Author

The FeedbackLoop (@feedbackloop) offers premium product management education, research papers, and certifications. Start building today!
