Authors:
(1) Dominic Petrak, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany;
(2) Nafise Sadat Moosavi, Department of Computer Science, The University of Sheffield, United Kingdom;
(3) Ye Tian, Wluper, London, United Kingdom;
(4) Nikolai Rozanov, Wluper, London, United Kingdom;
(5) Iryna Gurevych, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany.
Manual Error Type Analysis and Taxonomies
Automatic Filtering for Potentially Relevant Dialogs
Conclusion, Limitation, Acknowledgments, and References
A Integrated Error Taxonomy – Details
B Error-Indicating Sentences And Phrases
C Automatic Filtering – Implementation
D Automatic Filtering – Sentence-Level Analysis
E Task-Oriented Dialogs – Examples
F Effectiveness Of Automatic Filtering – A Detailed Analysis
G Inter-Annotator Agreement – Detailed Analysis
I Hyperparameters and Baseline Experiments
J Human-Human Dialogs – Examples
Learning from textual user feedback, i.e., free-form text feedback that expresses user dissatisfaction, describes new knowledge (new concepts), or provides corrections or alternative responses, is increasingly important for continuously improving dialog systems. As of today, only a few datasets provide annotations for such data. However, a wide variety of dialog datasets is publicly available, covering many domains and all kinds of tasks.
As an annotator, you are provided with two taxonomies, an error type taxonomy and a user response type taxonomy, and two types of dialogs, conspicuous dialogs and cold dialogs. Your task is to annotate the dialogs using the given taxonomies. The conspicuous dialogs were identified to contain conspicuous sentences, i.e., sentences in a user utterance that address an error in the previous system response. For these dialogs, you have to decide (1) whether or not the conspicuous sentence correctly addresses an error in the previous system response, (2) which error type is represented in the previous system response, and (3) how the user reacted. The cold dialogs do not contain such information. In this case, your task is to identify error situations and user response types in the dialogs given only the taxonomies.
To avoid bias, we provide the dialogs in a unified format without the original task-specific annotations. Here’s an example of a conspicuous dialog:
{
  "unique_id": "PMUL0121.json_multiwoz_train",
  "id": "PMUL0121.json",
  "turns": [
    {
      "utterance": "Hi there! Can you give me some info on Cityroomz?",
      "response": "Cityroomz is located at Sleeperz Hotel, Station Road in the Centre area. It is a moderatley priced hotel with a 0 star rating. Would you like to book a room?"
    },
    {
      "utterance": "Yes please. I need it for 7 people for 3 nights starting on tuesday.",
      "response": "How many days would you like to book it for?"
    },
    {
      "utterance": "I would like to book it for 3 nights beginning on Tuesday.",
      "response": "Ok, Your rooms have been booked. Your reference number is CYJDAR7R . Is there anything else I can help you with today?"
    },
    {
      "utterance": "Yeah, I need a restaurant in the west and with expensive pricing.",
      "response": "There are 9 expensive restaurants in the west area. Do you prefer a certain type of food?"
    },
    {
      "utterance": "No, I don’t have a preference, but I need the phone number and postcode.",
      "response": "I recommend Graffiti, the number is 01223277977 and the postcode is cb30lx. Would you like me to book a table for you?"
    },
    {
      "utterance": "No, right now, I just need the infor. Thanks!",
      "response": "Can I help you with anything else today?"
    },
    {
      "utterance": "That is all. thanks",
      "response": "Thank you for using our system."
    }
  ],
  "conspicuous": [
    {
      "phrase": [
        "actually, i apologize no need to book, i was just gathering information.",
        "i would like to book it for 3 nights beginning on tuesday."
      ],
      "confidence": 0.593,
      "turn": 2
    }
  ],
  "annotations": [
    {
      "turn": 2,
      "annotation": {
        "error_type": "E2",
        "comment": "the system misses intent/slots. the user already said that he need it for three nights",
        "error": "C1",
        "user_response": "B3"
      }
    }
  ]
}
Each dialog consists of a unique_id, an id, and its turns. conspicuous is an array. The first value in phrase is an error-indicating phrase, i.e., a phrase that was identified to express user dissatisfaction in the utterance of the corresponding turn. The second value is the part of an utterance of this dialog that was identified to be similar to this error-indicating phrase. confidence represents this similarity. Dialogs with multiple conspicuous entries are possible. The annotations list has one entry for each conspicuous phrase; please add your annotations there. In comment, you can share your thoughts with us.
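To make the expected workflow concrete, the following is a minimal Python sketch for working with a conspicuous dialog in this format. The file names are hypothetical, and the label codes are placeholders taken from the example above, not gold annotations.

import json

# Minimal sketch (hypothetical file names): load one conspicuous dialog in the
# format shown above and attach one annotation per conspicuous entry.
with open("PMUL0121.json_multiwoz_train.json") as f:
    dialog = json.load(f)

for entry in dialog["conspicuous"]:
    print("turn:", entry["turn"], "| confidence:", entry["confidence"])
    print("error-indicating phrase:", entry["phrase"][0])
    print("matched dialog utterance:", entry["phrase"][1])

# One annotation per conspicuous entry; the codes below are placeholders and
# must be chosen from the taxonomies further down.
dialog["annotations"] = [
    {
        "turn": entry["turn"],
        "annotation": {
            "error_type": "E2",
            "comment": "the system misses intent/slots",
            "error": "C1",
            "user_response": "B3",
        },
    }
    for entry in dialog["conspicuous"]
]

with open("PMUL0121.json_multiwoz_train.annotated.json", "w") as f:
    json.dump(dialog, f, indent=2)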
Here is an example of a cold dialog:
[
  {
    "dialog": "p2 cats are like cartoons. p1 that’s cool , whats your favorite food ? p2 pizza. p1 ni hao . as my father says . you must have great plans ahead ? p2 yes, i plan to be a success.",
    "error": "C2",
    "error_type": "",
    "user_response": "",
    "comment": "",
    "turn": "",
    "phrase": ""
  },
  ...
]
The structure is a bit different: all cold dialogs are provided in one large JSON file, and the dialogs themselves retain the structure of their original dataset. In this case, it is a dialog from the human-bot split of the Self-Feeding Chatbot (p2 represents the system, p1 represents the user). There are two additional fields here: turn and phrase. If you happen to find a phrase that indicates dissatisfaction in the user’s response to a system error, please add the phrase and the turn to these fields.
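For the cold dialogs, a similarly minimal Python sketch is shown below. The file name cold_dialogs.json and all label values are assumptions for illustration only.

import json

# Minimal sketch (the file name "cold_dialogs.json" is an assumption): load the
# single JSON file with all cold dialogs and fill in the annotation fields of
# one dialog in place.
with open("cold_dialogs.json") as f:
    dialogs = json.load(f)

d = dialogs[0]                 # e.g., the Self-Feeding Chatbot dialog above
d["error"] = "C2"              # placeholder values, not gold annotations
d["error_type"] = "E6"
d["user_response"] = "UR1"
d["comment"] = "placeholder comment explaining the decision"
# "turn" and "phrase" stay empty unless an error-indicating phrase was found.

with open("cold_dialogs.json", "w") as f:
    json.dump(dialogs, f, indent=2)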
This is the taxonomy for the error_type field.
[
  {
    "id": "E1",
    "name": "Ignore Question",
    "description": "The response ignores a user’s question.",
    "example": "User: ’How is the weather in Berlin?’, System: ’Have you ever heard of Bobby Jean?’"
  },
  {
    "id": "E2",
    "name": "Ignore Request",
    "description": "The response ignores a user’s request to do something.",
    "example": "User: ’Can you book me a room for two please?’, System: ’Look at these flowers ... Aren’t they beautiful?’"
  },
  {
    "id": "E3",
    "name": "Ignore Expectation",
    "description": "The response does not fulfill the user’s expectation.",
    "example": "User: ’Can you book me a room at The Inn for two nights starting tomorrow? Can you also provide me with the address?’, System: ’Absolutely!’"
  },
  {
    "id": "E4",
    "name": "Attribute Error",
    "description": "The response suggests that the system did not get the attributes/slots right.",
    "example": "User: ’Can you book me an expensive restaurant in the west of town?’, System: ’Sure. I would recommend The Breakfast Club. It is a very cheap and nice location.’"
  },
  {
    "id": "E5",
    "name": "Factually Incorrect",
    "description": "The response contains information that is factually incorrect.",
    "example": "User: ’Who is the current president of the United States of America?’, System: ’Abraham Lincoln.’"
  },
  {
    "id": "E6",
    "name": "Topic Transition Error",
    "description": "The response transitions to another / a previous topic without reasonable explanation.",
    "example": "User: ’I think I have a cold.’, System: ’I really love pears. So juicy!’"
  },
  {
    "id": "E7",
    "name": "Conversationality",
    "description": "The response indicates that the system lost track, e.g., it repeats previous responses (without asking for missing information) or contradicts itself.",
    "example": "System: ’Man is shaped by his environment.’, User: ’Yes. But does that justify this tone?’, System: ’Man is shaped by his environment.’"
  },
  {
    "id": "E8",
    "name": "Unclear Intention",
    "description": "The response suggests that the user’s intent was not successfully conveyed.",
    "example": "User: ’Can you book me a flight to London?’, System: ’Yes. What do you want for dinner?’"
  },
  {
    "id": "E9",
    "name": "Lack of Sociality",
    "description": "The response lacks consideration of social standards, e.g., greetings, is toxic or disrespectful.",
    "example": "User: ’Hi! My name is Jochen Schweitzer!’, System: ’Sure. How can I help you?’"
  },
  {
    "id": "E10",
    "name": "Lack of Common Sense",
    "description": "The information in the response opposes the opinion of the majority.",
    "example": "User: ’You have to work hard for success.’, System: ’Hmm... No. What is to be, will be.’"
  }
]
This is the taxonomy for the user_response field.
[
  {
    "id": "UR1",
    "short": "The user ignores the error and continues the conversation.",
    "description": "The user simply continues and does not draw the system’s attention to the error.",
    "example": "-"
  },
  {
    "id": "UR2",
    "short": "The user repeats or rephrases his/her concern.",
    "description": "The user repeats or rephrases his/her original concern.",
    "example": "’Can you book a restaurant for two for tonight?’ vs. ’Can you book a table for two for tonight?’"
  },
  {
    "id": "UR3",
    "short": "The user makes the system aware of the error and provides a correction.",
    "description": "The user makes the system aware of the error and provides information to address what is missing or wrong in its utterance.",
    "example": "’No, I didn’t want you to book a table. I just wanted the address!’"
  },
  {
    "id": "UR4",
    "short": "The user makes the system aware without providing a correction.",
    "description": "The user makes the system aware of the error without providing additional information.",
    "example": "’No. You’re wrong.’"
  },
  {
    "id": "UR5",
    "short": "The user asks for clarification.",
    "description": "The user is puzzled and asks for clarification, e.g., the system suddenly switches to another topic or mixes concepts up.",
    "example": "’What do you mean?’"
  }
]
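As a small aid for checking completed annotations against these two taxonomies, here is a hedged Python sketch. The taxonomy file names and the helper function are assumptions for illustration, not part of any released tooling.

import json

# Sketch (hypothetical file names): collect the valid IDs from both taxonomies
# and flag annotations that use unknown codes.
with open("error_type_taxonomy.json") as f:
    error_type_ids = {entry["id"] for entry in json.load(f)}      # E1 ... E10
with open("user_response_taxonomy.json") as f:
    user_response_ids = {entry["id"] for entry in json.load(f)}   # UR1 ... UR5

def check_annotations(dialog):
    """Return a list of problems found in one annotated conspicuous dialog."""
    problems = []
    for item in dialog.get("annotations", []):
        annotation = item["annotation"]
        if annotation["error_type"] not in error_type_ids:
            problems.append(f"turn {item['turn']}: unknown error_type "
                            f"{annotation['error_type']}")
        if annotation["user_response"] not in user_response_ids:
            problems.append(f"turn {item['turn']}: unknown user_response "
                            f"{annotation['user_response']}")
    return problems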
This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.