Authors:
(1) Dominic Petrak, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany;
(2) Nafise Sadat Moosavi, Department of Computer Science, The University of Sheffield, United Kingdom;
(3) Ye Tian, Wluper, London, United Kingdom;
(4) Nikolai Rozanov, Wluper, London, United Kingdom;
(5) Iryna Gurevych, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany.
Manual Error Type Analysis and Taxonomies
Automatic Filtering for Potentially Relevant Dialogs
Conclusion, Limitation, Acknowledgments, and References
A Integrated Error Taxonomy – Details
B Error-Indicating Sentences And Phrases
C Automatic Filtering – Implementation
D Automatic Filtering – Sentence-Level Analysis
E Task-Oriented Dialogs – Examples
F Effectiveness Of Automatic Filtering – A Detailed Analysis
G Inter-Annotator Agreement – Detailed Analysis
I Hyperparameters and Baseline Experiments
J Human-Human Dialogs – Examples
Learning from textual user feedback, i.e., free-form text feedback that expresses user dissatisfaction, describes new knowledge (new concepts), or provides corrections or alternative responses, is increasingly important for continuously improving dialog systems. As of today, only a few datasets provide annotations for such data. However, a wide variety of dialog datasets is publicly available, covering many domains and all kinds of tasks.
As an annotator, you are provided with two taxonomies, an error type taxonomy and a user response type taxonomy, and two types of dialogs, conspicuous dialogs and cold dialogs. Your task is to annotate the dialogs using the given taxonomies. The conspicuous dialogs were identified to contain conspicuous sentences, i.e., sentences in a user utterance that address an error in the previous system response. For these dialogs, you have to decide (1) whether or not the conspicuous sentence correctly addresses an error in the previous system response, (2) which error type is represented in the previous system response, and (3) how the user reacted. The cold dialogs do not contain such information. In this case, your task is to identify error situations and user response types in the dialogs given only the taxonomies.
To avoid bias, we provide the dialogs in a unified format without the original task-specific annotations. Here’s an example of a conspicuous dialog:
{
  "unique_id": "PMUL0121.json_multiwoz_train",
  "id": "PMUL0121.json",
  "turns": [
    {
      "utterance": "Hi there! Can you give me some info on Cityroomz?",
      "response": "Cityroomz is located at Sleeperz Hotel, Station Road in the Centre area. It is a moderatley priced hotel with a 0 star rating. Would you like to book a room?"
    },
    {
      "utterance": "Yes please. I need it for 7 people for 3 nights starting on tuesday.",
      "response": "How many days would you like to book it for?"
    },
    {
      "utterance": "I would like to book it for 3 nights beginning on Tuesday.",
      "response": "Ok, Your rooms have been booked. Your reference number is CYJDAR7R . Is there anything else I can help you with today?"
    },
    {
      "utterance": "Yeah, I need a restaurant in the west and with expensive pricing.",
      "response": "There are 9 expensive restaurants in the west area. Do you prefer a certain type of food?"
    },
    {
      "utterance": "No, I don’t have a preference, but I need the phone number and postcode.",
      "response": "I recommend Graffiti, the number is 01223277977 and the postcode is cb30lx. Would you like me to book a table for you?"
    },
    {
      "utterance": "No, right now, I just need the infor. Thanks!",
      "response": "Can I help you with anything else today?"
    },
    {
      "utterance": "That is all. thanks",
      "response": "Thank you for using our system."
    }
  ],
  "conspicuous": [
    {
      "phrase": [
        "actually, i apologize no need to book, i was just gathering information.",
        "i would like to book it for 3 nights beginning on tuesday."
      ],
      "confidence": 0.593,
      "turn": 2
    }
  ],
  "annotations": [
    {
      "turn": 2,
      "annotation": {
        "error_type": "E2",
        "comment": "the system misses intent/slots. the user already said that he need it for three nights",
        "error": "C1",
        "user_response": "B3"
      }
    }
  ]
}
Each dialog consists of a unique_id, an id, and its turns. conspicuous is an array. The first value in phrase is an error-indicating phrase, i.e., a phrase that was identified to express user dissatisfaction in the utterance of the corresponding turn. The second value is the part of an utterance of this dialog that was identified to be similar to this error-indicating phrase. confidence represents this similarity. Dialogs with multiple conspicuous entries are possible. The annotations list has one entry for each conspicuous phrase; please add your annotations there. In comment, you can share your thoughts with us.
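To make the expected workflow concrete, the following is a minimal Python sketch for working with a conspicuous dialog in this format. The file names are hypothetical, and the label codes are placeholders taken from the example above, not gold annotations.

import json

# Minimal sketch (hypothetical file names): load one conspicuous dialog in the
# format shown above and attach one annotation per conspicuous entry.
with open("PMUL0121.json_multiwoz_train.json") as f:
    dialog = json.load(f)

for entry in dialog["conspicuous"]:
    print("turn:", entry["turn"], "| confidence:", entry["confidence"])
    print("error-indicating phrase:", entry["phrase"][0])
    print("matched dialog utterance:", entry["phrase"][1])

# One annotation per conspicuous entry; the codes below are placeholders and
# must be chosen from the taxonomies further down.
dialog["annotations"] = [
    {
        "turn": entry["turn"],
        "annotation": {
            "error_type": "E2",
            "comment": "the system misses intent/slots",
            "error": "C1",
            "user_response": "B3",
        },
    }
    for entry in dialog["conspicuous"]
]

with open("PMUL0121.json_multiwoz_train.annotated.json", "w") as f:
    json.dump(dialog, f, indent=2)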
Here is an example of a cold dialog:
[
  {
    "dialog": "p2 cats are like cartoons. p1 that’s cool , whats your favorite food ? p2 pizza. p1 ni hao . as my father says . you must have great plans ahead ? p2 yes, i plan to be a success.",
    "error": "C2",
    "error_type": "",
    "user_response": "",
    "comment": "",
    "turn": "",
    "phrase": ""
  },
  ...
]
The structure is a bit different: all cold dialogs are provided in one large JSON file, and the dialogs themselves retain the structure of their original dataset. In this case, it is a dialog from the human-bot split of the Self-Feeding Chatbot (p2 represents the system, p1 represents the user). There are two additional fields here: turn and phrase. If you happen to find a phrase that indicates dissatisfaction in the user’s response to a system error, please add the phrase and the turn to these fields.
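For the cold dialogs, a similarly minimal Python sketch is shown below. The file name cold_dialogs.json and all label values are assumptions for illustration only.

import json

# Minimal sketch (the file name "cold_dialogs.json" is an assumption): load the
# single JSON file with all cold dialogs and fill in the annotation fields of
# one dialog in place.
with open("cold_dialogs.json") as f:
    dialogs = json.load(f)

d = dialogs[0]                 # e.g., the Self-Feeding Chatbot dialog above
d["error"] = "C2"              # placeholder values, not gold annotations
d["error_type"] = "E6"
d["user_response"] = "UR1"
d["comment"] = "placeholder comment explaining the decision"
# "turn" and "phrase" stay empty unless an error-indicating phrase was found.

with open("cold_dialogs.json", "w") as f:
    json.dump(dialogs, f, indent=2)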
This is the taxonomy for the error_type field.
[
  {
    "id": "E1",
    "name": "Ignore Question",
    "description": "The response ignores a user’s question.",
    "example": "User: ’How is the weather in Berlin?’, System: ’Have you ever heard of Bobby Jean?’"
  },
  {
    "id": "E2",
    "name": "Ignore Request",
    "description": "The response ignores a user’s request to do something.",
    "example": "User: ’Can you book me a room for two please?’, System: ’Look at these flowers ... Aren’t they beautiful?’"
  },
  {
    "id": "E3",
    "name": "Ignore Expectation",
    "description": "The response does not fulfill the user’s expectation.",
    "example": "User: ’Can you book me a room at The Inn for two nights starting tomorrow? Can you also provide me with the address?’, System: ’Absolutely!’"
  },
  {
    "id": "E4",
    "name": "Attribute Error",
    "description": "The response suggests that the system did not get the attributes/slots right.",
    "example": "User: ’Can you book me an expensive restaurant in the west of town?’, System: ’Sure. I would recommend The Breakfast Club. It is a very cheap and nice location.’"
  },
  {
    "id": "E5",
    "name": "Factually Incorrect",
    "description": "The response contains information that is factually incorrect.",
    "example": "User: ’Who is the current president of the United States of America?’, System: ’Abraham Lincoln.’"
  },
  {
    "id": "E6",
    "name": "Topic Transition Error",
    "description": "The response transitions to another / a previous topic without reasonable explanation.",
    "example": "User: ’I think I have a cold.’, System: ’I really love pears. So juicy!’"
  },
  {
    "id": "E7",
    "name": "Conversationality",
    "description": "The response indicates that the system lost track, e.g., it repeats previous responses (without asking for missing information) or contradicts itself.",
    "example": "System: ’Man is shaped by his environment.’, User: ’Yes. But does that justify this tone?’, System: ’Man is shaped by his environment.’"
  },
  {
    "id": "E8",
    "name": "Unclear Intention",
    "description": "The response suggests that the user’s intent was not successfully conveyed.",
    "example": "User: ’Can you book me a flight to London?’, System: ’Yes. What do you want for dinner?’"
  },
  {
    "id": "E9",
    "name": "Lack of Sociality",
    "description": "The response lacks consideration of social standards, e.g., greetings, is toxic or disrespectful.",
    "example": "User: ’Hi! My name is Jochen Schweitzer!’, System: ’Sure. How can I help you?’"
  },
  {
    "id": "E10",
    "name": "Lack of Common Sense",
    "description": "The information in the response opposes the opinion of the majority.",
    "example": "User: ’You have to work hard for success.’, System: ’Hmm... No. What is to be, will be.’"
  }
]
This is the taxonomy for the user_response field.
[
  {
    "id": "UR1",
    "short": "The user ignores the error and continues the conversation.",
    "description": "The user simply continues and does not draw the system’s attention to the error.",
    "example": "-"
  },
  {
    "id": "UR2",
    "short": "The user repeats or rephrases his/her concern.",
    "description": "The user repeats or rephrases his/her original concern.",
    "example": "’Can you book a restaurant for two for tonight?’ vs. ’Can you book a table for two for tonight?’"
  },
  {
    "id": "UR3",
    "short": "The user makes the system aware of the error and provides a correction.",
    "description": "The user makes the system aware of the error and provides information to address what is missing or wrong in its utterance.",
    "example": "’No, I didn’t want you to book a table. I just wanted the address!’"
  },
  {
    "id": "UR4",
    "short": "The user makes the system aware without providing a correction.",
    "description": "The user makes the system aware of the error without providing additional information.",
    "example": "’No. You’re wrong.’"
  },
  {
    "id": "UR5",
    "short": "The user asks for clarification.",
    "description": "The user is puzzled and asks for clarification, e.g., the system suddenly switches to another topic or mixes concepts up.",
    "example": "’What do you mean?’"
  }
]
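As a small aid for checking completed annotations against these two taxonomies, here is a hedged Python sketch. The taxonomy file names and the helper function are assumptions for illustration, not part of any released tooling.

import json

# Sketch (hypothetical file names): collect the valid IDs from both taxonomies
# and flag annotations that use unknown codes.
with open("error_type_taxonomy.json") as f:
    error_type_ids = {entry["id"] for entry in json.load(f)}      # E1 ... E10
with open("user_response_taxonomy.json") as f:
    user_response_ids = {entry["id"] for entry in json.load(f)}   # UR1 ... UR5

def check_annotations(dialog):
    """Return a list of problems found in one annotated conspicuous dialog."""
    problems = []
    for item in dialog.get("annotations", []):
        annotation = item["annotation"]
        if annotation["error_type"] not in error_type_ids:
            problems.append(f"turn {item['turn']}: unknown error_type "
                            f"{annotation['error_type']}")
        if annotation["user_response"] not in user_response_ids:
            problems.append(f"turn {item['turn']}: unknown user_response "
                            f"{annotation['user_response']}")
    return problems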
This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.