Authors:
(1) Dominic Petrak, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany;
(2) Nafise Sadat Moosavi, Department of Computer Science, The University of Sheffield, United Kingdom;
(3) Ye Tian, Wluper, London, United Kingdom;
(4) Nikolai Rozanov, Wluper, London, United Kingdom;
(5) Iryna Gurevych, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany.
Manual Error Type Analysis and Taxonomies
Automatic Filtering for Potentially Relevant Dialogs
Conclusion, Limitation, Acknowledgments, and References
A Integrated Error Taxonomy – Details
B Error-Indicating Sentences And Phrases
C Automatic Filtering – Implementation
D Automatic Filtering – Sentence-Level Analysis
E Task-Oriented Dialogs – Examples
F Effectiveness Of Automatic Filtering – A Detailed Analysis
G Inter-Annotator Agreement – Detailed Analysis
I Hyperparameters and Baseline Experiments
J Human-Human Dialogs – Examples
In this section, we describe the Integrated Error Taxonomy proposed by Higashinaka et al. (2021). At the top level, they distinguish between form violations and content violations. A form violation (FV) is an error that goes against some kind of meta criterion, e.g., by violating the form of language or ignoring social norms. In contrast, a content violation (CV) refers to, e.g., inconsistent or redundant utterances, or other content that might cause a dialog breakdown. Furthermore, Higashinaka et al. generally refer to utterances, whereas we refer to system utterances, since their examples make clear that this is what is meant and, from our perspective, it simplifies understanding.
Utterance-level errors typically expose language-generation deficiencies of the system.
• Uninterpretable (I1), FV – The system’s utterance does not contain recognizable words or is just a text fragment, e.g., the utterance just consists of the words of yours.
• Grammatical Error (I2), FV – The system’s utterance is grammatically incorrect, e.g., he will attend yesterday.
• Semantic Error (I3), CV – The combination of predicates and arguments does not result in any meaning, e.g., I feel painting.
• Wrong Information (I4), CV – The given information is clearly wrong, e.g., Meat Loaf was the songwriter of Jim Steinman (it was vice versa).
Response-level errors describe errors in system utterances, e.g., in responses to user questions or requests.
• Ignore Question (I5), FV – The system utterance ignores the user’s question, e.g., U: What will the weather be like tomorrow? S: Look at these beautiful flowers...
• Ignore Request (I6), FV – The system ignores the user’s request to do something, e.g., U: Please book me a room in the west part of town. S: I like sports.
• Ignore Proposal (I7), FV – The system ignores the user’s proposal to do something, e.g., U: Let’s talk about movies. S: I like pizza.
• Ignore Greeting (I8), FV – The system ignores the user’s greeting, e.g., U: Hello, my friend! S: What do you want from me?
• Ignore Expectation (I9), CV – The system utterance does not have the expected content but the user’s intention has been successfully conveyed, e.g., U: Do you have a favorite football team? S: Yes. (the system should tell the user the name of its favorite football team).
Context-level errors refer not only to the local context, i.e., adjacent pairs of user utterance and system utterance, but to a broader (sometimes global) context.
• Unclear Intention (I10), FV – The system utterance is on the current topic, but the intention is unclear, e.g., U: What is your favorite movie? S: I like Daddy’s Home 2. U: Oh, yes! Mel Gibson is so good in this role... S: The Evil Dead is so bloody...
• Topic Transition Error (I11), FV – The system suddenly switches to another topic without explanation, e.g., U: I really enjoyed Feynman’s physics lecture. So enlightening! S: Well, what is your favorite food?
• Lack of Information (I12), FV – The system utterance misses important information, such as subject, object, or modifier, e.g., U: It’s a beautiful day, isn’t it? S: I hate. (object is missing).
• Self-Contradiction (I13), CV – The system utterance contradicts what it said before, e.g., S: I’m from Boston, Massachusetts. U: Oh, wow! I’m from Bartlesville, Oklahoma! S: Really? I’m from Seattle, Washington!
• Contradiction (I14), CV – The system utterance contradicts what has been said by the user, e.g., U: I like strawberries. S: Oh, why don’t you like strawberries?
• Repetition (I15), CV – The system repeats something it has already said before (this includes repeated questioning), e.g., S: I look forward to the concert. U: Oh, which concert? S: I look forward to the concert.
Society-level errors refer to errors that indicate that the system lacks etiquette or common sense.
• Lack of Sociality (I16), FV – The system utterance lacks consideration towards the user and contains hate speech or is in general disrespectful, e.g., U: Yesterday I went to the federal employment agency. S: Okay. Are you incapable of finding a job on your own?
• Lack of Common Sense (I17), CV – The system utterance contains content that differs from the opinion of the majority, e.g., S: I like to eat chicken raw. U: Oh, dear... You have to cook it first!
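For readers who want to work with the taxonomy programmatically, e.g., when storing annotations, the 17 error types listed above can be encoded as a small lookup table. The following is a minimal sketch, not part of the original taxonomy or of any released code: the names `Level`, `Violation`, `ErrorType`, `TAXONOMY`, `by_level`, and `by_violation` are hypothetical and chosen only for illustration. The codes, names, levels, and FV/CV labels are taken directly from the list above.

```python
from dataclasses import dataclass
from enum import Enum


class Level(Enum):
    """The four levels of the taxonomy, as listed above."""
    UTTERANCE = "utterance"
    RESPONSE = "response"
    CONTEXT = "context"
    SOCIETY = "society"


class Violation(Enum):
    """Form violation (FV) or content violation (CV)."""
    FORM = "FV"
    CONTENT = "CV"


@dataclass(frozen=True)
class ErrorType:
    code: str          # identifier used in this appendix, e.g. "I1"
    name: str          # error type name, e.g. "Uninterpretable"
    level: Level
    violation: Violation


# One row per error type, in the order of the list above.
_ROWS = [
    ("I1", "Uninterpretable", Level.UTTERANCE, Violation.FORM),
    ("I2", "Grammatical Error", Level.UTTERANCE, Violation.FORM),
    ("I3", "Semantic Error", Level.UTTERANCE, Violation.CONTENT),
    ("I4", "Wrong Information", Level.UTTERANCE, Violation.CONTENT),
    ("I5", "Ignore Question", Level.RESPONSE, Violation.FORM),
    ("I6", "Ignore Request", Level.RESPONSE, Violation.FORM),
    ("I7", "Ignore Proposal", Level.RESPONSE, Violation.FORM),
    ("I8", "Ignore Greeting", Level.RESPONSE, Violation.FORM),
    ("I9", "Ignore Expectation", Level.RESPONSE, Violation.CONTENT),
    ("I10", "Unclear Intention", Level.CONTEXT, Violation.FORM),
    ("I11", "Topic Transition Error", Level.CONTEXT, Violation.FORM),
    ("I12", "Lack of Information", Level.CONTEXT, Violation.FORM),
    ("I13", "Self-Contradiction", Level.CONTEXT, Violation.CONTENT),
    ("I14", "Contradiction", Level.CONTEXT, Violation.CONTENT),
    ("I15", "Repetition", Level.CONTEXT, Violation.CONTENT),
    ("I16", "Lack of Sociality", Level.SOCIETY, Violation.FORM),
    ("I17", "Lack of Common Sense", Level.SOCIETY, Violation.CONTENT),
]

# Lookup table keyed by error code (hypothetical helper structure).
TAXONOMY = {
    code: ErrorType(code, name, level, violation)
    for code, name, level, violation in _ROWS
}


def by_level(level):
    """Return all error types that belong to the given level."""
    return [e for e in TAXONOMY.values() if e.level is level]


def by_violation(violation):
    """Return all error types with the given violation kind (FV or CV)."""
    return [e for e in TAXONOMY.values() if e.violation is violation]
```

With this encoding, `TAXONOMY["I13"].name` yields `"Self-Contradiction"`, and `by_level(Level.SOCIETY)` returns the entries for I16 and I17. Keeping the level and the FV/CV label as separate fields mirrors the two independent dimensions of the taxonomy.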
This paper is available on arXiv under the CC BY-NC-SA 4.0 DEED license.