Authors:
(1) Dominic Petrak, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany;
(2) Nafise Sadat Moosavi, Department of Computer Science, The University of Sheffield, United Kingdom;
(3) Ye Tian, Wluper, London, United Kingdom;
(4) Nikolai Rozanov, Wluper, London, United Kingdom;
(5) Iryna Gurevych, UKP Lab, Department of Computer Science, Technical University of Darmstadt, Germany.
Manual Error Type Analysis and Taxonomies
Automatic Filtering for Potentially Relevant Dialogs
Conclusion, Limitation, Acknowledgments, and References
A Integrated Error Taxonomy – Details
B Error-Indicating Sentences And Phrases
C Automatic Filtering – Implementation
D Automatic Filtering – Sentence-Level Analysis
E Task-Oriented Dialogs – Examples
F Effectiveness Of Automatic Filtering – A Detailed Analysis
G Inter-Annotator Agreement – Detailed Analysis
I Hyperparameters and Baseline Experiments
J Human-Human Dialogs – Examples
In Section 6.1, we state that the dialogs of the task-oriented datasets considered in our work show little variety in language and flow. This is most obvious in the case of BABI (Bordes et al., 2017), as the following example illustrates:
1. USER: good morning
SYSTEM: hello what can i help you with today
USER: may i have a table with british cuisine in a cheap price range
SYSTEM: i’m on it. where should it be
USER: for eight please
4. SYSTEM: ok let me look into some options for you
USER: actually i would prefer in paris
SYSTEM: sure is there anything else to update
8. USER: no
8. SYSTEM: ok let me look into some options for you
This dialog is typical of BABI. The first few turns (the first four in this case) contain the original user request. Then, the user asks for changes (turns five to eight). Overall, the dialogs are highly scripted, and users are mostly clear and unambiguous in their responses. The dialogs show little variety in language and flow, which makes them seem very unrealistic. We do not observe any change in this regard across the different BABI tasks.
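The observation about "little variety in language" is qualitative. One hypothetical way to put a rough number on it, which is not part of the analysis in this paper, is the distinct-n ratio: the number of unique n-grams divided by the total number of n-grams over a set of utterances (lower values mean more repetition). The sketch below assumes a plain lowercase whitespace tokenizer and a hypothetical function name `distinct_n`; it is applied to the system turns of the BABI example above. On five utterances the numbers are only illustrative; a meaningful comparison between datasets would compute the ratio over whole corpora.

```python
from typing import List


def distinct_n(utterances: List[str], n: int = 1) -> float:
    """Ratio of unique n-grams to total n-grams across all utterances.

    Hypothetical helper for illustration. Tokenization is a plain
    lowercase whitespace split (an assumption); n-grams never cross
    utterance boundaries.
    """
    total = 0
    unique = set()
    for utt in utterances:
        tokens = utt.lower().split()
        # Slide a window of size n over the tokens of this utterance.
        for i in range(len(tokens) - n + 1):
            unique.add(tuple(tokens[i:i + n]))
            total += 1
    # Guard against empty input (no n-grams at all).
    return len(unique) / total if total else 0.0


# System turns of the BABI example above (copied verbatim, ASCII apostrophe).
babi_system = [
    "hello what can i help you with today",
    "i'm on it. where should it be",
    "ok let me look into some options for you",
    "sure is there anything else to update",
    "ok let me look into some options for you",
]

print(f"distinct-1: {distinct_n(babi_system, 1):.2f}")
print(f"distinct-2: {distinct_n(babi_system, 2):.2f}")
```

The repeated system turn ("ok let me look into some options for you") contributes nothing new to the set of unique n-grams, which is exactly the kind of scripted repetition the ratio is meant to surface.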
The following dialog is from SGD (Rastogi et al., 2020):
1. USER: Our family is planning for a vacation trip. Please can you find a good rental car for a pleasant trip?
1. SYSTEM: At what time do you need the car?
2. USER: I can pick up the car at 15:00
2. SYSTEM: From when onwards you are looking for the rental car?
3. USER: I need the car from 11th of March.
3. SYSTEM: So, how long you need the car?
4. USER: I might need it till the 13th of this month.
4. SYSTEM: Okay! So, where would be your pick up location?
5. USER: I can pick up the car in Paris on 9th of this month.
5. SYSTEM: Okay! There is a nice car, Chevrolet Cruze, a Standard car is available for pick up on March 9th at Charles de Gaulle International Airport. Do you wish to rent this car?
6. USER: Do you have any other cars for I need the car till 10th of this month.
6. SYSTEM: Yup! We’ve 3 more cars. Do you wish to rent Honda Accord, which is a Standard car and its ready for pickup on March 9th from Charles de Gaulle International Airport?
7. USER: Okay. I go for the Honda Accord.
The dialog looks much more natural, which might be due to its type: while BABI consists of human-bot dialogs, SGD consists of human-human dialogs. However, the flow is very similar to that of BABI. Up to turn five, the user's request is refined. Then, the system makes a suggestion, which the user rejects (for a couple of turns) by asking for something different. At some point, the user accepts or rejects the system's suggestion, and the dialog ends.
The following dialog is from MWoZ (Budzianowski et al., 2018):
1. USER: Hi, I could use help with my planning. I am trying to find a place to stay.
1. SYSTEM: Ok, to help us find the right place for you lets talk about what you would like. Do you have a price range you prefer? Or an area of town?
2. USER: I would like it to be in the south and have a rating of 3 stars.
2. SYSTEM: We have a moderately priced guesthouse and an expensive hotel that match your requirements.
3. USER: Does either offer free parking?
3. SYSTEM: They both offer free parking and internet.
4. USER: Okay can you please book it for 7 people for 4 nights starting on Tuesday?
4. SYSTEM: Which one? The guesthouse or the hotel?
5. USER: The hotel, please.
5. SYSTEM: Reference number is : BS0ODSGJ. Is there anything else?
6. USER: Yes, I also need a train from stevenage to cambridge. I want to leave after 10:15 on tuesday.
6. SYSTEM: I have train TR0517 that leaves at 11:54. Would that be something that would work for you?
7. USER: That’s perfect. Could I get the train ID and travel time?
7. SYSTEM: The travel time is 49 minutes.
8. USER: Great, that’s all I need, thanks so much for your help! Have a great day!
8. SYSTEM: You’re very welcome! Thank you! Bye!
As with SGD, the dialog looks very natural. However, in contrast to BABI and SGD, MWoZ dialogs consist less of refinements and more of multiple tasks, as this example shows. First, the user asks for a hotel in a city; then they ask for transportation to that city. Sometimes, they also ask for entertainment venues. While the first request is usually completed (here, booking a room), the second request is usually about gathering information (the user only asks for the train ID, departure time, and travel time, but does not book a seat).
This paper is available on arXiv under CC BY-NC-SA 4.0 DEED license.