
Qualitative Evaluation: Understanding User Needs through Dynamic Dialogue


Too Long; Didn't Read

An experiment with a dynamic dialogue system for condominium property recommendation, covering qualitative evaluation, assessment by a user simulator, and interactive control by the controller to improve dialogue flow and user satisfaction.

Authors:

(1) Yugen Sato, Meiji University;

(2) Taisei Nakajima, Meiji University;

(3) Tatsuki Kawamoto, Meiji University;

(4) Tomohiro Takagi, Meiji University.

Table of Links

Abstract & Introduction

Related works

Method

Experiment

Conclusion & References

4 Experiment

In this section, we apply the system described in Section 3 to the task of recommending condominium properties. The assistant's main goal is to repeatedly ask a user looking for an apartment property a variety of questions and extract the user's needs; the controller's role is to support this process. In generating the user simulator, we do not use a specific dialogue data set; the simulator converses with the assistant using only its prior knowledge. GPT-4 was used for the assistant, the controller, and the user simulator, all of which were generated using OpenAI's web-based module [4]. The prompts for each module are the same as those introduced in 3.1 and 3.2, with each module's generated content inserted in the areas enclosed in {}. An example of the interaction between the user simulator and the assistant is shown in Figure 2.
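As a rough sketch, this three-module setup can be viewed as an alternating loop in which the user simulator and assistant exchange turns while the controller reviews each assistant utterance before it is delivered. The function name, prompt strings, and loop structure below are our illustrative assumptions, not the authors' actual prompts; `llm` stands in for any callable that wraps a GPT-4 request and returns text.

```python
# Sketch of the three-module dialogue loop (assistant, controller, user
# simulator), each backed by the same chat model. `llm` is any callable
# mapping a prompt string to a reply string.

def run_dialogue(llm, assistant_prompt, controller_prompt, user_prompt, turns=5):
    history = []  # list of (speaker, utterance) pairs
    # The user simulator opens the dialogue from its prior knowledge alone.
    user_msg = llm(user_prompt + "\nStart by describing what you are looking for.")
    history.append(("user", user_msg))
    for _ in range(turns):
        transcript = "\n".join(f"{s}: {u}" for s, u in history)
        # Assistant drafts its next question from the dialogue so far;
        # generated content is inserted into the prompt's {} area.
        draft = llm(assistant_prompt + "\nDialogue so far:\n{" + transcript + "}")
        # Controller reviews the draft and returns the (possibly revised) utterance.
        reviewed = llm(controller_prompt + "\nCandidate utterance:\n{" + draft + "}")
        history.append(("assistant", reviewed))
        user_msg = llm(user_prompt + "\nDialogue so far:\n{" + transcript + "}")
        history.append(("user", user_msg))
    return history
```

With a real GPT-4 wrapper in place of `llm`, each run of this loop yields one transcript of the kind shown in Figure 2.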

4.1 Qualitative evaluation

Figure 2 shows an example dialogue for condominium property recommendation. The user is looking for an apartment property that meets their requirements, and the system must help them find one. The development of the dialogue is dynamically controlled by the controller. The content covers property conditions and preferences, the background of the move, current problems and issues, budget, and payment options, which are also predefined topics. Due to space limitations, we cannot present further dialogue examples in this paper. Our system simulates many real consulting scenarios.

4.2 Evaluation by user simulator

In this section, we use evaluation by the user simulator itself as a method of dialogue evaluation. G-EVAL [2] proposed a method for evaluating natural language generation tasks using GPT-4 and showed that evaluation using GPT-4 is closer to human evaluation than conventional methods. G-EVAL also covers dialogue generation tasks, which we apply here to evaluate whether the entire dialogue is well controlled by the system. Specifically, we input the dialogue history, as shown in Figure 2, and ask the user simulator to score each of the following four items on a scale of 1 to 5.


• satisfaction

– Whether the user was satisfied with the dialogue

• flexibility

– Whether the system was able to compose a tactful flow of dialogue based on the user's statements

• accuracy

– Whether the system was able to accurately identify user needs and organize the information

• contradiction

– Whether the system was able to successfully approach the user's statements by pointing out the inconsistencies hidden in them

Figure 2: Example of user-assistant interaction in recommending condominium properties

Table 1: Evaluation by user simulator



Our system and plain GPT-4 each interact with the user simulator and are evaluated on the above four items. Table 1 shows the comparative results of the average scores over the five dialogues.
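A minimal version of this G-EVAL-style scoring loop can be sketched as follows. The prompt wording, item descriptions, and helper names are our illustrative assumptions; `llm` stands in for a GPT-4 judge call that returns a 1-5 rating as text.

```python
# The four evaluation items from the paper, with short descriptions.
ITEMS = {
    "satisfaction": "Whether the user was satisfied with the dialogue",
    "flexibility": "Whether the system composed a tactful flow of dialogue "
                   "based on the user's statements",
    "accuracy": "Whether the system accurately identified user needs and "
                "organized the information",
    "contradiction": "Whether the system pointed out inconsistencies hidden "
                     "in the user's statements",
}

def score_dialogue(llm, dialogue_history):
    """Ask the judge model to rate one dialogue on each item (1 to 5)."""
    scores = {}
    for name, desc in ITEMS.items():
        prompt = (f"Rate the following dialogue on '{name}' ({desc}) "
                  f"on a scale of 1 to 5. Reply with a single number.\n\n"
                  f"{dialogue_history}")
        scores[name] = int(llm(prompt).strip())
    return scores

def average_scores(per_dialogue_scores):
    """Average each item's score over a list of dialogues (here, five)."""
    n = len(per_dialogue_scores)
    return {name: sum(s[name] for s in per_dialogue_scores) / n
            for name in ITEMS}
```

Running `score_dialogue` on each of the five transcripts and passing the results to `average_scores` yields per-item averages of the kind reported in Table 1.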

4.3 Interactive control by controller

An example of the controller adjusting the assistant's behavior is shown in Figure 3. As this example shows, the controller examines each question the assistant generates for the user, judges whether it serves the final purpose and fits naturally into the flow of the dialogue, and applies modifications where needed. These modifications do not give the assistant a specific answer; they are expressed at a level of abstraction that indicates only a policy, so the dialogue retains a degree of freedom rather than following a fixed development. This makes it possible to construct a more personalized dialogue that fits the user's characteristics, which we believe improved satisfaction.


Figure 3: Example of controller interactive control and assistant generation modification
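The policy-level (rather than answer-level) nature of these modifications can be sketched as a review step that either approves the draft question or returns an abstract instruction under which the assistant regenerates. The prompts, the "OK" approval convention, and the function name are our assumptions for illustration.

```python
def controlled_turn(llm, assistant_prompt, controller_prompt, transcript):
    """One assistant turn under controller review (illustrative sketch)."""
    draft = llm(assistant_prompt + "\nDialogue so far:\n" + transcript)
    # The controller judges the draft against the final purpose and the
    # dialogue flow. It replies "OK" or a policy-level correction such as
    # "Ask about the user's budget before discussing specific properties."
    verdict = llm(controller_prompt + "\nCandidate question:\n" + draft)
    if verdict.strip() == "OK":
        return draft
    # Regenerate under the policy, leaving the concrete wording to the
    # assistant so the dialogue keeps a degree of freedom.
    return llm(assistant_prompt + "\nPolicy from controller: " + verdict +
               "\nDialogue so far:\n" + transcript)
```

Because the controller's output is only a policy, the same correction can produce different concrete questions on different runs, which is what keeps the dialogue from collapsing into a fixed script.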



This paper is available on arxiv under CC 4.0 license.