How ConstitutionMaker Utilizes LLMs for Chatbot Behavior Crafting

Written by feedbackloop | Published 2024/01/24
Tech Story Tags: human-centric-ai | ai-research | llm-research | chatbot-design | constitutionmaker | constitutional-ai | ai-model-refinement | user-experience-in-ai-design

TLDR: Explore the technical aspects of ConstitutionMaker, an innovative chatbot customization tool. Learn how it uses a promptable LLM to build the dialogue prompt, surface the top-3 LLM completions, and shape future chatbot responses. Delve into the three key principle elicitation features (kudos, critique, and rewrite) and their role in shaping the dialogue prompt. ConstitutionMaker's implementation lets users craft principles dynamically, reshaping the landscape of interactive chatbot design.

Authors:

(1) Savvas Petridis, Google Research, New York, New York, USA;

(2) Ben Wedin, Google Research, Cambridge, Massachusetts, USA;

(3) James Wexler, Google Research, Cambridge, Massachusetts, USA;

(4) Aaron Donsbach, Google Research, Seattle, Washington, USA;

(5) Mahima Pushkarna, Google Research, Cambridge, Massachusetts, USA;

(6) Nitesh Goyal, Google Research, New York, New York, USA;

(7) Carrie J. Cai, Google Research, Mountain View, California, USA;

(8) Michael Terry, Google Research, Cambridge, Massachusetts, USA.

Table Of Links

Abstract & Introduction

Related Work

Formative Study

ConstitutionMaker

Implementation

User Study

Findings

Discussion

Conclusion and References

5 IMPLEMENTATION

ConstitutionMaker is a web application that utilizes an LLM [3] that is promptable in the same way as GPT-3 [4] or PaLM [5]. In the following subsections, we describe the implementation of ConstitutionMaker’s key features.

5.1 Facilitating the Conversation

To generate the chatbot’s response, ConstitutionMaker builds a dialogue prompt (Figure 3A) behind the scenes. The dialogue prompt consists of (1) a description of the bot’s capabilities, entered by the user (Figure 1A), (2) the current set of principles, and (3) the conversation history, ending with the user’s latest input. From this prompt, the LLM generates the bot’s next response; we take the top-3 completions it outputs and display them to users (Figure 3B). When the conversation is restarted or rewound, the conversation history within the dialogue prompt is modified: restarting deletes the entire history, whereas rewinding deletes everything after the rewind point. Finally, if the conversation gets too long for the prompt context window, we remove the oldest conversational turns until it fits.
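The paper does not include code for this step, but the prompt assembly and truncation logic can be sketched roughly as follows. The function names, the character-based length limit, and the `llm_complete` callable are all hypothetical assumptions, not the authors' implementation:

```python
# Minimal sketch of assembling a dialogue prompt like Figure 3A.
# MAX_PROMPT_CHARS, build_dialogue_prompt, and llm_complete are hypothetical.

MAX_PROMPT_CHARS = 8000  # stand-in for the model's context limit


def build_dialogue_prompt(bot_description, principles, history, user_input):
    """Combine the bot description, current principles, and conversation
    history into one prompt, dropping the oldest turns if it grows too long."""
    turns = history + [("User", user_input)]
    while True:
        principle_text = "\n".join(f"- {p}" for p in principles)
        dialogue = "\n".join(f"{speaker}: {utterance}" for speaker, utterance in turns)
        prompt = (
            f"{bot_description}\n\n"
            f"Principles the bot should follow:\n{principle_text}\n\n"
            f"Conversation so far:\n{dialogue}\nBot:"
        )
        if len(prompt) <= MAX_PROMPT_CHARS or len(turns) <= 1:
            return prompt
        turns = turns[1:]  # remove the oldest conversational turn and retry


def next_bot_responses(llm_complete, prompt, n=3):
    """Return the top-n completions from the LLM to display as candidate replies."""
    return llm_complete(prompt, num_candidates=n)
```

In a real system the limit would likely be measured in tokens rather than characters, and the dialogue prompt would be rebuilt from scratch on every turn, which is what makes restarting and rewinding simple history edits.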

5.2 Three Principle Elicitation Features

All three principle elicitation features output a principle that is then incorporated back into the dialogue prompt (Figure 3A) to influence future conversational turns. Giving kudos and critiquing a bot’s response follow a similar process. For both, the selected bot output is fed into a few-shot prompt that generates rationales, either positive (Figure 3C) or negative (Figure 3D). The user’s selected rationale (or their own written rationale) is then sent to a few-shot prompt that converts this rationale into a principle (Figure 3F and 3G). This few-shot prompt leverages the conversation history to create a specific, conditional principle. For example, for MusicBot, if the critique is “The bot did not ask questions about the user’s preferences,” a specific, conditional principle might be “Prior to giving a music recommendation, ask the user what genres or artists they currently listen to.” Next, for critiques, after the principle is inserted into the dialogue prompt, new outputs are generated to show to the user (Figure 3G). Finally, for rewriting the bot’s response, we leverage a chain-of-thought [38] style prompt that first generates a “thought,” which reasons about how the original and rewritten outputs differ from each other, and then generates a specific principle based on that reasoning. Constructing the prompt with a “thought” portion led to principles that captured the difference between the two outputs better than our earlier versions without it.
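As a rough illustration of these two flows, the sketch below shows a rationale-to-principle few-shot prompt and a chain-of-thought rewrite prompt. The prompt wording, few-shot example layout, and helper names (`llm_complete`, `principle_from_rationale`, `principle_from_rewrite`) are assumptions for illustration; the paper's actual prompts are not reproduced in this section:

```python
# Hedged sketch of the critique/kudos-to-principle step and the
# chain-of-thought rewrite prompt. All prompt text below is illustrative.

RATIONALE_TO_PRINCIPLE = """\
Given the conversation and a critique (or kudos) about the bot's last
response, write a specific, conditional principle the bot should follow.

Critique: The bot did not ask questions about the user's preferences.
Principle: Prior to giving a music recommendation, ask the user what genres
or artists they currently listen to.

Conversation:
{conversation}
Critique: {rationale}
Principle:"""

REWRITE_TO_PRINCIPLE = """\
Compare the bot's original response with the user's rewritten response.
First write a Thought that reasons about how the two responses differ,
then write a specific principle based on that reasoning.

Original response: {original}
Rewritten response: {rewritten}
Thought:"""


def principle_from_rationale(llm_complete, conversation, rationale):
    """Turn a selected (or user-written) rationale into a conditional principle."""
    prompt = RATIONALE_TO_PRINCIPLE.format(conversation=conversation,
                                           rationale=rationale)
    return llm_complete(prompt, num_candidates=1)[0].strip()


def principle_from_rewrite(llm_complete, original, rewritten):
    """Generate a 'thought' plus principle from an original/rewritten pair."""
    prompt = REWRITE_TO_PRINCIPLE.format(original=original, rewritten=rewritten)
    completion = llm_complete(prompt, num_candidates=1)[0]
    # The completion is expected to contain the reasoning followed by a line
    # beginning with "Principle:"; keep only the principle text.
    return completion.split("Principle:")[-1].strip()
```

The "Thought:" line is what makes the rewrite prompt chain-of-thought style: the model must articulate the difference between the two outputs before committing to a principle, which is the behavior the authors found produced better principles than earlier versions without it.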


[3] anonymized for peer review

[4] https://openai.com/api/

[5] https://developers.generativeai.google/

This paper is available on arXiv under a CC 4.0 license.

