2 Survey with Industry Professionals
3 RQ1: Real-World Use Cases that Necessitate Output Constraints
4.2 Integrating with Downstream Processes and Workflows
4.3 Satisfying UI and Product Requirements and 4.4 Improving User Experience, Trust, and Adoption
5.2 The Case for NL: More Intuitive and Expressive for Complex Constraints
6 The ConstraintMaker Tool and 6.1 Iterative Design and User Feedback
Large language models can produce creative and diverse responses. However, to integrate them into current developer workflows, it is essential to constrain their outputs to follow specific formats or standards. In this work, we surveyed 51 experienced industry professionals to understand the range of scenarios and motivations driving the need for output constraints from a user-centered perspective. We identified 134 concrete use cases for constraints at two levels: low-level, which ensures the output adheres to a structured format and an appropriate length, and high-level, which requires the output to follow semantic and stylistic guidelines without hallucination. Critically, applying output constraints could not only streamline the currently repetitive process of developing, testing, and integrating LLM prompts for developers, but also enhance the user experience of LLM-powered features and applications. We conclude with a discussion on user preferences and needs towards articulating intended constraints for LLMs, alongside an initial design for a constraint prototyping tool.
Over the past few years, we have witnessed the extraordinary capability of Large Language Models (LLMs) to generate responses that are not only creative and diverse but also highly adaptable to various user needs [5, 7, 18, 21, 22, 29, 31, 32]. For example, researchers can prompt ChatGPT [25] to condense long articles into concise summaries for fast digestion, while video game developers can generate detailed character profiles with rich personality traits, backstories, and unique abilities on demand, simply by dynamically prompting an LLM with the game context and players’ preferences.
As much as end-users appreciate the unbounded creativity of LLMs, recent field studies examining the development of LLM-powered applications have repeatedly demonstrated the necessity to impose constraints on LLM outputs [10, 30]. For instance, a user might require a summary of an article to be “strictly less than 20 words” to meet length constraints, or a generated video game character profile to be “a valid JSON that can be parsed by Python” for a development pipeline.
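To make the JSON example concrete, the sketch below shows the kind of defensive parse-and-retry code a developer might write today when the “valid JSON” constraint exists only as an instruction inside the prompt. It is a minimal illustration, not code from the survey or any specific API: call_llm, MAX_RETRIES, and generate_character_profile are hypothetical placeholders for whatever prompt-completion interface an application actually uses.

import json

MAX_RETRIES = 3  # hypothetical retry budget

def generate_character_profile(call_llm, game_context: str) -> dict:
    """Ask an LLM for a character profile and retry until the output parses as JSON.

    `call_llm` is a placeholder for the application's prompt-completion API.
    """
    prompt = (
        "Generate a video game character profile for the context below. "
        "Respond with a valid JSON object only, with no extra text.\n\n"
        f"Context: {game_context}"
    )
    for _ in range(MAX_RETRIES):
        raw_output = call_llm(prompt)
        try:
            # The constraint check: the output must be parseable JSON.
            return json.loads(raw_output)
        except json.JSONDecodeError:
            continue  # ill-formed output; ask again
    raise ValueError("LLM never produced parseable JSON within the retry budget")

Because the constraint lives only in natural language, nothing prevents the model from adding prose around the JSON or emitting malformed syntax, which is precisely the gap discussed next.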
However, as evidenced by many recent NLP benchmarks and evaluations [16, 36, 42, 43], current state-of-the-art LLMs still cannot guarantee that the generated output will invariably conform to user-defined constraints in the prompt (a property sometimes referred to as controllability). Although researchers have proposed various methods to improve controllability, such as supervised fine-tuning with specialized datasets [35] or controlled decoding strategies [1, 4, 24, 40], these methods tend to address only a narrow range of constraints and rarely account for the diverse usage scenarios and rationales that real-world developers and end-users encounter when prototyping and building practical LLM-powered functionalities and applications [12–14, 23].
In this work, we took the first step to systematically investigate the scenarios and motivations for applying output constraints from a user-centered perspective. Specifically, we sought to understand:
• RQ1: What real-world use cases would necessitate or benefit from being able to constrain LLM outputs?
• RQ2: What are the benefits and impacts of being able to apply constraints to LLM outputs?
• RQ3: How would users like to articulate their intended constraints to LLMs?
We investigated these research questions by distributing a survey to a broad population of industry professionals (software engineers, researchers, designers, project managers, etc.) who have experience building LLM-powered applications. Our analysis identified six primary categories of output constraints that users desire, each supported by detailed usage scenarios and illustrative examples, summarized in Table 1. In a nutshell, users not only need low-level constraints, which require the output to conform to a structured format and an appropriate length, but also desire high-level constraints, which involve semantic and stylistic guidelines that users would like the model output to adhere to without hallucination. Notably, developers often have to write complex code to handle ill-formed LLM outputs, a chore that could be simplified or eliminated if LLMs could strictly follow output constraints. In addition, the ability to apply constraints could help ease the integration of LLMs with existing pipelines, meet UI and product specifications, and enhance user trust and experience with LLM-powered features. Moreover, we discovered that describing constraints in natural language (NL) within prompts is not always the preferred method of control for LLM users. Instead, they seek alternative mechanisms, such as graphical user interfaces (GUIs) for defining and testing constraints, which could offer greater flexibility and a heightened sense of assurance that the constraints will be strictly followed.
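As one hedged illustration of what a machine-checkable constraint (as opposed to a natural-language instruction buried in a prompt) might look like, the sketch below expresses a format constraint as a JSON Schema and tests candidate outputs against it. The schema, field names, and helper are illustrative assumptions only; they are not an interface defined by the survey respondents or by ConstraintMaker, which is introduced next.

import json
from jsonschema import ValidationError, validate  # pip install jsonschema

# Hypothetical format constraint for a character profile, written as a
# JSON Schema so it can be checked mechanically rather than hoped for.
PROFILE_SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "backstory": {"type": "string", "maxLength": 500},
        "abilities": {"type": "array", "items": {"type": "string"}},
    },
    "required": ["name", "backstory", "abilities"],
}

def satisfies_constraint(raw_output: str) -> bool:
    """Return True if the LLM output parses as JSON and matches the schema."""
    try:
        validate(instance=json.loads(raw_output), schema=PROFILE_SCHEMA)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False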
Informed by these results, we present an early design of ConstraintMaker, a prototype tool that enables LLM users to experiment, test, and apply constraints on the format of LLM outputs (see Figure 2 for more details), along with feedback and insights from preliminary user tests. Overall, this paper contributes:
• a comprehensive taxonomy summarizing both low-level and high-level output constraints desired by LLM users (Table 1), derived from 134 real-world use cases reported by our survey respondents (RQ1),
• an overview of both developer and user-facing benefits of being able to impose constraints on LLM outputs (RQ2),
• an exploration of LLM users’ preferences for expressing constraints, whether via GUIs or natural language (RQ3),
• an initial design of the tool ConstraintMaker, which enables users to visually prototype LLM output constraints, accompanied by a discussion of preliminary user feedback.
This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.
Authors:
(1) Michael Xieyang Liu, Google Research, Pittsburgh, PA, USA (lxieyang@google.com);
(2) Frederick Liu, Google Research, Seattle, Washington, USA (frederickliu@google.com);
(3) Alexander J. Fiannaca, Google Research, Seattle, Washington, USA (afiannaca@google.com);
(4) Terry Koo, Google, Indiana, USA (terrykoo@google.com);
(5) Lucas Dixon, Google Research, Paris, France (ldixon@google.com);
(6) Michael Terry, Google Research, Cambridge, Massachusetts, USA (michaelterry@google.com);
(7) Carrie J. Cai, Google Research, Mountain View, California, USA (cjcai@google.com).