
Real-World Use Cases That Necessitate Output Constraints

by Structuring, March 19th, 2025


Abstract and 1 Introduction

2 Survey with Industry Professionals

3 RQ1: Real-World Use Cases That Necessitate Output Constraints

4 RQ2: Benefits of Applying Constraints to LLM Outputs and 4.1 Increasing Prompt-Based Development Efficiency

4.2 Integrating with Downstream Processes and Workflows

4.3 Satisfying UI and Product Requirements and 4.4 Improving User Experience, Trust, and Adoption

5 How to Articulate Output Constraints to LLMs and 5.1 The Case for GUI: A Quick, Reliable, and Flexible Way of Prototyping Constraints

5.2 The Case for NL: More Intuitive and Expressive for Complex Constraints

6 The ConstraintMaker Tool and 6.1 Iterative Design and User Feedback

7 Conclusion and References

A. The Survey Instrument

3 RQ1: REAL-WORLD USE CASES THAT NECESSITATE OUTPUT CONSTRAINTS

Table 1 presents a taxonomy of six primary categories of use cases that require output constraints, each with representative real-world examples and quotes submitted by our respondents.


These can be further divided into low-level and high-level constraints. Low-level constraints ensure that model outputs adhere to a specific structure (e.g., JSON or markdown), restrict the model to a fixed set of multiple-choice options (e.g., sentiment classification labels), or dictate the length of the outputs. High-level constraints require model outputs to respect semantic guidelines (e.g., must include or avoid specific terms or actions) or stylistic guidelines (e.g., follow a certain style or tone), and to avoid hallucination.


Below, we discuss a number of interesting insights that emerged from our analysis of the use cases:


• Going beyond valid JSON. Note that recent advancements in instruction-tuning techniques have substantially improved the chances of generating a valid JSON object upon user request [27, 29]. Nonetheless, our survey respondents believed that this was not enough and desired more precise control over the JSON schema (i.e., key/value pairs). One respondent stated their expectation as follows: “I expect the quiz [that the LLM makes given a few passages provided below] to have 1 correct answer and 3 incorrect ones. I want to have the output to be like a json with keys {"question": "...", "correct_answer": "...", "incorrect_answers": [...]}.” It is also worth mentioning that some respondents found that “few-shot prompts” — demonstrating the desired key/value pairs with several examples — tended to work “fairly well”. However, they concurred that a formal guarantee of JSON schema adherence would be greatly appreciated (see Section 4.1 for their detailed rationales).
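
As a rough illustration of the schema-level control respondents asked for, one can today validate the model’s raw output against a hand-written JSON Schema and reject anything that deviates. The schema and helper below are a minimal sketch of the quiz example from the quote (the names and the use of the jsonschema library are our assumptions), not a formal generation-time guarantee.

```python
import json
from jsonschema import validate  # third-party: pip install jsonschema

# Hypothetical schema mirroring the respondent's quiz example:
# one question, one correct answer, exactly three incorrect answers.
QUIZ_SCHEMA = {
    "type": "object",
    "properties": {
        "question": {"type": "string"},
        "correct_answer": {"type": "string"},
        "incorrect_answers": {
            "type": "array",
            "items": {"type": "string"},
            "minItems": 3,
            "maxItems": 3,
        },
    },
    "required": ["question", "correct_answer", "incorrect_answers"],
    "additionalProperties": False,
}

def parse_quiz(llm_output: str) -> dict:
    """Parse the LLM's raw text and reject outputs that violate the schema."""
    quiz = json.loads(llm_output)                 # fails if not valid JSON at all
    validate(instance=quiz, schema=QUIZ_SCHEMA)   # fails if keys/shape are wrong
    return quiz
```

Post-hoc validation like this catches schema violations but still spends a model call on every rejected output, which is part of why respondents wanted the constraint enforced during generation instead.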


Table 2: Respondents’ perceived benefits of having the ability to apply constraints to LLM output (RQ2).


• Giving an answer without extra conversational prose. When asking an LLM to perform data classification or labeling, such as “[classifying sentiments as] Positive, Negative, Neutral, etc.,” respondents typically expect the model to output only the classification result (e.g., “Positive.”) without a trailing “explanation” (e.g., “Positive, since it referred to the movie as a ‘timeless masterpiece’...”), as the added explanation could confuse downstream parsing logic. This indicates a potential misalignment between a common training objective — where LLMs are often tailored to be conversational and provide rich details [2, 17, 33] — and certain specialized downstream use cases where software developers need LLMs to be succinct. Such use cases necessitate output constraints, independent of the prompt, that help adapt a general-purpose model to meet specific user requirements.
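
To make the parsing concern concrete, the hypothetical downstream parser below accepts only a bare label; a conversational answer such as “Positive, since it referred to the movie as a ‘timeless masterpiece’...” is rejected outright rather than silently mishandled. The label set and function name are assumptions for this sketch, not details from the survey.

```python
VALID_LABELS = {"Positive", "Negative", "Neutral"}  # assumed label set

def parse_sentiment(llm_output: str) -> str:
    """Accept only a bare classification label such as 'Positive.'"""
    label = llm_output.strip().rstrip(".")
    if label not in VALID_LABELS:
        # A chatty answer ("Positive, since it referred to the movie as a
        # 'timeless masterpiece'...") fails here instead of propagating
        # malformed data into the rest of the pipeline.
        raise ValueError(f"Unexpected model output: {llm_output!r}")
    return label
```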


• Conditioning the output on the input, but don’t “improvise!” One thread of high-level constraints places emphasis on directing the model to condition its output on specific content from the input. For example, the model’s response should semantically remain “in the same ballpark” as “the user’s original query” — “[the output of] a query about ‘fall jackets’ should be confined to clothing.” A particular instance of this is for LLMs to echo segments of the input in their output, occasionally with slight alterations. For example, “we want LLM to repeat input with some control tokens to indicate the mentions. e.g. input: ‘Obama was born in 1961.’,... , we want output to be ‘«Obama» was born in 1961.’” Nevertheless, respondents underscored the importance of the model not improvising beyond its input and instructions. For example, one respondent instructed an LLM to “annotate a method with debug statement,” anticipating the output would “ONLY include changes that add print statements to the method.” However, the LLM would frequently introduce additional “changes in syntax” that were unwarranted.
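
A developer might guard against this kind of improvisation after the fact by diffing the model’s edit against the original method and rejecting anything other than added print statements. The check below is a hypothetical sketch of that idea using Python’s difflib, not a technique reported by the respondents.

```python
import difflib

def only_adds_prints(original: str, edited: str) -> bool:
    """True iff `edited` differs from `original` only by newly added lines
    that are print statements (no deletions, no rewritten lines)."""
    for line in difflib.ndiff(original.splitlines(), edited.splitlines()):
        if line.startswith("- "):
            return False  # something was removed or rewritten
        if line.startswith("+ ") and not line[2:].lstrip().startswith("print("):
            return False  # an addition that is not a debug print
    return True
```

If the model “annotates a method with debug statements” but also slips in unrelated changes in syntax, the deleted-line branch trips and the edit can be rejected or retried.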


This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.

Authors:

(1) Michael Xieyang Liu, Google Research, Pittsburgh, PA, USA (lxieyang@google.com);

(2) Frederick Liu, Google Research, Seattle, Washington, USA (frederickliu@google.com);

(3) Alexander J. Fiannaca, Google Research, Seattle, Washington, USA (afiannaca@google.com);

(4) Terry Koo, Google, Indiana, USA (terrykoo@google.com);

(5) Lucas Dixon, Google Research, Paris, France (ldixon@google.com);

(6) Michael Terry, Google Research, Cambridge, Massachusetts, USA (michaelterry@google.com);

(7) Carrie J. Cai, Google Research, Mountain View, California, USA (cjcai@google.com).