
Quality-Diversity through AI Feedback (QDAIF): A Step towards Autonomous AI Creativity

Too Long; Didn't Read

Quality-Diversity through AI Feedback (QDAIF) is a cutting-edge approach that merges language models and quality-diversity search algorithms to autonomously generate diverse, high-quality creative text. This method, showcased in creative writing domains, outperforms existing baselines, demonstrating a significant step towards independent AI creativity.

Authors:

(1) Herbie Bradley, CarperAI, CAML Lab, University of Cambridge & EleutherAI;

(2) Andrew Dai, Aleph Alpha;

(3) Hannah Teufel, Aleph Alpha;

(4) Jenny Zhang, Department of Computer Science, University of British Columbia & Vector Institute;

(5) Koen Oostermeijer, Aleph Alpha;

(6) Marco Bellagente, Stability AI;

(7) Jeff Clune, Department of Computer Science, University of British Columbia, Vector Institute & Canada CIFAR AI Chair;

(8) Kenneth Stanley, Maven;

(9) Grégory Schott, Aleph Alpha;

(10) Joel Lehman, Stochastic Labs.

Table of Links

Abstract & Introduction

Background and Related Work

Approach

Experiments on Creative Writing Domain

Discussion and Conclusion

A Appendix

ABSTRACT

In many text-generation problems, users may prefer not only a single response, but a diverse range of high-quality outputs from which to choose. Quality-diversity (QD) search algorithms aim at such outcomes, by continually improving and diversifying a population of candidates. However, the applicability of QD to qualitative domains, like creative writing, has been limited by the difficulty of algorithmically specifying measures of quality and diversity. Interestingly, recent developments in language models (LMs) have enabled guiding search through AI feedback, wherein LMs are prompted in natural language to evaluate qualitative aspects of text. Leveraging this development, we introduce Quality-Diversity through AI Feedback (QDAIF), wherein an evolutionary algorithm applies LMs to both generate variation and evaluate the quality and diversity of candidate text. When assessed on creative writing domains, QDAIF covers more of a specified search space with high-quality samples than do non-QD controls. Further, human evaluation of QDAIF-generated creative texts validates reasonable agreement between AI and human evaluation. Our results thus highlight the potential of AI feedback to guide open-ended search for creative and original solutions, providing a recipe that seemingly generalizes to many domains and modalities. In this way, QDAIF is a step towards AI systems that can independently search, diversify, evaluate, and improve, which are among the core skills underlying human society’s capacity for innovation.[1]

1 INTRODUCTION

Human innovation is not only a generative capacity for creativity, but also includes the ability to evaluate the subjective quality of new ideas and artifacts. Great ideas are rarely generated all at once out of whole cloth, but rather gradually emerge through divergent chains of elaboration and revision (Stanley & Lehman, 2015). To successfully navigate such a tree of ideas, the creator must evaluate which steps in a chain are worth pursuing further, a question that can be highly subjective, especially in domains with artistic or literary dimensions.


Until now, even if AI could provide candidates, the hope for such subjectively tinged evaluation lay firmly with humans. However, the emerging foundation model technology of recent years (Bommasani et al., 2021) now means that the model can also play the role of evaluator, even when the evaluation is in part subjective (Madaan et al., 2023). In this way, for the first time, an entire ideation process that returns a diverse set of interesting artifacts can in principle be automated. This process cannot be run by LMs entirely on their own, but requires chaining together a search algorithm with model calls in a nuanced way. This paper highlights one way to achieve this potential: to combine LMs with the field of quality-diversity (QD) (Mouret & Clune, 2015), which centers on how to design search processes that produce high-quality solutions that span a design space.


Figure 1: QDAIF (left) covers more of the search space with diverse, high-quality stories compared to the baseline (right). The baseline is LMX, Quality-Only (Meyerson et al., 2023), which optimizes only for the quality of solutions. QDAIF discovered more interesting stories about a spy and a politician, ranging from romance stories with a happy ending to horror stories with a tragic ending. The baseline produced a story (right-middle position, starting with "Jason") with a lower quality score due to the lack of the desired spy character (denoted by the red-colored bin, for a story with a neutral ending and leaning toward horror). QDAIF discovered a better, more relevant story (bottom-middle position, starting with "a wealthy politician") for this same neutral bin.


The main insight in QD algorithms is to explicitly maintain and seek a population of high-quality, diverse responses. Typically such search algorithms require hand-designed measures of diversity and quality, as well as a way to generate meaningful variation. Yet the most interesting and complex domains nearly always involve notions of performance, diversity, and variation that are subjective or difficult to specify algorithmically. Extending work that generates variation through LMs (Lehman et al., 2022; Meyerson et al., 2023) and evaluates the quality of potential solutions through LMs (Ahn et al., 2022), we show that LMs can also be used to evaluate qualitative aspects of diversity. In this way, LMs can instantiate all three main ingredients of QD search, enabling powerful new QD algorithms that ride the coattails of continual LM advances; we name this approach Quality-Diversity through AI Feedback (QDAIF). QDAIF can explore and return diverse, high-quality responses to an LM prompt through more intuitive diversity measures, without the need for model fine-tuning (although it could also be used for LMs to self-improve by generating fine-tuning data (Lehman et al., 2022; Chen et al., 2023), an interesting direction toward self-curated learning environments via generated data and AI-generating algorithms (Clune, 2019)).
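To make those three ingredients concrete, below is a minimal, illustrative sketch of a MAP-Elites-style QD loop in which LM calls supply variation, quality scores, and a qualitative diversity measure. The `lm_mutate`, `lm_quality`, and `lm_diversity_bin` helpers are hypothetical stubs standing in for LM prompts; they are not the paper's actual prompts, models, or archive layout.

```python
# Minimal sketch of a MAP-Elites-style QD loop driven by LM feedback.
# The lm_* helpers are hypothetical placeholders for LM prompts;
# QDAIF's actual prompts, models, and diversity axes differ.
import random

def lm_mutate(parent_text: str) -> str:
    """Placeholder: prompt an LM to produce a variation of the parent text."""
    return parent_text + " (varied)"

def lm_quality(text: str) -> float:
    """Placeholder: prompt an LM to rate how good the text is, in [0, 1]."""
    return random.random()

def lm_diversity_bin(text: str, num_bins: int = 10) -> int:
    """Placeholder: prompt an LM to rate a qualitative axis
    (e.g. how much a story leans romance vs. horror), then discretize it."""
    return random.randrange(num_bins)

def qd_search(seed_texts, iterations=1000, num_bins=10):
    # The archive keeps the best-quality solution found so far in each bin.
    archive = {}  # bin index -> (quality, text)
    for text in seed_texts:
        b, q = lm_diversity_bin(text, num_bins), lm_quality(text)
        if b not in archive or q > archive[b][0]:
            archive[b] = (q, text)

    for _ in range(iterations):
        # Sample an existing elite and ask the LM for a variation of it.
        _, parent = random.choice(list(archive.values()))
        child = lm_mutate(parent)
        # Evaluate quality and diversity with AI feedback; keep the child
        # only if it improves on the current elite in its bin.
        b, q = lm_diversity_bin(child, num_bins), lm_quality(child)
        if b not in archive or q > archive[b][0]:
            archive[b] = (q, child)
    return archive

if __name__ == "__main__":
    elites = qd_search(["A spy met a politician at a gala."], iterations=20)
    for b in sorted(elites):
        print(b, round(elites[b][0], 2), elites[b][1][:60])
```

The design choice this sketch illustrates is that search pressure comes entirely from the archive: variation, quality, and diversity are all read off LM responses, so no hand-coded fitness function or behavior descriptor is needed.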


We evaluate QDAIF across three creative writing domains: opinion writing, short stories, and poetry. The idea is that in such creative domains, users often enjoy seeing a wide range of possible stories or poems from which to choose or be inspired by. Quantitative results indicate that QDAIF significantly outperforms existing baselines. Additionally, through human evaluation, we observe a strong alignment between human and AI-generated feedback, providing empirical evidence that AI feedback is grounded and that the method can work in practice (i.e. it yields improved quality and diversity as measured by humans). Overall, QDAIF brings us a step closer to AI models that can independently search and innovate, one of the keystone abilities of humans that allow them to create culture and science (Stanley et al., 2017).




[1] Project Page: https://qdaif.github.io/


This paper is available on arxiv under CC 4.0 license.