
Advancing Coherent Story Generation: Future Directions for AI Writing Tools

by Teleplay Technology May 21st, 2024

Too Long; Didn't Read

Future work aims to improve story coherence in AI writing tools while considering ethical implications. Discussions include critiques of hierarchical approaches in film and theatre writing and reflections on authorship and collaboration in AI co-creative systems.

Authors:

(1) PIOTR MIROWSKI and KORY W. MATHEWSON, DeepMind, United Kingdom (both authors contributed equally to this research);

(2) JAYLEN PITTMAN, Stanford University, USA (work done while at DeepMind);

(3) RICHARD EVANS, DeepMind, United Kingdom.

Abstract and Intro

Storytelling, The Shape of Stories, and Log Lines

The Use of Large Language Models for Creative Text Generation

Evaluating Text Generated by Large Language Models

Participant Interviews

Participant Surveys

Discussion and Future Work

Conclusions, Acknowledgements, and References

A. RELATED WORK ON AUTOMATED STORY GENERATION AND CONTROLLABLE STORY GENERATION

B. ADDITIONAL DISCUSSION FROM PLAYS BY BOTS CREATIVE TEAM

C. DETAILS OF QUANTITATIVE OBSERVATIONS

D. SUPPLEMENTARY FIGURES

E. FULL PROMPT PREFIXES FOR DRAMATRON

F. RAW OUTPUT GENERATED BY DRAMATRON

G. CO-WRITTEN SCRIPTS

7 DISCUSSION AND FUTURE WORK

7.1 Future Work Towards Coherent Story Generation

While common sense and logical consistency remain elusive goals for LLMs [7], their utility as writing tools increases as they generate more coherent outputs. Hierarchical generation can be leveraged in various ways to improve the long-range coherence of long texts. One enhancement to improve coherence, especially suited to screenplays and theatre scripts, is a method to generate complex and complete character arcs. Likewise, generated dialogue rarely brings a scene to a satisfying conclusion. One technique to address this is hierarchical dialogue generation: constructing each scene’s dialogue from a generated beginning, middle, and end dialogue beat. Finally, to improve stylistic coherence, future work could explore methods to generate thematic outputs that satisfy notions of genre. This could be done by writing new prompts or by transposing existing ones into a variety of author styles and voices.
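As a rough illustration of the beginning, middle, and end idea, the sketch below generates one dialogue beat at a time and feeds the dialogue written so far into the next prompt. It is not Dramatron's implementation: the prompt wording, the function names, and the `stub_generate` placeholder (which stands in for a real LLM call) are all assumptions made for this example.

```python
from typing import Callable, List

# The three dialogue beats named in the text.
BEATS = ("beginning", "middle", "end")


def build_beat_prompt(scene_summary: str, beat: str, dialogue_so_far: str) -> str:
    """Build a prompt for one beat (hypothetical prompt wording)."""
    return (
        f"Scene: {scene_summary}\n"
        f"Dialogue so far:\n{dialogue_so_far}\n"
        f"Write the {beat} of this scene's dialogue:\n"
    )


def generate_scene_dialogue(
    scene_summary: str, generate: Callable[[str], str]
) -> List[str]:
    """Generate the beats in order, conditioning each on the earlier ones."""
    parts: List[str] = []
    for beat in BEATS:
        prompt = build_beat_prompt(scene_summary, beat, "\n".join(parts))
        parts.append(generate(prompt).strip())
    return parts


def stub_generate(prompt: str) -> str:
    """Placeholder for an LLM call: echoes the final instruction line."""
    instruction = prompt.rstrip().splitlines()[-1]
    return f"[generated text for: {instruction}]"


if __name__ == "__main__":
    for part in generate_scene_dialogue("Two rivals meet at a funeral.", stub_generate):
        print(part)
```

Because the "end" beat is requested explicitly and sees the earlier beats, the model is at least prompted to close the scene, which is the behaviour the paragraph above hopes to encourage.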

7.2 On the Difference between Film and Theatre, and a Critique of the Formulaic Hierarchy in Dramatron

As Dramatron relies on top-down hierarchical generation, each subsequent step depends on those that came prior. Several participants noted that this style of writing is more aligned with screenwriting than playwriting. In the words of some industry professionals, they were “drawing a line between screenwriting and script writing for stage. Playwrights do not use the log line in the same way as in screen writing” (p9). Participants 4 and 5, who self-identified as playwrights for the stage, “would never approach a piece of work with a story in [their] head. [They] might come with a more investigative approach with a theme”. In fact, they argued against Dramatron’s constrained top-down hierarchy: “Why generate characters first? At earlier parts of the creation process you might not know the characters. In the way [we] work, [we] often come with characters at the end, and the idea of characters comes last”. The log line itself could be seen as a post-hoc summary, as “often times playwrights find out what the play is about after [they] finish it. I will know the play once it is done” (p9). That said, Dramatron does allow going back and forth between levels in the hierarchy, and, through creative prompting, future work could allow generation steps to happen out of order.


Differences in the writing of screenplays and theatre scripts can also be related to cultural and economic factors: “The difference between theatre and screen is that nobody’s making theatre on demand, no outside pressure. Or at least not in the same way that there is for television. Theatre feels less like generating content” (p4, p5) whereas “film scripts, in the industry, want the traditional fourth wall” (p9). This reflection invites us to reconsider the applicability of Dramatron. Since our tool is formulaic by construction, is it suitable for TV or film production? As one respondent noted, “[Dramatron] will be very useful for an average Hollywood movie and for television. It does not need to have a deep understanding of the human soul, unlike Shakespeare. [...] The thing with action movies is that actors are not expected to connect with the writer. A screenwriter on a TV set is just like [Dramatron] [...]. It is a sublime skill to be a Hollywood writer because your creative input is small.” (p9).

7.3 Ethical Questions and Risks Posed by AI Writing Tools

We describe a co-creative tool built around large language models. It can augment and uplift human artists’ work by providing them with inspiration, as well as challenge them and thereby support their artistic practice. Before conducting our study, we identified three directly relevant risks and ethical implications discussed in previous work [116]: 1) bias and offensive language in the generated output, 2) automation of creative work resulting in “cannibalizing” the work of creative artists engaged in script writing, and 3) copyright infringement by reusing copyrighted data from the training dataset, either knowingly (e.g. through prompting: “write the script in the style of Ursula Le Guin”) or unknowingly (e.g. by virtue of similar training data). Our mitigation strategy is two-fold: we invite the creative human artist into the loop throughout the co-authorship process, and we maintain clarity and transparency on the origin of the generated text.


To mitigate copyright issues, the writer could query short parts of the script using a search engine and plagiarism detection tools [59]; this functionality could be built directly into co-creative tools. Writers using these tools should be aware of the origin of the data in the LLM, and their audiences should be aware that those outputs were generated through an interaction between humans and co-creative tools. Interestingly, study participants independently raised these concerns during interviews.


From the feedback gathered in the study, some participants reported that outputs from the LLM can sometimes be problematic, stereotypical, or biased: for example, “I am less sexist than the computer” (p3), or “the protagonists are both male characters, and all of the supporting characters are female” (p4, p5). Furthermore, participants raised concerns about the source of the dataset: “If you are putting existing scripts into the dataset, where are they being pulled from?” (p4, p5). Thoughts on this subject ranged from “Plagiarising the corpus of scripts is a problem” (p2) to “In the context of collective and devised creation, [reusing existing published work] is not necessarily a problem, because it can be perceived as an homage to existing work” (p11). The rules and norms for the use of systems trained on copyright-protected material are the subject of ongoing work [9]. For example, Lee et al. (2022) distinguish between verbatim, paraphrase, and idea plagiarism [59]. Finally, participants raised concerns about the potential impact of generative tools on creative economies: “It would free the artist from writing formulaic scripts, [but] it also replaces the work opportunities” (p4, p5). In general, participants found our mitigation strategies satisfactory and none reported distress or concern regarding outputs from the model. Although biases and stereotypes were not the prime focus of the interview sessions, they could be explored systematically: future work could examine what sorts of narratives can be written using AI tools, and how the system performs for different cultural groups.

7.4 Using a Tool or Participating in a Co-Creative System?

Previous work has argued for engagement with subject matter experts, literary scholars, and industry professionals [109]. In this work, screenwriters and playwrights co-wrote with Dramatron. In the post-interview surveys, most of the participants felt they did not own the final output. This raises several questions: Should Dramatron be considered merely a writing tool, or should it rather be seen as a co-creative system? Are writers comfortable using and ready to adopt co-creative tools? Participants reflected on the authorship of machine co-created text (“Interesting what this means for the future of writing. Who is the author?”, p6). As a corollary to the issues of authorship and of biases, p3 wondered whether an LLM should generate text from the perspectives of different identities, and if that could amount to cultural appropriation (although they later added: “but I write about many people too, and I am less objective than this AI, because I have seen less data”). Chung et al. [20] discuss how AI-based Creativity Support Tools can be seen as part of the Artist Support Network. These tools may need to conform to the artists’ expectations of collaboration, similarly to the types of interactions they have with human collaborators: for example, sub-contracting, co-creation, or inspiration. Our expert interviews and surveys surfaced similar views towards the interaction with Dramatron and co-creative writing tools in general.



This paper is available on arxiv under CC 4.0 license.