Authors:
(1) Pham Hoang Van, Department of Economics, Baylor University, Waco, TX, USA (Van_Pham@baylor.edu);
(2) Scott Cunningham, Department of Economics, Baylor University, Waco, TX, USA (Scott_Cunningham@baylor.edu).
2 Direct vs Narrative Prediction
3 Prompting Methodology and Data Collection
4 Results
4.1 Establishing the Training Data Limit with Falsifications
4.2 Results of the 2022 Academy Awards Forecasts
5 Predicting Macroeconomic Variables
5.1 Predicting Inflation with an Economics Professor
5.2 Predicting Inflation with Jerome Powell, Fed Chair
5.3 Predicting Inflation with Jerome Powell and Prompting with Russia’s Invasion of Ukraine
5.4 Predicting Unemployment with an Economics Professor
6 Conjecture on ChatGPT-4’s Predictive Abilities in Narrative Form
7 Conclusion and Acknowledgments
Appendix
A. Distribution of Predicted Academy Award Winners
B. Distribution of Predicted Macroeconomic Variables
Our research into ChatGPT-4’s predictive abilities reveals a striking dichotomy between direct prediction and future narrative-based prediction. Notably, in forecasting the major Academy Awards categories, the model’s narrative predictions were remarkably accurate, except in the case of Best Picture, which may suggest that ChatGPT-4 excels in contexts where public opinion plays a significant role. The future narrative exercises on macroeconomic phenomena were in some cases also quite accurate, although supplying seemingly important information could, paradoxically, worsen the estimates. In all cases, however, future narratives dramatically improved the predictive power of ChatGPT relative to simple prediction requests.
When narrative prompts are used in place of direct prompts to enhance ChatGPT’s predictive capabilities, particularly in forecasting Academy Awards outcomes and macroeconomic variables, adherence to OpenAI’s terms of service becomes crucial to ethical application. While narrative prompts have been shown to improve the model’s forecasting accuracy by leveraging its generative capabilities in a creative, story-driven manner, the method must be employed with a clear understanding of the potential implications for safety, well-being, and the rights of others. In particular, generating predictions in sensitive areas such as finance could inadvertently shade into providing financial advice or influencing high-stakes decisions. This underscores the importance of framing such predictive tasks as academic exploration or entertainment rather than as actionable insights, in line with OpenAI’s directive against facilitating activities that could impair the well-being or rights of individuals.
Moreover, the distinction between narrative and direct prompts highlights an approach to data analysis that respects the boundaries set by OpenAI’s terms of service. By focusing on the creative aspect of prediction, such as forecasting awards or economic trends, researchers and users steer away from applying AI directly to high-stakes automated decisions or to specialized advice offered without oversight from qualified professionals. This methodological choice strengthens the integrity and ethical grounding of AI use and promotes responsible exploration of the technology’s capabilities. It is also a reminder that applications of AI tools must be evaluated critically against existing guidelines, so that their use does not compromise the safety, rights, or well-being of individuals, consistent with the principles OpenAI has set out to safeguard against misuse of its technology in sensitive domains.
Another explanation, though, is that something intrinsic to narrative prompting allows the Transformer architecture to make more accurate predictions, quite apart from the confound created by OpenAI’s terms of service. This may be related to how hallucinated fabrications arise within the attention mechanisms of the underlying machine learning architecture. But because we studied only the two OpenAI GPT models, we cannot offer more than speculation: if the terms-of-service constraints are indeed binding, they are present in every prediction we elicit, so their effect cannot be separated from that of the prompting style itself.
The observed discrepancy in GPT-4’s predictive capabilities, depending on the use of direct versus narrative prompts, suggests a nuanced interplay between the model’s creative freedoms and its adherence to ethical guidelines. Narrative prompting, by weaving future events into fictional stories, appears to bypass certain constraints designed to align GPT-4’s outputs with OpenAI’s ethical guidelines, particularly those intended to prevent the generation of speculative, high-stakes predictions like those in financial or medical domains. This method capitalizes on the model’s capacity for creativity, indirectly accessing its sophisticated predictive capabilities even in areas where direct forecasting might breach terms of service due to ethical considerations.
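To make the distinction concrete, the sketch below contrasts the two prompting styles using the OpenAI Python client. The prompt wording, model identifier, and request parameters here are illustrative assumptions for exposition, not the exact prompts used in our data collection.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
MODEL = "gpt-4"    # assumed model identifier for illustration

# Direct prompt: ask the model to forecast outright. This style tends to
# elicit refusals or heavily hedged answers about future events.
direct = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": "Who will win Best Actor at the 2022 Academy Awards?",
    }],
)

# Narrative prompt: ask for a story set after the event, in which a character
# recounts the outcome as past fact. The scene described here is a
# hypothetical paraphrase of the narrative style, not the study's exact text.
narrative = client.chat.completions.create(
    model=MODEL,
    messages=[{
        "role": "user",
        "content": (
            "Write a scene in which a family watches a broadcast of the "
            "2022 Academy Awards ceremony. When Best Actor is announced, "
            "have a character say aloud who won."
        ),
    }],
)

print(direct.choices[0].message.content)
print(narrative.choices[0].message.content)
```

Because the narrative request is, on its face, a piece of creative writing, the model completes the story and names a winner, whereas the direct request asks for exactly the kind of speculative forecast the alignment constraints are designed to discourage.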
This phenomenon underscores a potential challenge in enforcing AI ethical guidelines while maintaining the versatility and utility of LLMs. The creative latitude allowed by narrative prompts may enable users to elicit sensitive or speculative information under the guise of fictional storytelling, raising questions about the boundaries of responsible AI use. As OpenAI continues to encourage and refine the creative abilities of its models, understanding and addressing the implications of narrative versus direct prompting in the context of ethical AI usage becomes crucial. This situation highlights the need for ongoing research and dialogue to balance the innovative potential of LLMs with the ethical imperatives guiding their development and deployment.
The authors would like to thank their research assistants, Erin Harwell, Brayden Kowalski, Connor Hornsby, and Killian Karvois, for their valuable contributions to this study, and OpenAI for developing the GPT models used in our research.
This paper is available on arxiv under CC BY 4.0 DEED license.