Author:
(1) David M. Markowitz, Department of Communication, Michigan State University, East Lansing, MI 48824.
Editor's note: This is part 7 of 10 of a paper evaluating the effectiveness of using generative AI to simplify science communication and enhance public trust in science. The rest of the paper can be accessed via the table of links below.
Distributions of the comparisons in this study are displayed in Figure 1. Indeed, GPT significance statements were written in a simpler manner than PNAS significance statements on the simplicity index, Welch’s t(1492.1) = 11.55, p < .001, Cohen’s d = 0.58, 95% CI [0.47, 0.69]. Specifically, GPT significance statements (M = 75.53%, SD = 5.57%) contained more common words than PNAS significance statements (M = 69.84%, SD = 7.45%), Welch’s t(1478.7) = 17.31, p < .001, Cohen’s d = 0.87, 95% CI [0.76, 0.97]. GPT significance statements (M = 17.59, SD = 11.15) were also more readable than PNAS significance statements (M = 12.86, SD = 14.27), Welch’s t(1510) = 7.39, p < .001, Cohen’s d = 0.37, 95% CI [0.27, 0.47]. However, GPT significance statements (M = 92.73, SD = 6.89) and PNAS significance statements (M = 92.32, SD = 7.48) did not differ significantly in analytic style, Welch’s t(1587.7) = 1.16, p = .246, Cohen’s d = 0.06, 95% CI [-0.04, 0.16]. All results were maintained when comparing GPT significance statements to PNAS abstracts as well.
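For readers who want to reproduce this style of comparison, the sketch below shows how a Welch’s t-test and Cohen’s d (with an approximate large-sample 95% CI) can be computed in Python. The `gpt_scores` and `pnas_scores` arrays are simulated stand-ins using the means and standard deviations reported above, not the study’s data, and the sample sizes are illustrative.

```python
# A minimal sketch of the statistical comparison reported above; the data
# here are simulated stand-ins, not the study's per-text scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical per-statement simplicity-index scores for each group.
gpt_scores = rng.normal(loc=75.53, scale=5.57, size=800)
pnas_scores = rng.normal(loc=69.84, scale=7.45, size=800)

# Welch's t-test: does not assume equal variances across groups.
t_stat, p_value = stats.ttest_ind(gpt_scores, pnas_scores, equal_var=False)

# Cohen's d with a pooled standard deviation.
n1, n2 = len(gpt_scores), len(pnas_scores)
pooled_sd = np.sqrt(((n1 - 1) * gpt_scores.var(ddof=1) +
                     (n2 - 1) * pnas_scores.var(ddof=1)) / (n1 + n2 - 2))
d = (gpt_scores.mean() - pnas_scores.mean()) / pooled_sd

# Approximate 95% CI for d via its large-sample standard error.
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
ci = (d - 1.96 * se_d, d + 1.96 * se_d)

print(f"Welch's t = {t_stat:.2f}, p = {p_value:.3g}")
print(f"Cohen's d = {d:.2f}, 95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```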
Alternative Explanations
One possible explanation for the Study 1b results is that content differences across the PNAS and GPT texts explain the observed differences between groups. This concern was addressed in two ways. First, PNAS has various sections that authors submit to, and LIWC has categories that approximate the words associated with such sections. For example, the LIWC category for political speech approximates papers submitted to the Social Sciences section, specifically Political Sciences. Several linguistic covariates were therefore examined to account for content-related differences across the GPT and PNAS texts. After adding overall affect/emotion and cognition (to control for topics within the Psychological Sciences section of PNAS), political speech (to control for topics within the Political Sciences section of PNAS), and physical references (to control for topics within the Biological Sciences section of PNAS) to the multivariate models, all results were maintained except for Analytic writing: GPT texts were more analytic than PNAS texts, which is also consistent with prior work (42). Please see the online supplement for additional LIWC differences across these text types.
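One way to implement such a covariate-adjusted comparison is an ordinary least squares model with text source as the predictor of interest. The sketch below is a hypothetical illustration: the DataFrame, its column names (`simplicity`, `source`, `affect`, `cognition`, `political`, `physical`), and the simulated values stand in for the LIWC dimensions named above and are not the paper’s actual variables.

```python
# A minimal sketch of the covariate-adjusted comparison, assuming one row
# per text; all columns and values are simulated stand-ins.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 1600
df = pd.DataFrame({
    "source": rng.choice(["GPT", "PNAS"], size=n),  # text origin
    "affect": rng.normal(2, 1, n),                  # LIWC affect/emotion
    "cognition": rng.normal(10, 3, n),              # LIWC cognition
    "political": rng.normal(1, 0.5, n),             # LIWC political speech
    "physical": rng.normal(3, 1, n),                # LIWC physical references
})
df["simplicity"] = 70 + 5 * (df["source"] == "GPT") + rng.normal(0, 6, n)

# Regress the outcome on text source while holding the content covariates
# constant; the source coefficient is the adjusted GPT-vs.-PNAS difference.
model = smf.ols(
    "simplicity ~ C(source) + affect + cognition + political + physical",
    data=df,
).fit()
print(model.summary())
```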
Content effects were also evaluated in a bottom-up manner using the Meaning Extraction Method to identify dominant themes across the GPT and PNAS texts (43, 44). As reported in the online supplement, eight themes were reliably extracted from the data, ranging from basic methodological and research information to gene expression and cancer science. Controlling for these themes, in addition to the prior LIWC content dimensions, also revealed consistent results (see supplement). Therefore, the Study 1b evidence is robust to content differences.
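The Meaning Extraction Method typically binarizes a document-term matrix and applies a rotated component or factor solution so that clusters of co-occurring words surface as themes. The sketch below approximates that pipeline with scikit-learn; the toy corpus, the two-theme solution, and the frequency threshold are illustrative assumptions, not the paper’s eight-theme analysis.

```python
# A minimal sketch of a Meaning Extraction Method-style theme analysis;
# the corpus and parameters are toy stand-ins.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import FactorAnalysis

texts = [
    "gene expression in cancer cells",
    "participants completed the survey method",
    "tumor growth and gene regulation",
    "study design and sampling method",
] * 50  # repeated so the factor model has enough rows to fit

# Binary document-term matrix over words used in at least 5% of texts,
# the usual MEM preprocessing step.
vectorizer = CountVectorizer(binary=True, min_df=0.05, stop_words="english")
X = vectorizer.fit_transform(texts).toarray()

# Factor analysis with varimax rotation approximates the rotated-PCA step
# used to extract co-occurring word clusters ("themes").
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(X)

words = vectorizer.get_feature_names_out()
for i, loadings in enumerate(fa.components_):
    top = words[np.argsort(np.abs(loadings))[::-1][:4]]
    print(f"Theme {i + 1}: {', '.join(top)}")
```

Per-text theme scores (e.g., `fa.transform(X)`) could then enter a regression as covariates, mirroring the robustness check described above.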
Altogether, human authors write more simply for lay audiences than for scientific audiences (Study 1a), but Study 1b demonstrated that artificial intelligence and large language models can do so more effectively (e.g., the effect sizes comparing GPT significance statements to PNAS significance statements were larger than the corresponding human effect sizes in Study 1a). The findings thus far are correlational, however, and causal evidence is needed to demonstrate the impact of these effects on human perceptions. In Study 2, participants were randomly assigned to read a GPT significance statement or a PNAS significance statement drawn from pairs of texts that appeared in the previous studies. Participants rated their perceptions of the author (e.g., intelligence, credibility, trustworthiness), judged the complexity of each text, and rated how much they believed the author of each text was a human or artificial intelligence. Only perceptions of the author were collected because prior work suggests people generally report consistent ratings when asked about both scientists and their science in similar studies (9).
This paper is available on arxiv under CC BY 4.0 DEED license.