
Science Summaries Are Simpler, but Not by Much—Can AI Do Better?

by Text Generation, November 26th, 2024

Too Long; Didn't Read

A study from Michigan State University evaluated the effectiveness of using generative AI to simplify science communication and enhance public trust in science.

Author:

(1) David M. Markowitz, Department of Communication, Michigan State University, East Lansing, MI 48824.

Editor's note: This is part 5 of 10 of a paper evaluating the effectiveness of using generative AI to simplify science communication and enhance public trust in science. The rest of the paper can be accessed via the table of links below.

Study 1a: Results

Descriptive statistics for each language dimension and intercorrelations are in Table 1. As expected, lay summaries were linguistically simpler than scientific summaries of the same article, Welch’s t(65793) = 40.62, p < .001, Cohen’s d = 0.31, 95% CI [0.29, 0.32].[2] At the item level of the simplicity index, lay summaries (M = 69.77%, SD = 7.14%) contained more common words than scientific summaries (M = 67.79%, SD = 6.60%), Welch’s t(68741) = 37.79, p < .001, Cohen’s d = 0.29, 95% CI [0.27, 0.30]. Lay summaries (M = 92.34, SD = 7.95) also had a simpler linguistic style than scientific summaries (M = 94.31, SD = 5.19), Welch’s t(59561) = -38.52, p < .001, Cohen’s d = 0.29, 95% CI [0.28, 0.31]. Finally, lay summaries (M = 12.96, SD = 13.93) were more readable than scientific summaries as well (M = 12.49, SD = 12.46), Welch’s t(68320) = 4.67, p < .001, Cohen’s d = 0.036, 95% CI [0.02, 0.05].
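To make the reported comparisons concrete, the sketch below runs a Welch’s t-test and computes Cohen’s d on simulated data; the variable names and values are illustrative placeholders, not the paper’s dataset or analysis code.

```python
# A minimal sketch of the comparisons reported above, using two hypothetical
# arrays of per-summary scores (lay vs. scientific). The simulated data are
# illustrative only, not the study's actual scores.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
lay = rng.normal(69.77, 7.14, 5000)          # e.g., % common words per lay summary
scientific = rng.normal(67.79, 6.60, 5000)   # same metric for scientific summaries

# Welch's t-test (unequal variances), the test reported in the paper
t_stat, p_value = stats.ttest_ind(lay, scientific, equal_var=False)

# Cohen's d using the average of the two group variances
pooled_sd = np.sqrt((lay.var(ddof=1) + scientific.var(ddof=1)) / 2)
d = (lay.mean() - scientific.mean()) / pooled_sd

print(f"Welch's t = {t_stat:.2f}, p = {p_value:.3g}, Cohen's d = {d:.2f}")
```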


Together, while lay summaries were indeed linguistically simpler than scientific summaries at PNAS, the effect sizes were quite small, and it is therefore unclear whether general readers would be able to recognize or appreciate such differences. Can generative AI tools write lay summaries that are even simpler, producing more substantive effect sizes while maintaining the core content of each text? In the next study, a random selection of abstracts was submitted to a popular large language model, GPT-4, which was given the same instructions PNAS authors receive on how to construct a significance statement.
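That procedure can be approximated with a short script. The sketch below assumes the OpenAI Python SDK and an API key in the environment; the instruction text is an illustrative paraphrase of significance-statement guidance, not the paper’s verbatim prompt.

```python
# A hedged sketch of submitting an abstract to GPT-4 with significance-statement
# instructions. Assumes the OpenAI Python SDK and OPENAI_API_KEY in the
# environment; the prompt wording is illustrative, not the paper's exact prompt.
from openai import OpenAI

client = OpenAI()

abstract = "..."  # a randomly selected PNAS abstract would go here

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "Write a significance statement for the following abstract. "
                "Explain the significance of the work to an undergraduate-educated "
                "scientist outside the field, in roughly 120 words."
            ),
        },
        {"role": "user", "content": abstract},
    ],
)

print(response.choices[0].message.content)
```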


This paper is available on arXiv under a CC BY 4.0 DEED license.


[2] 95% confidence intervals were bootstrapped with 5,000 replicates.
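As a concrete illustration of that bootstrapping procedure, the sketch below computes a percentile bootstrap 95% CI for Cohen’s d with 5,000 replicates on simulated placeholder data, not the study’s scores.

```python
# A minimal sketch of a percentile bootstrap 95% CI with 5,000 replicates,
# computed on simulated placeholder data rather than the study's scores.
import numpy as np

rng = np.random.default_rng(0)
lay = rng.normal(69.77, 7.14, 1000)
scientific = rng.normal(67.79, 6.60, 1000)

def cohens_d(a, b):
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return (a.mean() - b.mean()) / pooled_sd

# Resample both groups with replacement and recompute d each time
replicates = [
    cohens_d(rng.choice(lay, size=lay.size, replace=True),
             rng.choice(scientific, size=scientific.size, replace=True))
    for _ in range(5000)
]
ci_low, ci_high = np.percentile(replicates, [2.5, 97.5])
print(f"Bootstrapped 95% CI for Cohen's d: [{ci_low:.2f}, {ci_high:.2f}]")
```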