Cognitive networks identify AI biases on societal issues in Large Language Models / De Duro, Edoardo Sebastiano; Franchino, Emma; Improta, Riccardo; Veltri, Giuseppe Alessandro; Stella, Massimo. - In: EPJ DATA SCIENCE. - ISSN 2193-1127. - 2025 (in press). [DOI: 10.1140/epjds/s13688-025-00600-7]
Cognitive networks identify AI biases on societal issues in Large Language Models
De Duro, Edoardo Sebastiano; Franchino, Emma; Improta, Riccardo; Veltri, Giuseppe Alessandro; Stella, Massimo
In press
Abstract
Millions of people use Large Language Models (LLMs) to seek information on complex societal issues. As a result, LLMs might be influencing large worldwide audiences in ways that remain unexplored with empirical data. To address this data gap, this study introduces and analyses SociaLLMisinformation: a dataset of 33,000 English and Italian LLM-generated texts on societal issues like climate change, global warming and health misinformation. Texts were mined from OpenAI's GPT-3.5 and GPT-4o, Meta's Llama 3 and Llama 3.1, Anthropic's Claude 3 Haiku, Mistral and LLaMAntino. We investigate LLMs' framings of these societal topics through an interpretable computational framework based on textual forma mentis networks (TFMNs), i.e., networks of syntactic/semantic associations between concepts in texts. Using TFMNs, we extract the linguistic and affective biases present in the SociaLLMisinformation texts. Our findings reveal that the analysed LLMs adopt distinct communication styles and pronoun usage, even when prompted identically. All models exhibit a strong positivity bias, possibly downplaying the seriousness and importance of complex and sensitive topics. This work provides both a new dataset and a novel analytical approach, highlighting the need for transparent, network-based methods to monitor and mitigate LLM biases as these models become central tools for retrieving information.
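For readers unfamiliar with networks of syntactic associations between concepts, the following is a minimal, illustrative sketch of how such a network could be built from a short text using spaCy dependency parsing and networkx. The function name, filtering rules and example sentence are assumptions for demonstration only; this does not reproduce the TFMN pipeline used in the paper.

# Minimal illustrative sketch (not the paper's actual TFMN pipeline): links
# content words to their syntactic heads via dependency parsing, yielding a
# small network of word-word associations. Assumes spaCy (with the
# "en_core_web_sm" model) and networkx are installed.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")

def syntactic_association_network(text: str) -> nx.Graph:
    """Build an undirected network linking each content word to its syntactic head."""
    doc = nlp(text)
    graph = nx.Graph()
    for token in doc:
        head = token.head
        # Keep alphabetic, non-stopword tokens and skip the root's self-link.
        if (token.is_alpha and head.is_alpha
                and not token.is_stop and not head.is_stop
                and head is not token):
            graph.add_edge(token.lemma_.lower(), head.lemma_.lower())
    return graph

example = syntactic_association_network(
    "Climate change is a serious and urgent problem that demands global action."
)
print(sorted(example.edges()))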



