Accuracy of Autism-Related TikTok Information in Italian: A Comparison Between Human Raters and Large Language Models / Carollo, Alessandro; Fong, Seraphina; Belardinelli, Giovanni; Perzolli, Silvia; Vivanti, Giacomo; Messinger, Daniel S.; Dimitriou, Dagmara; Esposito, Gianluca. - In: JOURNAL OF AUTISM AND DEVELOPMENTAL DISORDERS. - ISSN 1573-3432. - 2026:(2026). [10.1007/s10803-026-07249-9]

Accuracy of Autism-Related TikTok Information in Italian: A Comparison Between Human Raters and Large Language Models

Carollo, Alessandro (co-first author); Fong, Seraphina (co-first author); Perzolli, Silvia; Esposito, Gianluca
2026-01-01

Abstract

Purpose: Social networking sites are major channels for sharing information on neurodiversity, including autism spectrum disorder. TikTok has become a particularly influential platform for autism-related communication, yet concerns remain about the scientific accuracy of such content. Most prior studies have focused on English-language videos and have evaluated accuracy with limited granularity. Additionally, the difficulty of achieving consistent expert ratings underscores the need for automated reliability assessment.

Methods: In this study, we examined 408 informational statements extracted from 148 TikTok videos posted under the hashtag #Autismo (Italian for #Autism). Three clinical experts independently classified each statement as inaccurate, overgeneralized, or accurate; their median ratings served as the human-derived ground truth and were compared with classifications from two large language models: ChatGPT 4.0 mini and Gemini 1.5 Flash.

Results: Human raters showed moderate agreement (mean κ = 0.52) and high specific agreement only for accurate statements, with lower agreement for overgeneralized and inaccurate content. ChatGPT achieved moderate agreement with human ratings (κ = 0.58), while Gemini reached only fair agreement (κ = 0.29). ChatGPT also exhibited a more conservative evaluation pattern (accurate information: precision = 0.89, recall = 0.82), whereas Gemini tended to overestimate accuracy (accurate information: precision = 0.76, recall = 0.93).

Conclusion: These findings suggest that LLMs, particularly ChatGPT, may support cautious and assistive evaluation of online health content. Future research should assess their applicability across online communities and platforms and explore their integration into accuracy-based alert systems that provide users with contextual reliability cues.
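The quantities reported in the abstract (median expert rating as ground truth, κ agreement, and precision and recall for the "accurate" class) can be illustrated with a minimal Python sketch. It rests on several assumptions not stated in this record: the three categories are coded as ordinal integers 0 to 2, κ is computed as unweighted Cohen's κ (the study may have used a weighted or multi-rater variant), and all ratings below are hypothetical toy data, not the study's data.

```python
from collections import Counter
from statistics import median

# Assumed ordinal coding of the three categories (not taken from the paper).
LABELS = ["inaccurate", "overgeneralized", "accurate"]  # codes 0, 1, 2


def median_label(ratings):
    """Median of an odd number of ordinal ratings, used as ground truth."""
    return int(median(ratings))


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in set(a) | set(b)) / (n * n)
    return (observed - expected) / (1 - expected)


def precision_recall(truth, pred, positive):
    """Precision and recall for one class (assumes it occurs in both lists)."""
    pairs = list(zip(truth, pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp / (tp + fp), tp / (tp + fn)


# Hypothetical toy ratings: three experts per statement, eight statements.
expert_ratings = [
    (2, 2, 2), (2, 1, 2), (1, 1, 0), (0, 0, 1),
    (2, 2, 1), (1, 2, 1), (0, 1, 0), (2, 2, 2),
]
truth = [median_label(r) for r in expert_ratings]

# Hypothetical model classifications for the same eight statements.
model = [2, 2, 1, 0, 1, 2, 0, 2]

kappa = cohen_kappa(truth, model)
precision, recall = precision_recall(truth, model, positive=LABELS.index("accurate"))
print(f"kappa={kappa:.2f} precision={precision:.2f} recall={recall:.2f}")
```

In this framing, a model with high precision but lower recall on the "accurate" class labels fewer statements as accurate than the experts do, which corresponds to the conservative pattern the abstract describes for ChatGPT; the reverse profile corresponds to overestimating accuracy.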
Files in this item:
File: Carollo et al. (2026).pdf
Access: open access
Description: Accuracy of Autism-Related TikTok Information in Italian: A Comparison Between Human Raters and Large Language Models
Type: Publisher's version (publisher's layout)
License: Creative Commons
Size: 870.41 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/479231
Citations
  • PMC 1
  • Scopus 0
  • ISI (Web of Science) 0
  • OpenAlex 0