Accuracy of Autism-Related TikTok Information in Italian: A Comparison Between Human Raters and Large Language Models / Carollo, Alessandro; Fong, Seraphina; Belardinelli, Giovanni; Perzolli, Silvia; Vivanti, Giacomo; Messinger, Daniel S.; Dimitriou, Dagmara; Esposito, Gianluca. - In: JOURNAL OF AUTISM AND DEVELOPMENTAL DISORDERS. - ISSN 1573-3432. - 2026:(2026). [10.1007/s10803-026-07249-9]
Accuracy of Autism-Related TikTok Information in Italian: A Comparison Between Human Raters and Large Language Models
Carollo, Alessandro (co-first author); Fong, Seraphina (co-first author); Perzolli, Silvia; Esposito, Gianluca
2026-01-01
Abstract
Purpose: Social networking sites are major channels for sharing information on neurodiversity, including autism spectrum disorder. TikTok has become a particularly influential platform for autism-related communication, yet concerns remain about the scientific accuracy of such content. Most prior studies have focused on English-language videos and have evaluated accuracy with limited granularity. Additionally, the difficulty of achieving consistent expert ratings underscores the need for automated reliability assessment.

Methods: In this study, we examined 408 informational statements extracted from 148 TikTok videos posted under the hashtag #Autismo (Italian for #Autism). Three clinical experts independently classified each statement as inaccurate, overgeneralized, or accurate; their median ratings served as the human-derived ground truth and were compared with classifications from two large language models: ChatGPT 4.0 mini and Gemini 1.5 Flash.

Results: Human raters showed moderate agreement (mean κ = 0.52) and high specific agreement only for accurate statements, with lower agreement for overgeneralized and inaccurate content. ChatGPT achieved moderate agreement with human ratings (κ = 0.58), while Gemini reached only fair agreement (κ = 0.29). ChatGPT also exhibited a more conservative evaluation pattern (accurate information: precision = 0.89, recall = 0.82), whereas Gemini tended to overestimate accuracy (accurate information: precision = 0.76, recall = 0.93).

Conclusion: These findings suggest that LLMs, particularly ChatGPT, may support cautious and assistive evaluation of online health content. Future research should assess their applicability across online communities and platforms and explore their integration into accuracy-based alert systems that provide users with contextual reliability cues.

| File | Size | Format | |
|---|---|---|---|
| Carollo et al. (2026).pdf (open access). Description: Accuracy of Autism-Related TikTok Information in Italian: A Comparison Between Human Raters and Large Language Models. Type: Publisher's version (publisher's layout). License: Creative Commons | 870.41 kB | Adobe PDF | View/Open |
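
The agreement measures named in the abstract (median expert rating as ground truth, kappa, and per-class precision and recall) can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The ordinal ordering of the three labels, the use of unweighted Cohen's kappa, the function names, and the four toy statements are all assumptions made for illustration; the record does not say which kappa variant was used.

```python
from collections import Counter

# Assumed ordinal order of the three categories (illustrative, not from the paper).
LABELS = ["inaccurate", "overgeneralized", "accurate"]


def median_label(ratings):
    """Return the median of an odd number of ordinal ratings."""
    codes = sorted(LABELS.index(r) for r in ratings)
    return LABELS[codes[len(codes) // 2]]


def cohen_kappa(a, b):
    """Unweighted Cohen's kappa between two equally long label sequences."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the two raters' marginal label frequencies.
    expected = sum(count_a[l] * count_b[l] for l in LABELS) / (n * n)
    return (observed - expected) / (1 - expected)


def precision_recall(truth, pred, label):
    """Precision and recall for one class; assumes the class occurs in both lists."""
    tp = sum(t == label and p == label for t, p in zip(truth, pred))
    predicted = sum(p == label for p in pred)
    actual = sum(t == label for t in truth)
    return tp / predicted, tp / actual


# Hypothetical toy data: three expert ratings per statement, one model label each.
expert_ratings = [
    ("accurate", "accurate", "overgeneralized"),
    ("inaccurate", "overgeneralized", "overgeneralized"),
    ("accurate", "accurate", "accurate"),
    ("inaccurate", "inaccurate", "accurate"),
]
model_labels = ["accurate", "accurate", "accurate", "inaccurate"]

ground_truth = [median_label(r) for r in expert_ratings]
kappa = cohen_kappa(ground_truth, model_labels)
precision, recall = precision_recall(ground_truth, model_labels, "accurate")
print(ground_truth, round(kappa, 2), round(precision, 2), round(recall, 2))
```

In this toy example the model labels one overgeneralized statement as accurate, so recall for the accurate class stays at 1.0 while precision drops to about 0.67. That is the same direction of error the abstract describes as overestimating accuracy.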
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.



