Evaluating deep learning approaches for AI-assisted lung ultrasound diagnosis: an international multi-center and multi-scanner study / Muñoz, Mario; Han, Xi; Camacho, Jorge; Perrone, Tiziano; Smargiassi, Andrea; Inchingolo, Riccardo; Tung-Chen, Yale; Demi, Libertario. - In: THE ULTRASOUND JOURNAL. - ISSN 2524-8987. - 17:1(2025). [10.1186/s13089-025-00451-3]
Evaluating deep learning approaches for AI-assisted lung ultrasound diagnosis: an international multi-center and multi-scanner study
Han, Xi; Demi, Libertario
2025-01-01
Abstract
Lung ultrasound (LUS) interpretation is often subjective and operator-dependent, motivating the development of automated, artificial intelligence (AI)-based methods. This international, multi-center study evaluated two distinct deep learning approaches for automated LUS severity scoring of pulmonary infections caused by COVID-19: a pre-trained classification model (CM) and a segmentation-model-based method (SM), assessing performance at the video, exam, and prognostic levels. Two datasets were analyzed: one comprising data from multiple scanners and another comprising data from a single scanner. Results showed that the SM achieved prognostic-level agreement with expert clinicians comparable to that of the CM. Furthermore, at the exam level, over 84% of examinations were classified with acceptable error (≤ 10 score difference) across both models and datasets, with both methods reaching agreement above 95% on the dataset acquired with a single scanner. The obtained results demonstrate the pote...
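The exam-level criterion in the abstract treats an absolute score difference of at most 10 as acceptable error. The following is a minimal sketch of how such an agreement rate could be computed. It assumes each exam is summarized by one numeric severity score from the model and one from the expert. The function name `fraction_within_tolerance` and the example scores are hypothetical and are not taken from the study.

```python
def fraction_within_tolerance(ai_scores, expert_scores, tolerance=10):
    """Return the fraction of exams whose AI score lies within `tolerance`
    points of the expert score, i.e. |AI - expert| <= tolerance.

    Assumption: one numeric severity score per exam from each rater.
    """
    if len(ai_scores) != len(expert_scores):
        raise ValueError("score lists must have the same length")
    if not ai_scores:
        raise ValueError("need at least one exam")
    # Count exams whose absolute score difference is within the tolerance.
    within = sum(
        1 for a, e in zip(ai_scores, expert_scores) if abs(a - e) <= tolerance
    )
    return within / len(ai_scores)


# Hypothetical exam-level scores for illustration only (not study data).
ai = [12, 30, 5, 22, 40]
expert = [10, 18, 5, 25, 38]
print(f"{fraction_within_tolerance(ai, expert):.0%}")  # prints "80%"
```

The inclusive comparison (`<=`) mirrors the "≤ 10" threshold stated in the abstract. In this example, only the second exam (a difference of 12) falls outside the tolerance.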



