Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain

Sayin, Burcu; Staiano, Jacopo; Minervini, Pasquale; Passerini, Andrea

doi:10.18653/v1/2024.clinicalnlp-1.19

We explore the potential of Large Language Models (LLMs) to assist and potentially correct physicians in medical decision-making tasks. We evaluate several LLMs, including Meditron, Llama2, and Mistral, to analyze the ability of these models to interact effectively with physicians across different scenarios. We consider questions from PubMedQA and several tasks, ranging from binary (yes/no) responses to long answer generation, where the answer of the model is produced after an interaction with a physician. Our findings suggest that prompt design significantly influences the downstream accuracy of LLMs and that LLMs can provide valuable feedback to physicians, challenging incorrect diagnoses and contributing to more accurate decision-making. For example, when the physician is accurate 38% of the time, Mistral can produce the correct answer, improving accuracy up to 74% depending on the prompt being used, while Llama2 and Meditron models exhibit greater sensitivity to prompt choice. Our analysis also uncovers the challenges of ensuring that LLM-generated suggestions are pertinent and useful, emphasizing the need for further research in this area.

Can LLMs Correct Physicians, Yet? Investigating Effective Interaction Methods in the Medical Domain / Sayin, Burcu; Staiano, Jacopo; Minervini, Pasquale; Passerini, Andrea. - Electronic. - (2024), pp. 218-237. (Paper presented at the ClinicalNLP 2024 conference, held in Mexico City, Mexico, in June 2024) [10.18653/v1/2024.clinicalnlp-1.19].

2024-01-01

Date of publication: 2024
Proceedings title: Proceedings of the 6th Clinical Natural Language Processing Workshop
Place of publication: Mexico City, Mexico
Publisher: Association for Computational Linguistics
Publication type: 04.1 Paper in Proceedings

Files in this item:

File	Access	Type	License	Size	Format
2024.clinicalnlp-1.19.pdf	Open access	Publisher's layout	Creative Commons	945.18 kB	Adobe PDF