Shi, L.; Giunchiglia, F.; Luo, R.; Shi, D.; Song, R.; Diao, X.; Xu, H. An empirical study of LLMs via in-context learning for stance classification. Information Processing & Management, 63(1), 2026. ISSN 0306-4573. doi:10.1016/j.ipm.2025.104322
An empirical study of LLMs via in-context learning for stance classification
Shi L.; Giunchiglia F.; Luo R.; Shi D.; Song R.; Diao X.; Xu H.
2026-01-01
Abstract
The rapid advancement of large language models (LLMs) creates new research opportunities in stance classification. However, existing studies often lack a systematic evaluation and empirical analysis of the performance of mainstream large models. In this paper, we systematically evaluate five state-of-the-art LLMs (LLaMA, DeepSeek, Qwen, GPT, and Gemini) on stance classification across 13 benchmark datasets. Within the in-context learning framework, we examine the effectiveness of two demonstration-selection strategies: random selection and semantic similarity selection. By comparing these approaches in cross-domain and in-domain experiments, we reveal how they affect model performance and provide insights for future optimization. Overall, this study clarifies the influence of different models and sampling strategies on stance classification performance and suggests directions for further research. Our code is available at: https://github.com/shilida/In-context4Stance.
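
The two demonstration-selection strategies the abstract contrasts can be sketched in a few lines of Python. The sketch below is illustrative only: the encoder choice (a sentence-transformers model), the field names, and the toy example pool are assumptions of this note, not the authors' implementation, which lives in the linked repository.

# Minimal sketch of random vs. semantic-similarity demonstration selection
# for in-context stance classification. Encoder and data layout are assumed.
import random
import numpy as np
from sentence_transformers import SentenceTransformer

def select_random(pool, k, seed=0):
    # Baseline strategy: pick k labeled examples uniformly at random.
    rng = random.Random(seed)
    return rng.sample(pool, k)

def select_by_similarity(query_text, pool, k, encoder):
    # Alternative strategy: pick the k pool examples whose texts are closest
    # to the query under cosine similarity of sentence embeddings.
    texts = [ex["text"] for ex in pool]
    emb = encoder.encode([query_text] + texts, normalize_embeddings=True)
    sims = emb[1:] @ emb[0]          # unit-norm embeddings, so dot = cosine
    top = np.argsort(-sims)[:k]
    return [pool[i] for i in top]

if __name__ == "__main__":
    pool = [
        {"text": "Wind farms are an eyesore.", "stance": "against"},
        {"text": "Solar power keeps getting cheaper.", "stance": "favor"},
        {"text": "The match ended in a draw.", "stance": "none"},
    ]
    encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed encoder choice
    demos = select_by_similarity("Rooftop solar cut my bills.", pool, k=2,
                                 encoder=encoder)
    # The selected demonstrations would then be formatted into the LLM prompt
    # ahead of the query, as in standard in-context learning.
    for d in demos:
        print(d["stance"], "|", d["text"])

Either function returns the demonstrations that get prepended to the query when building the prompt; the paper's experiments compare exactly these two selection regimes in in-domain and cross-domain settings.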



