What is the Multi-Party Trick? Language and Interactions in Multi-Party Conversations during the Era of LLMs / Penzo, Nicolò. - (2026 Apr 29).

What is the Multi-Party Trick? Language and Interactions in Multi-Party Conversations during the Era of LLMs

Penzo, Nicolò
2026-04-29

Abstract

Multi-Party Conversations (MPCs) are a common conversational scenario in everyday life, ranging from informal spoken discussions to large-scale debates on social media platforms. Modeling and processing this type of conversation poses important challenges for Natural Language Processing (NLP) systems, as it requires going well beyond the analysis of individual messages to also take into account the conversational context and the interactional structure. Despite recent advances in Large Language Models (LLMs), how these dimensions should be effectively modeled and evaluated is still an open research direction. In this thesis, we analyze the modeling, evaluation, and generation of written multi-party conversations, i.e., conversations in which interactions take place through written messages, so that all information is available in textual form, without a physical environment or visual cues; we focus on social media data. First, we study how conversational context can be incorporated to perform downstream classification tasks, by including textual, interactional, and temporal information. Through extensive experiments and analyses, we show that while contextual modeling can improve robustness and stability, it requires a significantly larger amount of training data than non-contextual solutions. Moreover, we demonstrate that macro-level evaluation metrics fail to reveal important performance variations across conversations of different structural complexity. Second, we analyze the ability of LLMs to model and predict conversational components through proxy tasks such as addressee recognition and response selection in zero-shot settings. We also assess the ability of LLMs to generate useful summaries of conversations and useful descriptions of users' behavior. We propose a diagnostic evaluation framework that analyzes model sensitivity to prompt formulation and interactional structure, revealing systematic performance gaps that are not captured by standard benchmarks. Third, we address data scarcity and evaluation limitations by exploring the use of LLMs to generate large-scale synthetic MPC datasets under explicit structural constraints. Our results show that LLMs can produce structurally diverse and controllable conversations, especially when a multi-step generation strategy is used. However, human evaluation highlights persistent challenges in assessing conversational quality and naturalness with respect to real-world MPCs, especially along the interactional dimension, as well as risks related to subjectivity and synthetic bias. To mitigate these limitations, we introduce a Human–AI collaborative platform that supports the creation and refinement of high-quality linear multi-party conversations from reply trees, combining tree visualization, human supervision, and LLM-assisted refinement. This approach improves controllability, transparency, and decision-making when creating conversational interactions, while offering a practical solution for scalable dataset creation. Overall, this thesis highlights the importance of interaction-aware modeling, fair evaluation protocols, and data diversity for advancing MPC research. The findings of this work open several directions for future research at the intersection of language modeling, conversational interaction analysis, and human–AI collaboration for the creation of synthetic MPC datasets.
29-apr-2026
XXXVIII
2024-2025
Information Engineering and Computer Science
Information and Communication Technology
Tonelli, Sara
no
English
Sector ING-INF/05 - Information Processing Systems
Sector IINF-05/A - Information Processing Systems
Files in this product:
There are no files associated with this product.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/484234