Not Just Who You Are, But Who You Talk To: The Role of Speaker Personas in Dialogue Generation / Occhipinti, Daniela. - (2026 Apr 28).
Not Just Who You Are, But Who You Talk To: The Role of Speaker Personas in Dialogue Generation
Occhipinti, Daniela
2026-04-28
Abstract
Endowing dialogue agents with profiles has proven effective in improving generation quality, yielding responses that are more consistent and contextually appropriate. However, research on persona-based dialogue systems has predominantly focused on modelling the agent's own profile, often overlooking the influential role of the interlocutor. This thesis reframes persona-driven dialogue generation from a dyadic perspective, demonstrating that the interlocutor's profile plays as crucial a role in the dialogue generation process as the agent's profile. First, we introduce PRODIGy, a large-scale profile-based dialogue dataset that aligns each conversation with structured representations of both speakers, including biography, personality, gender, and linguistic style. The first question we address is which profile dimension most effectively supports persona expression. Through systematic benchmarking with both fine-tuned and instruction-based models, we compare personality, gender, communication style, and biography, and find that biography is the most relevant. Models trained on PRODIGy's detailed biographical descriptions generalise better than those trained on shallow persona representations, and human evaluation confirms that profile-consistent responses are preferred, particularly in short or ambiguous contexts where profile cues are most informative. Second, we develop an evaluation paradigm that frames speaker recognition as an author identification task: given a dialogue and a pool of candidate biographies, evaluators (human or LLM-based) must identify which persona generated the target speaker's turns. By varying whether the interlocutor is disclosed at evaluation time, we show that providing judges with the interlocutor's information substantially enhances target speaker identifiability.
Our experiments reveal that models generalise robustly across unfamiliar topics yet struggle with unfamiliar interlocutors, establishing that who we are speaking with matters more than what we are speaking about. We further demonstrate that fine-tuning improves the models' ability to capture deeper persona characteristics, while zero-shot models rely on trivial copying mechanisms. Third, we extend this analysis across the full fine-tuning, inference, and evaluation pipeline. We demonstrate that restricting loss computation to the final target speaker turns, while excluding dialogue history and persona descriptions, enables strong speaker recognisability with minimal surface copying. This approach yields copying patterns closer to those found in human-written dialogues, suggesting that models can learn persona characteristics rather than reproduce persona text verbatim. Training conditions prove critical: models need interlocutor visibility during fine-tuning to learn how to differentiate speakers appropriately. Without it, they fall back on copying directly from persona descriptions. In the asymmetric setting, where only the interlocutor has access to the target's profile and not vice versa, target persona information leaks into the interlocutor's utterances, allowing judges to identify the target indirectly. This thesis contributes a reusable dataset, an innovative evaluation framework, and the first comprehensive analysis of the effects of both target and interlocutor personas across the persona-based dialogue generation pipeline. Our findings establish that effective persona adaptation depends not only on the target persona but, crucially, on the interlocutor's persona.
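The loss restriction described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function name `build_labels`, the role strings, and the toy token ids are hypothetical, and the sketch reads "final target speaker turns" as the last turn spoken by the target in each training example. It relies on the common convention that label positions set to -100 are skipped by the cross-entropy loss (the default `ignore_index` in PyTorch).

```python
IGNORE_INDEX = -100  # assumed convention: positions with this label do not contribute to the loss


def build_labels(segments, target_role="target"):
    """Build (input_ids, labels) from ordered (role, token_ids) segments.

    Persona descriptions, dialogue history, and interlocutor turns are masked;
    only the last turn spoken by `target_role` keeps its token ids as labels.
    """
    # Index of the last segment spoken by the target speaker, if any.
    last_target = max(
        (i for i, (role, _) in enumerate(segments) if role == target_role),
        default=None,
    )
    if last_target is None:
        raise ValueError("no target-speaker turn found")

    input_ids, labels = [], []
    for i, (_, token_ids) in enumerate(segments):
        input_ids.extend(token_ids)
        if i == last_target:
            labels.extend(token_ids)  # supervised span
        else:
            labels.extend([IGNORE_INDEX] * len(token_ids))  # masked span
    return input_ids, labels


# Toy example with made-up token ids.
example = [
    ("persona", [1, 2]),
    ("interlocutor", [3]),
    ("target", [4, 5]),
    ("interlocutor", [6]),
    ("target", [7, 8, 9]),
]
example_ids, example_labels = build_labels(example)
```

In this toy example the model still sees the persona and the full history as input, but only the tokens of the last target turn are supervised, so no gradient rewards reproducing the persona description itself.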



