
Crowdsourcing Syntactically Diverse Paraphrases with Diversity-Aware Prompts and Workflows / Ramírez, J; Baez, M; Berro, A; Benatallah, B; Casati, F. - 13295:(2022), pp. 253-269. (Paper presented at the 34th International Conference on Advanced Information Systems Engineering, CAiSE 2022, held in Leuven, Belgium, 6-10 June 2022) [10.1007/978-3-031-07472-1_15].

Crowdsourcing Syntactically Diverse Paraphrases with Diversity-Aware Prompts and Workflows

Baez, M;Benatallah, B;Casati, F
2022-01-01

Abstract

Task-oriented bots (or simply bots) enable humans to perform tasks in natural language, for example, booking a restaurant or checking the weather. Crowdsourcing has become a prominent approach to building datasets for training and evaluating task-oriented bots: the crowd grows an initial seed of utterances through paraphrasing, i.e., reformulating a given seed into semantically equivalent sentences. In this context, diversity is an important dimension of high-quality datasets, as diverse paraphrases capture the many ways users may express an intent. Current techniques, however, either assume that crowd-powered paraphrases are naturally diverse or focus only on lexical diversity. In this paper, we address an overlooked aspect of diversity and introduce an approach for guiding the crowdsourcing process towards syntactically diverse paraphrases. We introduce a workflow and novel prompts, informed by syntax patterns, that elicit paraphrases avoiding or incorporating desired syntax. Our empirical analysis indicates that our approach yields higher syntactic diversity, higher syntactic novelty, and a more uniform pattern distribution than state-of-the-art baselines, albeit at the cost of higher task effort.
2022
Advanced Information Systems Engineering. CAiSE 2022. Lecture Notes in Computer Science, vol 13295
Gewerbestrasse 11, Cham, CH-6330, Switzerland
Springer International Publishing AG
ISBN (print): 978-3-031-07471-4
ISBN (electronic): 978-3-031-07472-1
Ramírez, J; Baez, M; Berro, A; Benatallah, B; Casati, F
Files in this item:
paper_124.pdf

Open access

Type: Refereed post-print (author’s manuscript)
License: All rights reserved
Size: 735.38 kB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/397744
Citations
  • PubMed Central: n/a
  • Scopus: 1
  • Web of Science (ISI): 0
  • OpenAlex: n/a