
Neural Text Simplification in Low-Resource Conditions Using Weak Supervision / Palmero Aprosio, Alessio; Tonelli, Sara; Turchi, Marco; Negri, Matteo; Di Gangi, Mattia A. - ELECTRONIC. - (2019), pp. 37-44. (Paper presented at the Workshop on Methods for Optimizing and Evaluating Neural Language Generation (NeuralGen), held in Minneapolis, Minnesota, USA, on June 6, 2019) [10.18653/v1/W19-2305].

Neural Text Simplification in Low-Resource Conditions Using Weak Supervision

Palmero Aprosio, Alessio; Tonelli, Sara; Turchi, Marco; Di Gangi, Mattia A.
2019-01-01

Abstract

Neural text simplification has gained increasing attention in the NLP community thanks to recent advancements in deep sequence-to-sequence learning. Most recent efforts with such a data-demanding paradigm have dealt with the English language, for which sizeable training datasets are currently available to deploy competitive models. Similar improvements on less resource-rich languages are conditioned either on intensive manual work to create training data, or on the design of effective automatic generation techniques to bypass the data acquisition bottleneck. Inspired by the machine translation field, in which synthetic parallel pairs generated from monolingual data yield significant improvements to neural models, in this paper we exploit large amounts of heterogeneous data to automatically select simple sentences, which are then used to create synthetic simplification pairs. We also evaluate other solutions, such as oversampling and the use of external word embeddings to be fed to the neural simplification system. Our approach is evaluated on Italian and Spanish, for which a few thousand gold sentence pairs are available. The results show that these techniques yield performance improvements over a baseline sequence-to-sequence configuration.
2019
Proceedings of the Workshop on Methods for Optimizing and Evaluating Neural Language Generation (NeuralGen)
Minneapolis, Minnesota, USA
Association for Computational Linguistics (ACL)
Palmero Aprosio, Alessio; Tonelli, Sara; Turchi, Marco; Negri, Matteo; Di Gangi, Mattia A.
Files in this record:
There are no files associated with this record.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/454138
Warning! The data displayed have not been validated by the university.
