
Skim-Attention: Learning to Focus via Document Layout / Nguyen, Laura; Scialom, Thomas; Staiano, Jacopo; Piwowarski, Benjamin. - (2021), pp. 2413-2427. (Intervento presentato al convegno EMNLP tenutosi a Punta Cana, Dominican Republic nel 7th-11th November 2021) [10.18653/v1/2021.findings-emnlp.207].

Skim-Attention: Learning to Focus via Document Layout

Staiano, Jacopo (penultimate author)
2021-01-01

Abstract

Transformer-based pre-training techniques combining text and layout have proven effective in a number of document understanding tasks. Despite this success, multimodal pre-training models suffer from very high computational and memory costs. Motivated by human reading strategies, this paper presents Skim-Attention, a new attention mechanism that takes advantage of the structure of the document and its layout. Skim-Attention attends only to the 2-dimensional positions of the words in a document. Our experiments show that Skim-Attention obtains lower perplexity than prior work while being more computationally efficient. Skim-Attention can be further combined with long-range Transformers to efficiently process long documents. We also show how Skim-Attention can be used off-the-shelf as a mask for any pre-trained language model, improving its performance while restricting attention. Finally, we show the emergence of a document structure representation in Skim-Attention.
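To make the mechanism concrete, the following is a minimal, single-head sketch of the idea described in the abstract: attention weights are computed from 2-dimensional layout embeddings alone and then applied to the textual hidden states. The class and argument names (SkimAttentionSketch, layout_emb, text_states) are illustrative assumptions only; the authors' actual implementation also covers multi-head attention, masking, and the construction of layout embeddings from word bounding boxes, and may differ in detail.

```python
import math
import torch
import torch.nn as nn

class SkimAttentionSketch(nn.Module):
    """Single-head sketch: attention weights come from 2-D layout
    embeddings only, and are then applied to the text hidden states."""

    def __init__(self, hidden_size: int):
        super().__init__()
        # Queries and keys are projected from layout embeddings,
        # values from the textual representations.
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)
        self.scale = 1.0 / math.sqrt(hidden_size)

    def forward(self, layout_emb: torch.Tensor, text_states: torch.Tensor) -> torch.Tensor:
        # layout_emb, text_states: (batch, seq_len, hidden_size)
        scores = self.query(layout_emb) @ self.key(layout_emb).transpose(-2, -1) * self.scale
        attn = scores.softmax(dim=-1)           # layout-only attention map
        return attn @ self.value(text_states)   # contextualize the text with it
```

Because the attention map depends only on layout, it can be computed once and reused across layers, or applied as a fixed attention mask on top of an existing pre-trained language model, which is the "off-the-shelf" use mentioned in the abstract.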
Year: 2021
Published in: Findings of the Association for Computational Linguistics: EMNLP 2021
Place of publication: Stroudsburg, PA, USA
Publisher: Association for Computational Linguistics
ISBN: 978-1-955917-10-0
Authors: Nguyen, Laura; Scialom, Thomas; Staiano, Jacopo; Piwowarski, Benjamin
Files in this record:
File: 2021.findings-emnlp.207.pdf
Access: Archive administrators only
Description: Skim-Attention: Learning to Focus via Document Layout
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 2.01 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/392049
Citations
  • PMC: ND
  • Scopus: 3
  • Web of Science (ISI): 0
  • OpenAlex: ND