
Skim-Attention: Learning to Focus via Document Layout / Nguyen, L.; Scialom, T.; Staiano, J.; Piwowarski, B. - (2021), pp. 2413-2427. (Paper presented at the EMNLP conference held in Punta Cana, Dominican Republic, 7th–11th November 2021).

Skim-Attention: Learning to Focus via Document Layout


Abstract

Transformer-based pre-training techniques over text and layout have proven effective in a number of document understanding tasks. Despite this success, multimodal pre-training models suffer from very high computational and memory costs. Motivated by human reading strategies, this paper presents Skim-Attention, a new attention mechanism that takes advantage of the structure of the document and its layout. Skim-Attention only attends to the 2-dimensional positions of the words in a document. Our experiments show that Skim-Attention obtains a lower perplexity than prior works while being more computationally efficient. Skim-Attention can further be combined with long-range Transformers to efficiently process long documents. We also show how Skim-Attention can be used off-the-shelf as a mask for any Pre-trained Language Model, improving their performance while restricting attention. Finally, we show the emergence of a document structure representation in Skim-Attention.
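The core idea — computing attention weights from word positions alone, independently of the text — can be illustrated with a minimal NumPy sketch. Everything here (the projection matrices, dimensions, and function names) is an illustrative assumption, not the paper's actual implementation:

```python
import numpy as np

def skim_attention(boxes, values, dim=16, seed=0):
    """Hypothetical sketch: attention scores computed solely from the
    2-D bounding-box coordinates of each word, never from the text;
    the resulting weights are then applied to the token value vectors."""
    rng = np.random.default_rng(seed)
    # Randomly initialized projections of (x0, y0, x1, y1) boxes
    # into query/key spaces (stand-ins for learned layout embeddings).
    wq = rng.standard_normal((4, dim))
    wk = rng.standard_normal((4, dim))
    q = boxes @ wq                       # (n, dim) queries from layout only
    k = boxes @ wk                       # (n, dim) keys from layout only
    scores = q @ k.T / np.sqrt(dim)      # (n, n) layout-based scores
    # Row-wise softmax to obtain attention weights.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ values              # contextualize values via layout

# Three words with normalized bounding boxes and toy "text" value vectors.
boxes = np.array([[0.1, 0.1, 0.2, 0.12],
                  [0.3, 0.1, 0.4, 0.12],
                  [0.1, 0.9, 0.2, 0.92]])
values = np.eye(3)
out = skim_attention(boxes, values)
```

In the same spirit, the learned layout-only weight matrix could be thresholded and reused as a fixed attention mask over the text tokens of a separate pre-trained language model, which is the "off-the-shelf mask" usage the abstract describes.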
2021
Findings of the Association for Computational Linguistics: EMNLP 2021
New York, NY, USA
Association for Computational Linguistics
978-1-955917-10-0
Nguyen, L.; Scialom, T.; Staiano, J.; Piwowarski, B.
Files in this product:
File: 2021.findings-emnlp.207.pdf
Description: Skim-Attention: Learning to Focus via Document Layout
Access: open access
Type: Publisher's version (publisher's layout)
License: All rights reserved
Size: 2.01 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/392049
Citations
  • Scopus: 2