Skim-Attention: Learning to Focus via Document Layout / Nguyen, Laura; Scialom, Thomas; Staiano, Jacopo; Piwowarski, Benjamin. - (2021), pp. 2413-2427. (Paper presented at EMNLP, Punta Cana, Dominican Republic, 7-11 November 2021) [10.18653/v1/2021.findings-emnlp.207].
Skim-Attention: Learning to Focus via Document Layout
Abstract
Transformer-based pre-training techniques of text and layout have proven effective in a number of document understanding tasks. Despite this success, multimodal pre-training models suffer from very high computational and memory costs. Motivated by human reading strategies, this paper presents Skim-Attention, a new attention mechanism that takes advantage of the structure of the document and its layout. Skim-Attention only attends to the 2-dimensional position of the words in a document. Our experiments show that Skim-Attention obtains a lower perplexity than prior works, while being more computationally efficient. Skim-Attention can be further combined with long-range Transformers to efficiently process long documents. We also show how Skim-Attention can be used off-the-shelf as a mask for any Pre-trained Language Model, making it possible to improve their performance while restricting attention. Finally, we show the emergence of a document structure representation in Skim-Attention.
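As a rough illustration of the core idea in the abstract, the sketch below shows an attention head whose scores are computed from 2-D layout (bounding-box) embeddings only and then applied to token representations. This is a minimal sketch assuming PyTorch; the class and argument names are ours for illustration and are not taken from the paper's code.

```python
# Minimal sketch of a layout-only ("skim") attention head, assuming PyTorch.
# Names (SkimAttentionSketch, layout_emb, token_emb) are illustrative only.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SkimAttentionSketch(nn.Module):
    """Attention whose scores depend only on 2-D layout embeddings.

    The resulting attention matrix is applied to (or could be reused as a
    mask for) token representations produced by a separate language model.
    """

    def __init__(self, layout_dim: int, hidden_dim: int):
        super().__init__()
        # Queries and keys come from layout embeddings, not from the text.
        self.query = nn.Linear(layout_dim, hidden_dim)
        self.key = nn.Linear(layout_dim, hidden_dim)
        self.scale = hidden_dim ** -0.5

    def forward(self, layout_emb: torch.Tensor, token_emb: torch.Tensor) -> torch.Tensor:
        # layout_emb: (batch, seq_len, layout_dim) embeddings of word bounding boxes
        # token_emb:  (batch, seq_len, text_dim)   token representations to be mixed
        q = self.query(layout_emb)
        k = self.key(layout_emb)
        attn = F.softmax(q @ k.transpose(-2, -1) * self.scale, dim=-1)
        # Layout decides where to look; the text provides what is read.
        return attn @ token_emb
```

Because the attention weights never depend on the token embeddings, they can be computed once from the layout and shared or reused, which is one way to read the efficiency claim in the abstract.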