Semantic Segmentation of Text Using Deep Learning

Lattisi, Tiziano; Farina, Davide; Ronchetti, Marco

doi:10.31577/cai_2022_1_78

Given a text, can we automatically segment it into semantically coherent sections? Can we detect the semantic boundaries if we know how many there are? Can we determine how many semantically distinct sections the text contains? These are the questions we address in this paper. To answer them, we use Bidirectional Encoder Representations from Transformers (BERT) to analyze the text and evaluate a function that we call local incoherence, which we expect to show maxima at the points where a semantic boundary occurs. Our results, although preliminary, are encouraging and suggest that our approach can be successfully applied. However, they are quite sensitive to text quality, as happens when the text is derived from an audio stream via Automatic Speech Recognition techniques.
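
As an illustration of the idea in the abstract, the following minimal sketch scores each gap between consecutive sentences and picks the strongest peaks as candidate boundaries. It is not the authors' implementation: the abstract does not define local incoherence, so the definition used here (one minus the cosine similarity between the mean embeddings of the sentences just before and just after a gap) is an assumption, as are the function names and the `window` parameter. The BERT step is omitted; the code assumes each sentence has already been mapped to a vector by some encoder.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def local_incoherence(embeddings, window=2):
    """Assumed definition: scores[i] compares the (up to) `window` sentences
    ending at i with the (up to) `window` sentences starting at i + 1."""
    scores = []
    for i in range(len(embeddings) - 1):
        left = embeddings[max(0, i - window + 1): i + 1]
        right = embeddings[i + 1: i + 1 + window]
        scores.append(1.0 - cosine(mean_vector(left), mean_vector(right)))
    return scores


def top_k_boundaries(scores, k):
    """Return the gap indices of the k highest local maxima, in text order."""
    peaks = [
        i for i in range(len(scores))
        if (i == 0 or scores[i] >= scores[i - 1])
        and (i == len(scores) - 1 or scores[i] > scores[i + 1])
    ]
    peaks.sort(key=lambda i: scores[i], reverse=True)
    return sorted(peaks[:k])


# Hypothetical 2-D "embeddings": three sentences on one topic, three on another.
toy = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.1, 1.0], [0.0, 1.0], [0.1, 0.9]]
scores = local_incoherence(toy, window=2)
boundaries = top_k_boundaries(scores, k=1)
print(boundaries)
```

A boundary index i means the gap between sentence i and sentence i + 1. When the number of sections is known, `k` is that number minus one; estimating `k` from the score curve itself is a separate question that this sketch does not address.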

Semantic Segmentation of Text Using Deep Learning / Lattisi, Tiziano; Farina, Davide; Ronchetti, Marco. - In: COMPUTING AND INFORMATICS. - ISSN 1335-9150. - ELECTRONIC. - 2022, 41:1(2022), pp. 78-97. [10.31577/cai_2022_1_78]

2022-01-01

Date of publication: 2022
Journal title: COMPUTING AND INFORMATICS
Issue number and part: 1
DOI: https://dx.doi.org/10.31577/cai_2022_1_78
Scopus identifier: 2-s2.0-85130624861
WOS identifier: WOS:000817923300005
All authors: Lattisi, Tiziano; Farina, Davide; Ronchetti, Marco
Appears in types: 03.1 Journal article

Files in this item:

File	Size	Format
vieraj,+5877_adsi-8.pdf (open access; Description: Article; Type: Publisher's layout; License: All rights reserved)	891.4 kB	Adobe PDF