In the analysis of a newspaper page, an important step is the clustering of text blocks into logical units, i.e., into articles. We propose three algorithms based on text processing techniques to cluster articles in newspaper pages. Based on the complexity of the three algorithms and on experiments with actual pages from the Italian newspaper L’Adige, we select one of them as the preferred choice for solving the textual clustering problem.

Textual Article Clustering in Newspaper Pages / Aiello, Marco; Pegoretti, Andrea. - ELETTRONICO. - (2004), pp. 1-33.

Textual Article Clustering in Newspaper Pages

Aiello, Marco; Pegoretti, Andrea
2004-01-01

2004
Trento
Università degli Studi di Trento - Dipartimento di Informatica e Telecomunicazioni
Files in this item:
aiello-pegoretti.pdf
Access: open access
Type: Publisher's version (Publisher’s layout)
License: All rights reserved
Size: 2.82 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/359210
