We study the problem of finding related forum posts to a post at hand. In contrast to traditional approaches for finding related documents that perform content comparisons across the content of the posts as a whole, we consider each post as a set of segments, each written with a different goal in mind. We advocate that the relatedness between two posts should be based on the similarity of their respective segments that are intended for the same goal, i.e., are conveying the same intention. This means that it is possible for the same terms to weigh differently in the relatedness score depending on the intention of the segment in which they are found. We have developed a segmentation method that by monitoring a number of text features can identify the parts of a post where significant jumps occur indicating a point where a segmentation should take place. The generated segments of all the posts are clustered to form intention clusters and then similarities across the posts are calculated through similarities across segments with the same intention. We experimentally illustrate the effectiveness and efficiency of our segmentation method and our overall approach of finding related forum posts.

Finding Related Forum Posts through Content Similarity over Intention-Based Segmentation / Papadimitriou, Dimitra; Koutrika, Georgia; Velegrakis, Yannis; Mylopoulos, John. - In: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING. - ISSN 1041-4347. - 29:9(2017), pp. 1860-1873. [10.1109/TKDE.2017.2699965]

Finding Related Forum Posts through Content Similarity over Intention-Based Segmentation

Papadimitriou, Dimitra;Velegrakis, Yannis;Mylopoulos, John
2017-01-01

Abstract

We study the problem of finding related forum posts to a post at hand. In contrast to traditional approaches for finding related documents that perform content comparisons across the content of the posts as a whole, we consider each post as a set of segments, each written with a different goal in mind. We advocate that the relatedness between two posts should be based on the similarity of their respective segments that are intended for the same goal, i.e., are conveying the same intention. This means that it is possible for the same terms to weigh differently in the relatedness score depending on the intention of the segment in which they are found. We have developed a segmentation method that by monitoring a number of text features can identify the parts of a post where significant jumps occur indicating a point where a segmentation should take place. The generated segments of all the posts are clustered to form intention clusters and then similarities across the posts are calculated through similarities across segments with the same intention. We experimentally illustrate the effectiveness and efficiency of our segmentation method and our overall approach of finding related forum posts.
2017
9
Papadimitriou, Dimitra; Koutrika, Georgia; Velegrakis, Yannis; Mylopoulos, John
Finding Related Forum Posts through Content Similarity over Intention-Based Segmentation / Papadimitriou, Dimitra; Koutrika, Georgia; Velegrakis, Yannis; Mylopoulos, John. - In: IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING. - ISSN 1041-4347. - 29:9(2017), pp. 1860-1873. [10.1109/TKDE.2017.2699965]
File in questo prodotto:
File Dimensione Formato  
PapadimitriouKVM17.pdf

accesso aperto

Tipologia: Post-print referato (Refereed author’s manuscript)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 825.35 kB
Formato Adobe PDF
825.35 kB Adobe PDF Visualizza/Apri
07915736.pdf

Solo gestori archivio

Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.48 MB
Formato Adobe PDF
1.48 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/195029
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 16
  • ???jsp.display-item.citation.isi??? 6
social impact