In example-based retrieval, a system is queried with a document, aiming to retrieve other similar or relevant documents. We address an instance of this problem: question retrieval in community Question Answering (cQA) forums. In this scenario, both the document collection and the queries are relatively short multi-sentence documents subject to noise and redundancy, which makes it harder for learning-to-rank algorithms to build upon the proper text representation. In order to exploit only the relevant fragments of the query and collection documents, we treat them as sequences of sentences, in a multiple-instance learning fashion. By automatically pre-selecting the best sentences for our tree-kernel-based learning model, we improve over the performance obtained with the full text on the dataset of the 2016 SemEval cQA challenge, in terms of both accuracy and speed, reaching the state of the art.
A multiple-instance learning approach to sentence selection for question ranking / Romeo, Salvatore; Da San Martino, Giovanni; Barrón-Cedeño, Alberto; Moschitti, Alessandro. - Electronic. - 10193:(2017), pp. 437-449. (Paper presented at the ECIR 2017 conference, held in Aberdeen, April 8-13, 2017) [10.1007/978-3-319-56608-5_34].
A multiple-instance learning approach to sentence selection for question ranking
Alessandro Moschitti
2017-01-01
File | Size | Format | |
---|---|---|---|
2017_ECIR.pdf (open access; type: refereed author's manuscript, post-print; license: all rights reserved) | 319.47 kB | Adobe PDF | View/Open |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.