Embracing Uncertainty in Entity Linking

Velegrakis, Ioannis
2012-01-01

Abstract

An important task in data integration and data cleaning is the identification of data that describe the same real-world object, such as an event, a person, or a movie. Various techniques tackle this problem. The typical methodology is to collect matching evidence, such as similarities between entity strings, and use it to generate information that links the entities. Then, using predefined thresholds or human intervention, the entities are merged, and queries are executed over the resulting merged entities. In this chapter, we explain the limitations of this methodology on newly emerging kinds of data, for instance data from Web 2.0 applications, and the challenges that such data pose for entity linkage. We then propose an alternative, generic methodology that uses the entity linkage information at query time. In particular, we define a generic data model suitable for representing the entity and linkage information as it is generated by a number of existing entity linkage techniques. Entities are compiled on the fly by processing the incoming query over this representation model, so that query answers reflect the most probable entity solution for the specific query. We also report the results of our extensive experimental evaluation, which confirm the efficiency and effectiveness of the proposed methodology.
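The abstract contrasts merging entities offline with fixed thresholds against compiling them at query time. The Python sketch below illustrates only the general idea of query-time compilation, under simplifying assumptions that are not taken from the chapter: linkage information is stored as pairwise links annotated with a match probability, links are treated as independent (so the most probable configuration keeps exactly the links with probability above 0.5), and conflicts from transitivity are ignored. The names `LinkageStore` and `answer_query` are hypothetical; this is not the chapter's data model or algorithm.

```python
from collections import deque
from dataclasses import dataclass, field


# Hypothetical representation: entity descriptions keyed by id, plus pairwise
# links carrying the probability that two descriptions denote the same object.
@dataclass
class LinkageStore:
    descriptions: dict = field(default_factory=dict)  # id -> {attr: value}
    links: dict = field(default_factory=dict)  # id -> [(other_id, prob)]

    def add_description(self, desc_id, attributes):
        self.descriptions[desc_id] = dict(attributes)
        self.links.setdefault(desc_id, [])

    def add_link(self, a, b, probability):
        # Links are symmetric, so record them in both directions.
        self.links.setdefault(a, []).append((b, probability))
        self.links.setdefault(b, []).append((a, probability))


def answer_query(store, attribute, value):
    """Merge descriptions on the fly, only around those the query touches.

    Assumption: links are independent, so a link belongs to the most
    probable configuration exactly when its probability exceeds 0.5.
    """
    seeds = [d for d, attrs in store.descriptions.items()
             if attrs.get(attribute) == value]
    seen, entities = set(), []
    for seed in seeds:
        if seed in seen:
            continue
        # Breadth-first expansion over likely links builds one merged entity.
        component, queue = [], deque([seed])
        seen.add(seed)
        while queue:
            current = queue.popleft()
            component.append(current)
            for other, prob in store.links.get(current, []):
                if prob > 0.5 and other not in seen:
                    seen.add(other)
                    queue.append(other)
        # Union the attribute values of all descriptions in the entity.
        merged = {}
        for d in sorted(component):
            for attr, val in store.descriptions.get(d, {}).items():
                merged.setdefault(attr, set()).add(val)
        entities.append((sorted(component), merged))
    return entities


# Usage with made-up data.
store = LinkageStore()
store.add_description("d1", {"name": "J. Smith", "city": "Trento"})
store.add_description("d2", {"name": "John Smith", "email": "js@example.org"})
store.add_description("d3", {"name": "Jane Smith"})
store.add_link("d1", "d2", 0.8)
store.add_link("d2", "d3", 0.3)
result = answer_query(store, "city", "Trento")
print(result)
```

Here a query on `city` merges `d1` with `d2` because their link is more likely than not, while `d3` stays separate. Only the descriptions reachable from query matches are ever merged, which is the sense in which entities are compiled per query rather than in a global offline step.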
2012
Various authors (AA. VV.)
"Semantic Search over the Web"
Berlin: Springer
ISBN 9783642250071, 9783642250088
Authors: Ioannou, E.; Nejdl, W.; Niederee, C.; Velegrakis, Ioannis

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/92913
Citations: Scopus 1