
Real-Time Event Centric Data Integration / Ayyad, Majed. - (2014), pp. 1-167.

Real-Time Event Centric Data Integration

Ayyad, Majed
2014-01-01

Abstract

A vital step in integrating data from multiple sources is detecting and handling duplicate records that refer to the same real-life entity. Events are spatio-temporal entities that reflect changes in the real world and are received or captured from different sources (sensors, mobile phones, social network services, etc.). In many real-world situations, events are detected mostly through multiple observations made by different observers, and each observer's local view reflects only partial knowledge, at a certain granularity of time and space. Observations occur at a particular place and time; the events inferred from them, however, range over time and space. In this thesis, we address the problem of event matching: the task of detecting, from their observations, similar events that occurred in the recent past. We focus on detecting hyperlocal events, which are an integral part of any dynamic human decision-making process and are useful to multi-tier responding agencies such as emergency medical services, public safety and law enforcement agencies, and organizations that fuse news from different sources, as well as to citizens. In an environment where continuous monitoring and processing are required, the matching task poses several challenges. In particular, we decompose it into four separate tasks, each requiring a different computational method: event-type similarity, similarity in location, similarity in time, and thematic-role similarity, which captures the similarity of participants. We refer to these as local similarities. A global similarity measure then combines the four local similarities so that events can be clustered and handled in a robust near real-time system. We address the local similarities by thoroughly studying existing similarity measures and proposing a suitable measure for each task, drawing on ideas from the Semantic Web, qualitative spatial reasoning, fuzzy sets, and structural alignment. We then address the global similarity by treating the problem as a relational learning problem and using machine learning to learn the weight of each local similarity: the features of each pair of events are combined into a single object, and logistic regression and support vector machines are used to learn the weights. The learned weighting function is tested and evaluated on a real dataset and is used to predict the similarity class of each newly streamed event.
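As a concrete illustration of the learning step described above, the following minimal Python sketch treats matched/unmatched event pairs as a binary classification problem over the four local similarity scores. It is not taken from the thesis: the feature values, the pair labels, and the use of scikit-learn's LogisticRegression are illustrative assumptions (the thesis also evaluates support vector machines, which could be swapped in via sklearn.svm.SVC).

    # Minimal sketch (illustrative, not the thesis implementation): learn a global
    # similarity from four local similarities by classifying candidate event pairs.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # One feature vector per candidate event pair, with hypothetical scores in [0, 1]:
    # [event-type sim, location sim, time sim, thematic-role sim]
    X_train = np.array([
        [0.90, 0.80, 0.95, 0.70],   # two observations of the same incident
        [0.90, 0.10, 0.20, 0.30],   # same event type, but far apart in space and time
        [0.20, 0.90, 0.90, 0.40],   # co-located and co-temporal, different event types
        [0.95, 0.85, 0.90, 0.80],
    ])
    y_train = np.array([1, 0, 0, 1])  # 1 = same real-world event, 0 = distinct events

    clf = LogisticRegression().fit(X_train, y_train)

    # The learned coefficients play the role of the weights of the local similarities,
    # and the predicted probability serves as the global similarity of a new pair.
    new_pair = np.array([[0.85, 0.75, 0.90, 0.60]])
    print("learned weights:", clf.coef_[0])
    print("global similarity:", clf.predict_proba(new_pair)[0, 1])
    print("same event?", bool(clf.predict(new_pair)[0]))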
Year: 2014
Cycle: XXVI
Academic year: 2013-2014
Department: Ingegneria e scienza dell'Informaz (29/10/12-)
Doctoral programme: Information and Communication Technology
Supervisor: Giunchiglia, Fausto
Language: English
Files in this record:
REAL-TIME_EVENT_CENTRIC_DATA_INTEGRATION.pdf
  Access: open access
  Type: Tesi di dottorato (Doctoral Thesis)
  Licence: Tutti i diritti riservati (All rights reserved)
  Size: 2.82 MB
  Format: Adobe PDF

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/367750