Multimedia Content Analysis for Event Detection / Rosani, Andrea. - (2015), pp. 1-90.

Multimedia Content Analysis for Event Detection

Rosani, Andrea
2015-01-01

Abstract

The wide diffusion of multimedia content of different types and formats has created the need for effective methods to handle such a huge amount of information efficiently, opening interesting research challenges in the media community. In particular, the definition of suitable content understanding methodologies is attracting the effort of a large number of researchers worldwide, who have proposed various tools for automatic content organization, retrieval, search, annotation and summarization. In this thesis, we will focus on an important concept: the inherent link between "media" and the "events" that such media depict. We will present two different methodologies related to this problem, and in particular to the automatic discovery of event semantics from media content. The two methodologies address this general problem at two different levels of abstraction. In the first approach we will be concerned with the detection of activities and behaviors of people from a video sequence (i.e., what a person is doing and how), while in the second we will face the more general problem of understanding a class of events from a set of visual media (i.e., the situation and context). Both problems will be addressed while avoiding strong a priori assumptions, i.e., taking into account the largely unstructured and variable nature of events.

As to the first methodology, we will discuss events related to the behavior of a person living in a home environment. The automatic understanding of human activity is still an open problem in the scientific community, although several solutions have been proposed so far, and it may provide important breakthroughs in many application domains such as context-aware computing, area monitoring and surveillance, assistive technologies for the elderly or disabled, and more. An innovative approach is presented in this thesis, providing (i) a compact representation of human activities, and (ii) an effective tool to reliably measure the similarity between activity instances. In particular, the activity pattern is modeled with a signature obtained through a symbolic abstraction of its spatio-temporal trace, allowing the application of high-level reasoning through context-free grammars for activity classification.

As far as the second methodology is concerned, we will address the problem of identifying an event from a single image. If event discovery from media is already a complex problem, detection from a single still picture is still considered out of reach for current methodologies, as demonstrated by recent results of international benchmarks in the field. In this work we will focus on a solution that may open new perspectives in this area by providing better knowledge of the link between visual perception and event semantics. Specifically, we propose a framework that identifies the image details that allow human beings to identify an event from a single image that depicts it. These details are called "event saliency", and are detected by exploiting the power of human computation through a gamification procedure. The resulting event saliency is a map of event-related image areas containing sufficient evidence of the underlying event, which could be used to learn the visual essence of the event itself and to enable improved automatic discovery techniques. Both methodologies will be demonstrated through extensive tests using publicly available datasets, as well as additional data created ad hoc for the specific problems under analysis.
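
The idea of a symbolic signature for a spatio-temporal trace can be illustrated with a minimal sketch. It is not the method of the thesis: the zone layout, the function names and the use of edit distance as a similarity measure are all assumptions made for illustration, and the grammar-based classification mentioned in the abstract is not reproduced here.

```python
from itertools import groupby

# Hypothetical zone layout of a home: symbol -> (x_min, y_min, x_max, y_max).
ZONES = {
    "K": (0.0, 0.0, 3.0, 3.0),  # kitchen (assumed)
    "L": (3.0, 0.0, 7.0, 3.0),  # living room (assumed)
    "B": (0.0, 3.0, 7.0, 6.0),  # bedroom (assumed)
}


def symbol_for(point):
    """Map an (x, y) position to the symbol of the zone containing it ('?' if none)."""
    x, y = point
    for symbol, (x0, y0, x1, y1) in ZONES.items():
        if x0 <= x < x1 and y0 <= y < y1:
            return symbol
    return "?"


def signature(trace):
    """Turn a sequence of (x, y) samples into a string of zone symbols,
    collapsing consecutive repeats so only zone changes remain."""
    symbols = (symbol_for(p) for p in trace)
    return "".join(key for key, _ in groupby(symbols))


def edit_distance(a, b):
    """Levenshtein distance between two signatures, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalized similarity in [0, 1]; 1.0 means identical signatures."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest
```

For example, `signature([(1, 1), (2, 1), (4, 1), (5, 2), (1, 4)])` yields a three-symbol string for a kitchen, living room, bedroom path, and `similarity` can then compare it with the signature of another activity instance.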
2015
XXVII
2013-2014
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
De Natale, Francesco
no
English
Sector ING-INF/03 - Telecommunications
Sector INF/01 - Computer Science
Files in this item:
Andrea-Rosani_PhD-Thesis.pdf

Open access

Type: Doctoral Thesis
License: All rights reserved
Size: 11.19 MB
Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/368623