
Understanding Visual Information: from Unsupervised Discovery to Minimal Effort Domain Adaptation / Zen, Gloria. - (2015), pp. 1-121.

Understanding Visual Information: from Unsupervised Discovery to Minimal Effort Domain Adaptation

Zen, Gloria
2015-01-01

Abstract

Visual data interpretation is a fascinating problem which has received increasing attention in recent decades. Reasons for this growing trend can be found in multiple interconnected factors, such as the exponential growth in the availability of visual data (e.g. images and videos), the consequent demand for automatic ways to interpret these data, and the increase in computational power. In the supervised machine learning setting, a large effort within the research community has been devoted to collecting training samples to provide to the learning system, resulting in the generation of very large scale datasets. This has led to remarkable performance advances in tasks such as scene recognition and object detection, however at a considerably high cost in terms of human labeling effort. In light of the labeling cost issue, together with the dataset bias issue, another significant research direction has been devoted to developing methods for learning with little or no training data, leveraging instead data properties such as intrinsic redundancy, temporal constancy, or commonalities shared among different domains. Our work is in line with this last type of approach. In particular, by covering different scenarios - from dynamic crowded scenes to facial expression analysis - we propose novel approaches to overcome some state-of-the-art limitations. Building on the well-known bag of words (BoW) representation, we propose a novel method which achieves higher performance in tasks such as learning typical patterns of behavior and discovering anomalies in complex scenes, by taking into account the similarity among visual words during the learning phase. We also show that including sparsity constraints helps to deal with the noise that is intrinsic to low-level cues extracted from complex dynamic scenes.
To address the so-called dataset bias issue, we propose a novel method for adapting a classifier to a new, unseen target user without the need to acquire additional labeled samples. We demonstrate the effectiveness of this method in the context of facial expression analysis, showing that it achieves performance higher than or comparable to the state of the art, at a drastically reduced time cost.
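As background for the abstract above, the standard bag-of-words representation it builds on can be sketched as follows. This is a minimal illustrative sketch, not the thesis's actual method: descriptors are quantized against a pre-computed codebook, and a soft-assignment variant (here a hypothetical Gaussian-kernel weighting) illustrates the general idea of letting similar visual words share mass, in the spirit of "considering the similarity among visual words". All function names and parameters are assumptions for illustration.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Hard-assignment BoW: count the nearest codeword for each descriptor."""
    # Pairwise squared distances, shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    nearest = d2.argmin(axis=1)
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / hist.sum()

def soft_bow_histogram(descriptors, codebook, sigma=1.0):
    """Soft assignment: each descriptor votes for all words, weighted by a
    Gaussian kernel on descriptor-to-word distance, so that nearby (similar)
    words receive similar mass instead of a single hard count."""
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    w = np.exp(-d2 / (2 * sigma ** 2))
    w /= w.sum(axis=1, keepdims=True)   # each descriptor contributes total mass 1
    hist = w.sum(axis=0)
    return hist / hist.sum()
```

In practice the codebook would be learned (e.g. by k-means over a training pool of low-level descriptors), and the resulting histograms would serve as the scene or clip representation fed to the learning stage.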
2015
XXVI
2014-2015
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Sebe, Nicu
no
English
Settore INF/01 - Informatica
Files in this record:
GZen_final_thesis.pdf (open access)
Type: Doctoral Thesis
License: All rights reserved
Size: 25.67 MB
Format: Adobe PDF

Use this identifier to cite or link to this item: https://hdl.handle.net/11572/368625