
Exploiting Text Corpora for Data Enrichment in Language and Vision Applications / Le, Dieu Thu. - (2014), pp. 1-99.

Exploiting Text Corpora for Data Enrichment in Language and Vision Applications

Le, Dieu Thu
2014-01-01

Abstract

During the last decade, machine learning techniques have been applied successfully in many domains. The performance of these systems depends largely on the quality and quantity of the training data, and for many tasks the data itself is not rich enough. For example, text documents such as user queries, user comments, and short advertisements consist of only a few words. Direct word-based representations are therefore sparse, which makes it difficult to compute reliable similarities for clustering or classification. In other applications, training data is too expensive to obtain in full. In human action recognition from still images, the number of possible actions is the Cartesian product of objects and verbs. This combinatorial explosion of verb-object relations makes learning human actions directly from their visual appearance computationally prohibitive and makes the collection of adequately sized image datasets infeasible. This thesis proposes a framework to enrich sparse data with knowledge automatically extracted from large-scale text corpora, and considers various text modeling techniques for the extraction. The data enrichment framework is illustrated on tasks in both language and vision. For language applications, we apply data enrichment to query classification. A topic model is estimated on external text corpora as a reference set and is then used to infer topics for short queries and categories, generating shared context between them. The experimental results show that the data enrichment process improves the performance of the system, helping to find better categories for a given query. For vision applications, we employ the knowledge extracted from large-scale text corpora to predict objects in context and to recognize human actions in images. We investigate the problem of modeling text corpora for knowledge extraction and discuss which model is most suitable for each particular task.
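The topic-based enrichment described above can be sketched in miniature. The word-to-topic distributions and category names below are invented stand-ins for a topic model estimated on an external corpus; the point is only to show how shared topic context lets a short, sparse query be matched to a category.

```python
import math

# Hypothetical P(topic | word) over 3 latent topics, as a topic model
# trained on an external corpus might provide (values invented).
WORD_TOPICS = {
    "jaguar":   [0.50, 0.45, 0.05],   # ambiguous: animal vs. car
    "habitat":  [0.85, 0.05, 0.10],
    "engine":   [0.02, 0.90, 0.08],
    "wildlife": [0.90, 0.02, 0.08],
}

# Topic profiles of the target categories (also invented for illustration).
CATEGORY_TOPICS = {
    "Animals":     [0.85, 0.05, 0.10],
    "Automobiles": [0.05, 0.85, 0.10],
}

def enrich(words):
    """Average per-word topic distributions into a query-level topic vector."""
    vecs = [WORD_TOPICS[w] for w in words if w in WORD_TOPICS]
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(3)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def classify(query):
    """Pick the category whose topic profile best matches the enriched query."""
    qvec = enrich(query.split())
    return max(CATEGORY_TOPICS, key=lambda c: cosine(qvec, CATEGORY_TOPICS[c]))

print(classify("jaguar habitat"))  # shared topic context favors "Animals"
print(classify("jaguar engine"))   # here it favors "Automobiles"
```

The ambiguous word "jaguar" alone gives no usable signal; the enriched topic vector, shaped by its context words, disambiguates the query.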
In the first task, we learn relations between objects from text corpora, using a probability model to estimate how often different objects co-occur. This knowledge then helps predict new objects given other objects observed in an image. In the human action recognition task, we combine the knowledge extracted from external text corpora with visual features from the images. Based on the visually recognized objects, scenes, and relative positions between the human and objects, the most plausible actions are suggested using the knowledge learned from general external text. This model can recognize unseen actions and even outperforms a visual Bag-of-Words model in a realistic scenario where only a few visual training examples are available.
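The combination of text-derived knowledge with visual evidence can be illustrated with a toy sketch. The corpus of verb-object pairs and the detector scores below are invented; the sketch estimates P(verb | object) from text co-occurrence counts (with add-one smoothing) and weights it by hypothetical visual object-detection scores to rank candidate actions for an image.

```python
from collections import Counter

# Toy "external corpus" of verb-object pairs, as might be extracted by
# parsing a large text collection (pairs invented for illustration).
corpus_pairs = [
    ("ride", "horse"), ("ride", "horse"), ("ride", "bike"),
    ("play", "guitar"), ("play", "guitar"), ("play", "violin"),
    ("eat", "apple"),
]

verbs = sorted({v for v, _ in corpus_pairs})
pair_counts = Counter(corpus_pairs)
obj_counts = Counter(o for _, o in corpus_pairs)

def p_verb_given_obj(verb, obj):
    """P(verb | object) from co-occurrence counts, add-one smoothed."""
    return (pair_counts[(verb, obj)] + 1) / (obj_counts[obj] + len(verbs))

def predict_action(visual_scores):
    """Score each verb as sum_obj P(verb | obj) * P(obj | image)."""
    scores = {v: sum(p_verb_given_obj(v, o) * p
                     for o, p in visual_scores.items())
              for v in verbs}
    return max(scores, key=scores.get)

# Hypothetical detector output P(object | image) for one image.
visual = {"horse": 0.7, "guitar": 0.2, "apple": 0.1}
print(predict_action(visual))  # the strong horse detection favors "ride"
```

Because the verb-object prior comes from text rather than labeled images, an action can be ranked plausible even when no visual training example of that exact verb-object pair was ever seen, which is the mechanism behind recognizing unseen actions.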
2014
XXVI
2013-2014
Engineering and Information Science (29/10/12-)
Information and Communication Technology
Bernardi, Raffaella
no
English
Sector INF/01 - Computer Science
Files in this item:

thesisdieuthule.pdf
  Access: open access
  Type: Doctoral Thesis (Tesi di dottorato)
  License: All rights reserved
  Size: 17.1 MB
  Format: Adobe PDF

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11572/368549