Network and Cascade Representation Learning: Algorithms based on Information Diffusion Events

Kefato, Zekarias Tilahun

doi:10.15168/11572_369013

Network representation learning (NRL) and cascade representation learning (CRL) underpin many kinds of network analysis problems. They are usually carried out in settings where the structure of the network under consideration is known. Motivated by real-world problems, this study presents several algorithms for scenarios where the network structure is partially or completely unknown.

The objective of network representation learning is to identify a mapping function that projects sparse, high-dimensional network graphs into a dense latent representation that preserves the original information about nodes and their neighborhoods. The notion of neighborhood, however, becomes elusive when the network structure is partially or completely hidden. Inspired by previous results, this thesis develops novel algorithms that are resilient to such lack of knowledge. Those previous results establish a correlation between the properties of the network and the different kinds of node activity performed over it; information about such activity is generally more available and easier to observe. In particular, we focus on diffusion events (also called cascades) such as shares, retweets and hashtags.

In the first of our contributions, we have developed a novel NRL algorithm called Mineral, a simple technique that combines the observed cascades with the partially accessible network structure by sampling artificial cascades. Node representations are then learned from the observed and sampled cascades using the SkipGram model, which is widely used for word representation learning in natural language documents. In our second contribution, called NetTensor, we assume that the network structure is completely hidden and propose novel techniques capable of estimating both the hidden neighborhood (proximity) and the similarity of nodes. These estimated values are then used to learn a unified embedding of nodes using a scalable truncated singular value decomposition and deep autoencoders.

In addition to the NRL algorithms, we have also proposed a novel CRL algorithm called cas2vec for virality (popularity) prediction. Again, we pursue a network-agnostic approach, following the above assumption that the network structure is completely unknown. Unlike prior studies that rely on manual feature extraction, cas2vec automatically learns cascade representations using convolutional neural networks, and these representations are effective in predicting the virality of cascades. We have carried out extensive experiments on several real-world datasets for all of our methods and compared them against strong state-of-the-art baselines, achieving significantly better results than many of them.
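As a rough illustration of the Mineral idea described above (treating observed and sampled cascades like sentences fed to SkipGram), the following minimal Python sketch samples artificial cascades from a partially known graph and extracts SkipGram-style (center, context) training pairs. It is not the thesis implementation: the function names `sample_cascade` and `skipgram_pairs`, the spreading rule, the infection probability `p` and the window size are all assumptions made for illustration, and the actual SkipGram training step is omitted.

```python
import random


def sample_cascade(graph, start, max_len, rng, p=0.5):
    """Sample one artificial cascade by spreading from `start` over known edges.

    Assumption: each unvisited neighbor is "infected" with probability `p`.
    The thesis may use a different sampling procedure.
    """
    cascade = [start]
    seen = {start}
    frontier = [start]
    while frontier and len(cascade) < max_len:
        node = frontier.pop(0)
        neighbors = [n for n in graph.get(node, []) if n not in seen]
        rng.shuffle(neighbors)
        for n in neighbors:
            if len(cascade) >= max_len:
                break
            if rng.random() < p:
                seen.add(n)
                cascade.append(n)
                frontier.append(n)
    return cascade


def skipgram_pairs(cascades, window):
    """Turn cascades (ordered node lists) into (center, context) pairs."""
    pairs = []
    for cascade in cascades:
        for i, center in enumerate(cascade):
            lo = max(0, i - window)
            hi = min(len(cascade), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    pairs.append((center, cascade[j]))
    return pairs


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical partially observed network (adjacency lists).
    partial_graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
    # Hypothetical observed cascades (nodes ordered by activation time).
    observed = [["a", "c", "d"], ["b", "d"]]
    sampled = [sample_cascade(partial_graph, s, 4, rng) for s in partial_graph]
    pairs = skipgram_pairs(observed + sampled, window=2)
    print(len(pairs), "training pairs")
```

The resulting pairs could then be passed to any SkipGram trainer; treating each cascade as a "sentence" is the only part of this sketch taken directly from the abstract.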

Network and Cascade Representation Learning: Algorithms based on Information Diffusion Events / Kefato, Zekarias Tilahun. - (2019), pp. 1-154. [10.15168/11572_369013]

2019-01-01

Defended on: 2019
Cycle: XXX
Academic year: 2018-2019
Department: Ingegneria e scienza dell'Informaz (29/10/12-)
Doctoral programme: Informatica e telecomunicazioni (until academic year 2020-21, 36th cycle)
Unitn internal supervisor: Montresor, Alberto
Bi-nationally supervised doctoral thesis: no
DOI: https://dx.doi.org/10.15168/11572_369013
Language: English
Reference SSD (valid until 24/06/2024): INF/01 - Informatica
Appears in types: 08.1 Tesi di dottorato (Doctoral Thesis)

Files in this item:

File	Access	Type	License	Size	Format
Zekarias_TK_PhD_Thesis.pdf	Archive managers only	Doctoral Thesis	All rights reserved	1.72 MB	Adobe PDF
University_deposit_disclaimer_english_version.pdf	Archive managers only	Doctoral Thesis	All rights reserved	1.09 MB	Adobe PDF