The Web has evolved from a handful of static web pages to billions of dynamic and interactive web pages. This evolution has positively transformed the paradigm of communication, trading, and collaboration for the benefit of humanity. However, these invaluable benefits of the Web are shadowed by cyber-criminals who use the Web as a medium to perform malicious activities motivated by illegitimate benefits. Cyber-criminals often lure victims to visit malicious web pages, exploit vulnerabilities on victims’ devices, and then launch attacks that could lead to: stealing invaluable credentials of victims, downloading and installation of malware on victims’ devices, or complete compromise of victims’ devices to mount future attacks. While the current state-of-the-art is to detect malicious web pages is promising, it is yet limited in addressing the following three problems. First, for the sake of focused detection of certain class of malicious web pages, existing techniques are limited to partial analysis and characterization of attack payloads. Secondly, attacker-motivated and benign evolution of web page artifacts have challenged the resilience of existing detection techniques. The third problem is the prevalence and evolution of Exploit Kits used in spreading web-borne malware. In this dissertation, we present the approaches and the tools we developed to address these problems. To the address partial analysis and characterization of attack payloads, we propose a holistic and lightweight approach that combines static analysis and minimalistic emulation to analyze and detect malicious web pages. This approach leverages features from URL structure, HTML content, JavaScript executed on the client, and reputation of URLs on social networking websites to train multiple models, which are then used in confidence-weighted majority vote classifier to detect unknown web pages. Evaluation of the approach on a large corpus of web pages shows that the approach not only is precise enough in detecting malicious web pages with very low false signals but also does detection with a minimal performance penalty. To address the evolution of web page artifacts, we propose an evolution-aware approach that tunes detection models inline with the evolution of web page artifacts. Our approach takes advantage of evolutionary searching and optimization using Genetic Algorithm to decide the best combination of features and learning algorithms, i.e., models, as a function of detection accuracy and false signals. Evaluation of our approach suggests that it reduces false negatives by about 10% on a fairly large testing corpus of web pages. To tackle the prevalence of Exploit Kits on the Web, we first analyze source code and runtime behavior of several Exploit Kits in a contained setting. In addition, we analyze the behavior of live Exploit Kits on the Web in a contained environment. Combining the analysis results, we characterize Exploit Kits pertinent to their attack-centric and self-defense behaviors. Based on these behaviors, we draw distinguishing features to train classifiers used to detect URLs that are hosted by Exploit Kits. The evaluation of our classifiers on independent testing dataset shows that our approach is effective in precisely detecting malicious URLs linked with Exploit Kits with very low false positives.

Effective Analysis, Characterization, and Detection of Malicious Activities on the Web / Eshete, Birhanu Mekuria. - (2013), pp. 1-172.

Effective Analysis, Characterization, and Detection of Malicious Activities on the Web

Eshete, Birhanu Mekuria
2013-01-01

Abstract

The Web has evolved from a handful of static web pages to billions of dynamic and interactive web pages. This evolution has positively transformed the paradigm of communication, trading, and collaboration for the benefit of humanity. However, these invaluable benefits of the Web are shadowed by cyber-criminals who use the Web as a medium to perform malicious activities motivated by illegitimate benefits. Cyber-criminals often lure victims to visit malicious web pages, exploit vulnerabilities on victims’ devices, and then launch attacks that could lead to: stealing invaluable credentials of victims, downloading and installation of malware on victims’ devices, or complete compromise of victims’ devices to mount future attacks. While the current state-of-the-art is to detect malicious web pages is promising, it is yet limited in addressing the following three problems. First, for the sake of focused detection of certain class of malicious web pages, existing techniques are limited to partial analysis and characterization of attack payloads. Secondly, attacker-motivated and benign evolution of web page artifacts have challenged the resilience of existing detection techniques. The third problem is the prevalence and evolution of Exploit Kits used in spreading web-borne malware. In this dissertation, we present the approaches and the tools we developed to address these problems. To the address partial analysis and characterization of attack payloads, we propose a holistic and lightweight approach that combines static analysis and minimalistic emulation to analyze and detect malicious web pages. This approach leverages features from URL structure, HTML content, JavaScript executed on the client, and reputation of URLs on social networking websites to train multiple models, which are then used in confidence-weighted majority vote classifier to detect unknown web pages. Evaluation of the approach on a large corpus of web pages shows that the approach not only is precise enough in detecting malicious web pages with very low false signals but also does detection with a minimal performance penalty. To address the evolution of web page artifacts, we propose an evolution-aware approach that tunes detection models inline with the evolution of web page artifacts. Our approach takes advantage of evolutionary searching and optimization using Genetic Algorithm to decide the best combination of features and learning algorithms, i.e., models, as a function of detection accuracy and false signals. Evaluation of our approach suggests that it reduces false negatives by about 10% on a fairly large testing corpus of web pages. To tackle the prevalence of Exploit Kits on the Web, we first analyze source code and runtime behavior of several Exploit Kits in a contained setting. In addition, we analyze the behavior of live Exploit Kits on the Web in a contained environment. Combining the analysis results, we characterize Exploit Kits pertinent to their attack-centric and self-defense behaviors. Based on these behaviors, we draw distinguishing features to train classifiers used to detect URLs that are hosted by Exploit Kits. The evaluation of our classifiers on independent testing dataset shows that our approach is effective in precisely detecting malicious URLs linked with Exploit Kits with very low false positives.
2013
XXVI
2012-2013
Ingegneria e scienza dell'Informaz (29/10/12-)
Information and Communication Technology
Villafiorita, Adolfo
no
Inglese
Settore INF/01 - Informatica
File in questo prodotto:
File Dimensione Formato  
PhD_Dissertation_Birhanu_ESHETE.pdf

Solo gestori archivio

Tipologia: Tesi di dottorato (Doctoral Thesis)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 3.8 MB
Formato Adobe PDF
3.8 MB Adobe PDF   Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/368500
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus ND
  • ???jsp.display-item.citation.isi??? ND
social impact