Subdomain enumeration is a fundamental step of many security processes (i.e., vulnerability discovery, OSINT, host enumeration, etc.). Up to now, this has been achieved with deterministic procedures that have shown some limitations. For instance, the process typically requires the generation of a candidate, which is subsequently checked for validity. While the validation is a straightforward procedure, the definition of an optimal candidate generation strategy is still an open problem. This paper presents a novel subdomain enumeration tool that allows the generation of high-quality subdomain candidates. We employ a Generative Adversarial Network (GAN) to sample unseen candidates from the distribution of valid subdomain names. The model learns this distribution from publicly available datasets. Moreover, by sampling from the trained model, we address the limitations of traditional algorithms. Our experiments were carried out against 15 domains and a ground truth of 1164 other targets. The 15 domains were carefully selected from bug bounty platforms to avoid terms of use violations. Several factors influenced the choices, including the popularity, the expected number of subdomains, and the available services. Our experiments aim to validate our approach by testing the performance increase in subdomain enumeration processes against the state-of-the-art. We benchmark our proposal in terms of candidates’ validity and sample uniqueness. The results showed that, with our GAN, the performance of a traditional subdomain enumeration workflow increased by up to 61%. In addition, according to our ground truth experiments, the GAN was able to guess, on average, 32% of subdomains.

Generative adversarial networks for subdomain enumeration / Degani, L.; Bergadano, F.; Mirheidari, S. A.; Martinelli, F.; Crispo, B.. - (2022), pp. 1636-1645. (Intervento presentato al convegno The 37th ACM/SIGAPP Symposium On Applied Computing tenutosi a Virtual nel 25 - 29 April, 2022) [10.1145/3477314.3506967].

Generative adversarial networks for subdomain enumeration

Degani L.;Mirheidari S. A.;Crispo B.
2022-01-01

Abstract

Subdomain enumeration is a fundamental step of many security processes (i.e., vulnerability discovery, OSINT, host enumeration, etc.). Up to now, this has been achieved with deterministic procedures that have shown some limitations. For instance, the process typically requires the generation of a candidate, which is subsequently checked for validity. While the validation is a straightforward procedure, the definition of an optimal candidate generation strategy is still an open problem. This paper presents a novel subdomain enumeration tool that allows the generation of high-quality subdomain candidates. We employ a Generative Adversarial Network (GAN) to sample unseen candidates from the distribution of valid subdomain names. The model learns this distribution from publicly available datasets. Moreover, by sampling from the trained model, we address the limitations of traditional algorithms. Our experiments were carried out against 15 domains and a ground truth of 1164 other targets. The 15 domains were carefully selected from bug bounty platforms to avoid terms of use violations. Several factors influenced the choices, including the popularity, the expected number of subdomains, and the available services. Our experiments aim to validate our approach by testing the performance increase in subdomain enumeration processes against the state-of-the-art. We benchmark our proposal in terms of candidates’ validity and sample uniqueness. The results showed that, with our GAN, the performance of a traditional subdomain enumeration workflow increased by up to 61%. In addition, according to our ground truth experiments, the GAN was able to guess, on average, 32% of subdomains.
2022
Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
New York, USA
Association for Computing Machinery
9781450387132
Degani, L.; Bergadano, F.; Mirheidari, S. A.; Martinelli, F.; Crispo, B.
Generative adversarial networks for subdomain enumeration / Degani, L.; Bergadano, F.; Mirheidari, S. A.; Martinelli, F.; Crispo, B.. - (2022), pp. 1636-1645. (Intervento presentato al convegno The 37th ACM/SIGAPP Symposium On Applied Computing tenutosi a Virtual nel 25 - 29 April, 2022) [10.1145/3477314.3506967].
File in questo prodotto:
File Dimensione Formato  
3477314.3506967.pdf

accesso aperto

Descrizione: Paper
Tipologia: Versione editoriale (Publisher’s layout)
Licenza: Tutti i diritti riservati (All rights reserved)
Dimensione 1.2 MB
Formato Adobe PDF
1.2 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11572/353521
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 1
social impact