Buscemi, Alessio; Proverbio, Daniele. "RogueGPT: Transforming ChatGPT-4 into a Rogue AI with Dis-Ethical Tuning." AI and Ethics, vol. 5, 2025, pp. 4945–4966. ISSN 2730-5961. DOI: 10.1007/s43681-025-00750-4.
RogueGPT: Transforming ChatGPT-4 into a Rogue AI with Dis-Ethical Tuning
Proverbio, Daniele
2025-01-01
Abstract
The ethical implications and potential for misuse of generative artificial intelligence are increasingly worrying topics. This paper explores how easily the default ethical guardrails of ChatGPT can be bypassed, using its latest customization features, through simple prompts and fine-tuning that are effortlessly accessible to the general public. This malevolently altered version of ChatGPT, nicknamed "RogueGPT", responded with worrying behaviours beyond those triggered by jailbreak prompts. We conduct an empirical study of RogueGPT's responses, assessing its willingness to answer questions pertaining to what should be disallowed usage. Our findings raise significant concerns about the model's knowledge of topics such as illegal drug production, torture methods and terrorism. The ease of driving ChatGPT astray, coupled with its global accessibility, highlights severe issues regarding the quality of the data used to train the foundational model and the implementation of ethical safeguards. We thus underline the responsibilities and dangers of user-driven modifications, and the broader effects that these may have on the design of safeguarding and ethical modules implemented by AI programmers. Disclaimer: this paper contains examples of harmful language. Reader discretion is recommended.

| File | Description | Type | License | Size | Format | Access |
|---|---|---|---|---|---|---|
| s43681-025-00750-4 (1).pdf | AI and Ethics (2025) 5:4945–4966 - original research | Publisher's layout (editorial version) | Creative Commons | 3.66 MB | Adobe PDF | Open access |
Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.