OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 26.03.2026, 00:45

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

The Unprecedented Surge in Generative AI: Empirical Analysis of Trusted and Malicious Large Language Models (LLMs)

2025·0 Zitationen·IEEE Technology and Society Magazine
Volltext beim Verlag öffnen

0

Zitationen

2

Autoren

2025

Jahr

Abstract

Trusted large language models (LLMs) inherit ethical guidelines to prevent generating harmful content, whereas malicious LLMs are engineered to enable the generation of unethical and toxic responses. Both trusted and malicious LLMs use guardrails in differential contexts per the requirements of the developers and attackers, respectively. We explore the multifaceted world of guardrails implementation in LLMs by conducting an empirical analysis to assess the effectiveness of guardrails using prompts. Our results revealed that guardrails deployed in the trusted LLMs could be bypassed using prompt manipulation techniques such as "pretend" and "persist" to generate harmful content. In addition, we also discovered that malicious LLMs still deploy weak guardrails to evade detection by generating human-like content. This empirical analysis provides insights into the design of the malicious and trusted LLMs. We also propose recommendations to defend against prompt manipulation and guardrails bypass while designing LLMs.

Ähnliche Arbeiten

Autoren

Institutionen

Themen

Privacy-Preserving Technologies in DataTopic ModelingArtificial Intelligence in Healthcare and Education
Volltext beim Verlag öffnen