This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Misuse of large language models: Exploiting weaknesses for target-specific outputs
Citations: 1 · Authors: 1 · Year: 2024
Abstract
Prompt engineering in large language models (LLMs), combined with external context, can be misused for jailbreaks that generate malicious outputs. In this process, jailbreak prompts are amplified to the point that LLMs can produce malicious outputs at scale despite their initial safety training. Deployed as social bots, such outputs can contribute to the spread of misinformation, hate speech, and discriminatory content. Using GPT4-x-Vicuna-13b-4bit from NousResearch, we demonstrate in this article the effectiveness of jailbreak prompts and external contexts in a Python-based Jupyter Notebook. In addition, we outline the methodological foundations of prompt engineering and its potential to create malicious content, in order to sensitize researchers, practitioners, and policymakers to the importance of responsible development and deployment of LLMs.
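The abstract describes combining a jailbreak prompt with external context before it reaches the model. A minimal, benign sketch of how such a combined prompt might be assembled (the template layout and all names here are illustrative assumptions, not taken from the paper):

```python
def build_prompt(system_prompt: str, user_instruction: str, external_context: str) -> str:
    """Assemble a single prompt string from its parts.

    The attacks the paper describes exploit the fact that trusted
    instructions and injected external context end up in one
    undifferentiated token stream seen by the LLM.
    """
    return (
        f"### System:\n{system_prompt}\n\n"
        f"### Context:\n{external_context}\n\n"
        f"### Instruction:\n{user_instruction}\n\n"
        f"### Response:\n"
    )

# Benign illustration: context is simply concatenated, so the model has
# no structural way to tell injected text apart from real instructions.
prompt = build_prompt(
    system_prompt="You are a helpful assistant.",
    user_instruction="Summarize the context above.",
    external_context="Example document text.",
)
print(prompt)
```

In a Jupyter Notebook workflow, a string like this would then be passed to the locally loaded model; the vulnerability lies in the concatenation step itself, not in any particular model API.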
Related works
Excitable Speech: A Politics of the Performative
1997 · 6,342 citations
Misinformation and Its Correction
2012 · 2,717 citations
Following you home from school: A critical review and synthesis of research on cyberbullying victimization
2010 · 2,417 citations
Automated Hate Speech Detection and the Problem of Offensive Language
2017 · 2,392 citations
Methods of coping with social desirability bias: A review
1985 · 2,236 citations