OpenAlex · Updated hourly · Last updated: Apr 8, 2026, 11:37

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Interactive LLMs with Human-in-the-Loop for Ethical Content Generation

2025 · 0 citations

Citations: 0
Authors: 3
Year: 2025

Abstract

Large Language Models (LLMs) have shown impressive breakthroughs in reasoning and fluency, yet they pose ethical risks, including bias, misinformation, and even toxicity. Human-in-the-Loop (HITL) oversight is an effective step toward aligning model behavior with human values. This article reviews the literature on HITL alignment methods — Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), Constitutional AI (CAI), and Reinforcement Learning from AI Feedback (RLAIF) — and proposes an Interactive Alignment Lifecycle that incorporates human supervision throughout the LLM development pipeline. Synthesizing research from 2020-2025, the results indicate that interactive feedback can reduce harmful outputs by as much as 71% and increase beneficial content by as much as 50%. However, evaluator bias, scalability, and over-refusal remain open problems. Future work should focus on optimized hybrid feedback, evolving constitutions, and standardized governance frameworks for sustainable ethical alignment.
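Among the alignment methods the abstract surveys, DPO is the simplest to state: it trains the policy directly on preference pairs, without a separate reward model, by pushing the policy's implicit reward margin for the chosen response above that of the rejected one. A minimal sketch of the published per-pair DPO loss, computed from precomputed log-probabilities (the function name and scalar interface are illustrative, not from the paper):

```python
import math

def dpo_loss(beta, logp_w, logp_l, ref_logp_w, ref_logp_l):
    """Direct Preference Optimization loss for one preference pair.

    logp_w / logp_l: policy log-probs of the chosen / rejected response.
    ref_logp_w / ref_logp_l: the same log-probs under the frozen
    reference model; beta scales the implicit reward.
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)), written stably as softplus(-margin).
    return math.log1p(math.exp(-margin)) if margin > -30 else -margin

# When policy and reference agree, the margin is zero and the loss
# is log(2); a policy that favors the chosen response drives it lower.
print(round(dpo_loss(0.1, -5.0, -7.0, -5.0, -7.0), 4))  # → 0.6931
```

In a real training loop the log-probabilities would be per-token sums from the model, and the loss would be averaged over a batch and backpropagated; this scalar version only illustrates the objective's shape.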

Topics

Explainable Artificial Intelligence (XAI) · Ethics and Social Impacts of AI · Artificial Intelligence in Healthcare and Education