This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Interactive LLMs with Human-in-the-Loop for Ethical Content Generation
Citations: 0
Authors: 3
Year: 2025
Abstract
Large Language Models (LLMs) have achieved impressive breakthroughs in reasoning and fluency, yet they pose ethical risks, including bias, misinformation, and toxicity. Human-in-the-Loop (HITL) supervision is an effective approach to aligning model behavior with human values. This article presents a literature review of HITL alignment methods, namely Reinforcement Learning from Human Feedback (RLHF), Direct Preference Optimization (DPO), Constitutional AI (CAI), and Reinforcement Learning from AI Feedback (RLAIF), and proposes an Interactive Alignment Lifecycle that incorporates human supervision throughout the LLM development pipeline. Synthesizing research from 2020-2025, the results reveal that interactive feedback can reduce harmful outputs by up to 71% and increase beneficial content by up to 50%. However, evaluator bias, scalability, and over-refusal remain open problems. Future studies should focus on optimized hybrid feedback, evolving constitutions, and standardized governance frameworks for sustainable ethical alignment.
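The abstract lists Direct Preference Optimization (DPO) among the surveyed alignment methods. As a hedged illustration (not taken from the paper itself), the per-pair DPO objective can be sketched in plain Python: the loss penalizes the policy when its log-probability margin between a human-preferred and a rejected response does not exceed the reference model's margin. The function name and example log-probabilities below are hypothetical.

```python
import math

def dpo_loss(logp_chosen: float, logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is a sequence log-probability; beta controls how strongly
    deviations from the reference model are penalized.
    """
    # How much more the policy prefers the chosen response than the reference does.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # Numerically plain logistic loss on the scaled margin.
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# Positive margin (policy leans toward the chosen response more than the
# reference) drives the loss below log(2); a zero margin yields exactly log(2).
loss = dpo_loss(-4.0, -7.0, -5.0, -6.0)
```

In a full training loop, this scalar would be averaged over a batch of preference pairs and minimized with respect to the policy's parameters, with the reference model held fixed.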
Related Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,576 citations
Generative Adversarial Nets
2023 · 19,892 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,300 citations
"Why Should I Trust You?"
2016 · 14,396 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,164 citations