OpenAlex · Aktualisierung stündlich · Letzte Aktualisierung: 26.03.2026, 12:40

Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

HEV Generative Sandbox: A Framework for Assessing Domain-Specific Social Risks Through Human-LLM Simulation

2026·0 Zitationen·Proceedings of the AAAI Conference on Artificial IntelligenceOpen Access
Volltext beim Verlag öffnen

0

Zitationen

8

Autoren

2026

Jahr

Abstract

Deploying Large Language Models (LLMs) in specialized domains introduces significant societal and compliance risks, including bias amplification, misinformation propagation, and privacy violations. These risks predominantly emerge from the dynamic interactions between LLMs and humans in specific contexts. Different domains face unique distribution of hazards, and varying interaction modalities introduce distinct levels of exposure and vulnerability. However, current risk assessment frameworks lack a systematic methodology to capture this dynamic interplay. In this work, we introduce the HEV Generative Sandbox, a novel risk evaluation framework that simulates human-LLM behavior to quantify domain-contextual risks across three interdependent dimensions: 1) Hazard (H): Domain-specific threats inherent to a given context; 2) Exposure (E): The extent to which the LLM and its users are subjected to hazardous scenarios; 3) Vulnerability (V): The susceptibility of the system to risk due to human interaction or model weaknesses. Our approach pioneers "domain-rooted scenario generation", wherein we sample contextual distributions from domain-specific corpora and simulate diverse inputs. By unifying dynamic scenario simulation, causal risk decomposition, and closed-loop evaluation, the HEV Generative Sandbox provides a scalable, domain-sensitive methodology for responsible LLM deployment. This work contributes to advancing the safe deployment of LLMs by providing a comprehensive and automated risk evaluation framework.

Ähnliche Arbeiten