Hongliu Cao

26 works · 171 citations

Relevant works

Most-cited publications in Health & MedTech

Multi-Agent LLM Judge: automatic personalized LLM judge design for evaluating natural language generation applications

2025 · 0 citations · ArXiv.org

Beyond Task Completion: Revealing Corrupt Success in LLM Agents through Procedure-Aware Evaluation

2026 · 0 citations · arXiv (Cornell University)
