This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Commonsense Reasoning 2.0: Symbolic Approaches to Close the GPT-4 Gap
Citations: 0
Authors: 5
Year: 2025
Abstract
Large Language Models (LLMs) like GPT-4 excel at text generation but struggle with commonsense reasoning, often producing logically flawed or unsafe responses (e.g., "water is dry" or "bleach cures infection"). This paper introduces Commonsense Reasoning 2.0, a neuro-symbolic framework that integrates LLMs with structured knowledge bases (KBs) to enhance verifiability and explainability. The framework combines three components: knowledge grounding (anchoring outputs in axioms such as physics laws), constraint-based generation (eliminating infeasible inferences via logic rules), and interactive refinement (dynamically updating KBs through human-AI feedback). We address GPT-4's limitations in causal, temporal, and social reasoning and demonstrate how symbolic KBs such as Cyc and ConceptNet enforce logical constraints. The framework navigates scalability and interpretability trade-offs through modular design and LLM-driven rule synthesis. In contrast to rigid symbolic systems or purely statistical LLMs, our approach maintains generative flexibility while enhancing safety and transparency. We present a taxonomy of GPT-4 commonsense failures and a design blueprint for scalable hybrid systems. A qualitative evaluation using examples from CommonsenseQA and HellaSwag shows the system’s ability to reject flawed outputs. Plans for future quantitative benchmarking are outlined. We conclude that symbolic reasoning must be embedded into AI systems to move beyond surface-level fluency toward trustworthy understanding.
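The constraint-based generation component described in the abstract can be illustrated with a minimal sketch: candidate LLM outputs, represented as subject–predicate–object triples, are checked against a small symbolic knowledge base of axioms, and candidates that contradict an axiom are rejected. All names here (`SymbolicKB`, the axiom schema, the example triples) are hypothetical illustrations, not the paper's actual implementation.

```python
class SymbolicKB:
    """Toy knowledge base mapping (subject, predicate) to permitted objects."""

    def __init__(self):
        self.axioms = {}  # (subject, predicate) -> set of permitted objects

    def add_axiom(self, subject, predicate, allowed_objects):
        self.axioms[(subject, predicate)] = set(allowed_objects)

    def violates(self, subject, predicate, obj):
        # A triple violates the KB only if an axiom exists for
        # (subject, predicate) and the object falls outside the permitted set.
        allowed = self.axioms.get((subject, predicate))
        return allowed is not None and obj not in allowed


def filter_candidates(kb, candidates):
    """Keep only candidate triples consistent with every KB axiom."""
    return [c for c in candidates if not kb.violates(*c)]


kb = SymbolicKB()
# Axiom in the spirit of the paper's "water is dry" failure example.
kb.add_axiom("water", "has_property", {"wet", "liquid", "transparent"})

candidates = [
    ("water", "has_property", "wet"),  # consistent with the axiom -> kept
    ("water", "has_property", "dry"),  # contradicts the axiom -> rejected
]
print(filter_candidates(kb, candidates))  # [('water', 'has_property', 'wet')]
```

In a full neuro-symbolic pipeline of the kind the abstract outlines, the candidate triples would be extracted from LLM generations and the axioms drawn from large KBs such as Cyc or ConceptNet rather than hand-written.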
Related Works
Grad-CAM: Visual Explanations from Deep Networks via Gradient-Based Localization
2017 · 20,311 citations
Generative Adversarial Nets
2014 · 19,841 citations
Visualizing and Understanding Convolutional Networks
2014 · 15,238 citations
"Why Should I Trust You?"
2016 · 14,210 citations
On a Method to Measure Supervised Multiclass Model’s Interpretability: Application to Degradation Diagnosis (Short Paper)
2024 · 13,104 citations