This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
SCI-IDEA: Context-Aware Scientific Ideation Using Token and Sentence Embeddings
Citations: 0
Authors: 6
Year: 2026
Abstract
Generating context-aware, high-quality, and innovative scientific ideas remains a central challenge in AI-supported research. We introduce SCI-IDEA, a two-stage framework combining large language models (LLMs) with a specialised Aha-Moment Detection module for iterative idea refinement. The first stage extracts structured facets (objectives, methodology, evaluation, and future work) from previous publications, compressing each paper to $\sim$200 tokens to enable scalable processing of researcher profiles that would otherwise exceed frontier LLM context windows. The second stage integrates these facets with a token-level embedding approach to identify novelty and context gaps, flagging candidate ideas as Aha moments when they surpass predefined thresholds for both novelty and surprise. We evaluate SCI-IDEA across 100 computer-science researcher profiles, four LLMs (GPT-4o, GPT-4.5, DeepSeek-32B, DeepSeek-70B), three embedding strategies, and five prompting configurations via a hybrid protocol of automated LLM-as-judge scoring with GPT-4.1 and a human evaluation with 15 PhD-level domain experts. These two evaluation signals diverge substantially: LLM-based scores exceed expert ratings by 3–4 points on a 10-point scale, with near-zero inter-rater correlation ($r = 0.02$–$0.17$, $p > 0.05$), indicating that automated scores reflect relative architectural comparisons rather than human-verified measures of absolute idea quality. Within the LLM-evaluated setting, SCI-IDEA with token-level embeddings achieves a mean quality score of 6.97, a modest but consistent improvement over the LLM-only baseline ($\Delta = +0.20$ points; $p < 0.05$), driven primarily by improved feasibility scores ($+0.21$ points, $p < 0.001$). Expert evaluators independently confirmed this trend while assigning lower absolute scores.
These findings suggest an architectural advantage of facet-based context modelling and token-level novelty detection, though their practical importance deserves validation through longitudinal studies with larger pools of domain experts. This work is validated in the computer-science domain, but its application to other fields requires further investigation.
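The abstract describes flagging a candidate idea as an Aha moment when it exceeds thresholds for both novelty and surprise relative to embeddings of a researcher's prior work. The paper's page here gives no implementation details, so the following is only a minimal sketch under stated assumptions: the function name `is_aha_moment`, the choice of novelty as distance to the nearest prior-work embedding, surprise as average divergence from the profile, and the threshold values are all illustrative, not the authors' actual method.

```python
import math

def cosine_sim(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_aha_moment(idea_emb, prior_embs, novelty_thresh=0.3, surprise_thresh=0.3):
    """Flag an idea when it exceeds BOTH thresholds (illustrative proxies).

    novelty:  1 - similarity to the closest prior-work embedding.
    surprise: 1 - mean similarity across the profile's embeddings.
    """
    sims = [cosine_sim(idea_emb, p) for p in prior_embs]
    novelty = 1.0 - max(sims)
    surprise = 1.0 - sum(sims) / len(sims)
    return novelty > novelty_thresh and surprise > surprise_thresh
```

An idea whose embedding points away from everything in the profile clears both thresholds, while one that duplicates an existing paper fails on novelty; requiring both conditions is what makes the gate stricter than a plain nearest-neighbour novelty check.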
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,561 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,452 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,948 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,797 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations