OpenAlex · Updated hourly · Last updated: 05.05.2026, 18:19

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

SCI-IDEA: Context-Aware Scientific Ideation Using Token and Sentence Embeddings

2026 · 0 citations · Machine Learning · Open Access

Citations: 0 · Authors: 6 · Year: 2026

Abstract

Generating context-aware, high-quality, and innovative scientific ideas remains a central challenge in AI-supported research. We introduce SCI-IDEA, a two-stage framework combining large language models (LLMs) with a specialised Aha-Moment Detection module for iterative idea refinement. The first stage extracts structured facets (objectives, methodology, evaluation, and future work) from previous publications, compressing each paper to ~200 tokens to enable scalable processing of researcher profiles that would otherwise exceed frontier LLM context windows. The second stage integrates these facets with a token-level embedding approach to identify novelty and context gaps, flagging candidate ideas as Aha moments when they surpass predefined thresholds for both novelty and surprise. We evaluate SCI-IDEA across 100 computer-science researcher profiles, four LLMs (GPT-4o, GPT-4.5, DeepSeek-32B, DeepSeek-70B), three embedding strategies, and five prompting configurations via a hybrid protocol of automated LLM-as-judge scoring with GPT-4.1 and a human evaluation with 15 PhD-level domain experts. These two evaluation signals diverge substantially: LLM-based scores exceed expert ratings by 3–4 points on a 10-point scale, with near-zero inter-rater correlation ($r = 0.02$–$0.17$, $p > 0.05$), indicating that automated scores reflect relative architectural comparisons rather than human-verified measures of absolute idea quality. Within the LLM-evaluated setting, SCI-IDEA with token-level embeddings achieves a mean quality score of 6.97, a modest but consistent improvement over the LLM-only baseline ($\Delta = +0.20$ points; $p < 0.05$), driven primarily by improved feasibility scores ($+0.21$ points, $p < 0.001$). Expert evaluators independently confirmed this trend while assigning lower absolute scores.
These findings suggest an architectural advantage of facet-based context modelling and token-level novelty detection, though their practical importance requires validation through longitudinal studies with larger groups of domain experts. The work is validated only in the computer science domain; its application to other fields requires further investigation.
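
The thresholding step described in the abstract (a candidate idea counts as an Aha moment only when it surpasses predefined thresholds for both novelty and surprise) can be illustrated with a minimal Python sketch. The abstract does not define how novelty or surprise are computed, so everything here is an assumption made for illustration: novelty is taken as one minus the highest cosine similarity to any prior facet embedding, surprise as one minus the cosine similarity to the centroid of those embeddings, and the function names and the 0.5 default thresholds are hypothetical. This is not the authors' implementation.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def novelty(idea: Vector, prior: List[Vector]) -> float:
    """ASSUMED definition: 1 - highest similarity to any prior facet embedding."""
    if not prior:
        return 1.0
    return 1.0 - max(cosine_similarity(idea, p) for p in prior)


def surprise(idea: Vector, prior: List[Vector]) -> float:
    """ASSUMED definition: 1 - similarity to the centroid of the prior embeddings."""
    if not prior:
        return 1.0
    dim = len(prior[0])
    centroid = [sum(p[i] for p in prior) / len(prior) for i in range(dim)]
    return 1.0 - cosine_similarity(idea, centroid)


def is_aha_moment(
    idea: Vector,
    prior: List[Vector],
    novelty_threshold: float = 0.5,   # hypothetical value, not from the paper
    surprise_threshold: float = 0.5,  # hypothetical value, not from the paper
) -> bool:
    """Flag the idea only if BOTH scores strictly exceed their thresholds."""
    return (
        novelty(idea, prior) > novelty_threshold
        and surprise(idea, prior) > surprise_threshold
    )


# Toy 2-D "embeddings" standing in for facet vectors from earlier publications.
prior_facets = [[1.0, 0.0], [1.0, 0.1]]
print(is_aha_moment([0.0, 1.0], prior_facets))   # points away from prior work
print(is_aha_moment([1.0, 0.05], prior_facets))  # nearly identical to prior work
```

Requiring both conditions mirrors the abstract's wording that an idea must exceed the thresholds for novelty and surprise together. In practice the vectors would come from an embedding model rather than hand-written lists.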

Similar Works