This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Bridging Large Language Models and Single-Cell Transcriptomics in Dissecting Selective Motor Neuron Vulnerability.
Citations: 0
Authors: 6
Year: 2025
Abstract
Understanding cell identity and function from single-cell sequencing data remains a key challenge in computational biology. We present a novel framework that leverages gene-specific textual annotations from the NCBI Gene database to generate biologically contextualized cell embeddings. For each cell in a single-cell RNA sequencing (scRNA-seq) dataset, we rank genes by expression level, retrieve their corresponding NCBI gene descriptions, and transform these descriptions into vector embeddings using large language models (LLMs). The models used include OpenAI's text-embedding-ada-002, text-embedding-3-small, and text-embedding-3-large (Jan 2024), as well as the domain-specific models BioBERT and SciBERT. Each cell embedding is computed as an expression-weighted average over the embeddings of the top-N most highly expressed genes in that cell, providing a compact, semantically rich representation. This multimodal strategy bridges structured biological data with state-of-the-art language modeling, enabling more interpretable downstream applications such as cell type clustering, cell vulnerability dissection, and trajectory inference.
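The expression-weighted averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cell_embedding`, the `top_n` parameter, the normalization by the sum of the selected expression values, and the toy two-dimensional gene vectors are all assumptions made for illustration. In practice the gene vectors would come from an LLM applied to NCBI gene descriptions.

```python
import numpy as np


def cell_embedding(expression, gene_embeddings, top_n=2):
    """Expression-weighted mean of the embeddings of the top_n most expressed genes.

    Hypothetical helper: the weighting scheme (dividing by the sum of the
    selected expression values) is an assumption, not taken from the paper.
    """
    expression = np.asarray(expression, dtype=float)
    gene_embeddings = np.asarray(gene_embeddings, dtype=float)
    top_n = min(top_n, expression.shape[0])

    # Rank genes by expression (descending) and keep the top_n indices.
    top = np.argsort(expression)[::-1][:top_n]
    weights = expression[top]
    total = weights.sum()

    # A cell with no expression among the selected genes gets a zero vector.
    if total == 0:
        return np.zeros(gene_embeddings.shape[1])

    # Weighted average of the selected gene description embeddings.
    return (weights[:, None] * gene_embeddings[top]).sum(axis=0) / total


# Toy data: one cell, four genes, 2-dimensional stand-in "description embeddings".
toy_expression = [5.0, 0.0, 3.0, 2.0]
toy_gene_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
toy_cell = cell_embedding(toy_expression, toy_gene_embeddings, top_n=2)
```

With `top_n=2`, the first and third genes are selected (expression 5 and 3), so the toy cell embedding is (5·[1, 0] + 3·[1, 1]) / 8 = [1.0, 0.375]. Stacking such vectors for all cells would give a matrix that standard clustering or trajectory tools could consume.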
Similar works
Model-based Analysis of ChIP-Seq (MACS)
2008 · 19,474 citations
Simple Combinations of Lineage-Determining Transcription Factors Prime cis-Regulatory Elements Required for Macrophage and B Cell Identities
2010 · 14,243 citations
A human gut microbial gene catalogue established by metagenomic sequencing
2010 · 11,480 citations
Developing and evaluating complex interventions: the new Medical Research Council guidance
2008 · 11,264 citations
Chromatin Modifications and Their Function
2007 · 10,662 citations