Dies ist eine Übersichtsseite mit Metadaten zu dieser wissenschaftlichen Arbeit. Der vollständige Artikel ist beim Verlag verfügbar.

Query, Don't Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries

2025·0 Zitationen·ArXiv.orgOpen Access

Volltext beim Verlag öffnen

Zitationen

Autoren

2025

Jahr

Abstract

Electronic health records (EHRs) contain richly structured, longitudinal data essential for predictive modeling, yet stringent privacy regulations (e.g., HIPAA, GDPR) often restrict access to individual-level records. We introduce \textbf{Query, Don't Train} (QDT): a \textbf{structured-data foundation-model interface} enabling \textbf{tabular inference} via LLM-generated SQL over EHRs. Instead of training on or accessing individual-level examples, QDT uses a large language model (LLM) as a schema-aware query planner to generate privacy-compliant SQL queries from a natural language task description and a test-time input. The model then extracts summary-level population statistics through these SQL queries, and the LLM performs chain-of-thought reasoning over the results to make predictions. This inference-time-only approach enables prediction without supervised model training, ensures interpretability through symbolic, auditable queries, naturally handles missing features without imputation or preprocessing, and effectively manages high-dimensional numerical data to enhance analytical capabilities. We validate QDT on the task of 30-day hospital readmission prediction for Type 2 diabetes patients using a MIMIC-style EHR cohort, achieving F1 = 0.70, which outperforms TabPFN (F1 = 0.68). To our knowledge, this is the first demonstration of LLM-driven, privacy-preserving structured prediction using only schema metadata and aggregate statistics -- offering a scalable, interpretable, and regulation-compliant alternative to conventional foundation-model pipelines.

Autoren

Themen

Machine Learning in HealthcarePrivacy-Preserving Technologies in DataArtificial Intelligence in Healthcare and Education

Volltext beim Verlag öffnen

Query, Don't Train: Privacy-Preserving Tabular Prediction from EHR Data via SQL Queries

Abstract

Ähnliche Arbeiten

Autoren

Themen