This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Explainable Feature Engineering in Health Data Science: Empirical Comparison of ChatGPT-4o and Classical Machine Learning Methods
Citations: 1
Authors: 8
Year: 2025
Abstract
Machine learning (ML) has demonstrated remarkable success in various healthcare applications. This success is inherently linked to the rigorous processes of feature engineering and feature selection, which form the backbone of ML model development. This study investigates the role of a well-known large language model (LLM), ChatGPT-4o, in feature selection and classification for healthcare data, focusing on the explainability of ML. The performance of ChatGPT-4o is evaluated and compared to traditional ML methods, such as information gain (IG), correlation-based feature selection (CFS), and principal component analysis (PCA), for identifying relevant features in predictive modeling. This comparison is conducted using two widely recognized healthcare datasets, SEER and NSQIP. After the features selected by the classical ML methods and the LLM were evaluated through expert review, the results indicate that while ChatGPT-4o aligns closely with expert evaluations and effectively provides contextual information on healthcare datasets, traditional methods such as IG, CFS, and PCA outperform it in systematic feature ranking due to their structured, data-driven nature. Furthermore, anonymization did not significantly affect the feature selection process, highlighting the robustness of ChatGPT-4o under privacy-preserving conditions. ChatGPT-4o's strength lies in complementing these methods by providing interpretability and facilitating exploratory analysis, rather than serving as a standalone solution for precise feature ranking.
Similar Works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,292 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,143 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,539 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,452 citations