This is an overview page with metadata for this scholarly work. The full article is available from the publisher.
Language evolution in the era of digital natural language processing: bridging human-machine understanding in medicine
Citations: 0
Authors: 1
Year: 2026
Abstract
As the interactions between human language and Artificial Intelligence (AI) evolve within the clinical domain, several theoretical and practical questions emerge. This thesis addresses two critical areas: first, the nature of human linguistic adaptation to AI systems and its implications for language theory; and second, foundational challenges in medical Natural Language Processing (NLP), ranging from multilingual resource scarcity to the fundamental assumptions underlying model specialization. The first part explores human-machine linguistic adaptation. Using evidence from a large-scale retrospective observational study on clinician interactions with a phrase-prediction tool and a literature study of prompt engineering techniques for Large Language Models (LLMs), this part demonstrates that users consistently adapt their linguistic inputs to optimize for communicative accuracy. A central finding is the re-framing of Zipf’s principle of least effort: users invest greater linguistic effort (e.g., more distinctive terms or verbose prompts) to reduce ambiguity and ensure predictable outcomes, prioritizing cognitive clarity over physical brevity. This behavior provides empirical support for a paradigm shift in NLP, moving beyond the purely distributional hypothesis toward a framework that incorporates communicative intent. The second part addresses practical and foundational challenges in medical NLP. It begins by tackling resource scarcity, introducing a novel cross-lingual annotation projection pipeline as a scalable solution. A notable contribution is the creation of FRASIMED, a large open-source French clinical corpus with annotations semantically grounded in medical knowledge bases, including the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT).
Evaluations using this resource led to an unexpected finding: general-purpose language models consistently outperformed their medically specialized counterparts on French clinical Named Entity Recognition (NER). This analysis was facilitated by the development of Bratly, a robust Python library designed to manage annotated datasets and automate entity-level NER evaluation. These results prompted a deeper investigation into the semantic capabilities of these models. Through knowledge-based benchmarks, this thesis reveals that some state-of-the-art generalist encoders possess a powerful latent ability to represent the complex, hierarchical structure of the SNOMED CT ontology, surpassing even models explicitly trained on that medical knowledge base. This challenges the assumption that domain-specific models are always the optimal choice, suggesting that the transferability of representations in generalist models can provide a more robust semantic foundation for certain tasks. Together, the contributions presented in this work advance the field of medical NLP on both theoretical and practical fronts. By providing a tangible solution for resource creation, a new theoretical lens on human-AI interaction, and a re-evaluation of model specialization, this research supports the development of more effective medical AI systems through a more nuanced understanding of both human communicative intent and the latent capabilities of modern language architectures.
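The abstract refers to entity-level NER evaluation without describing how it is computed. As a minimal sketch of what that term usually means, the Python below scores predictions by exact match on span offsets and label. It is an illustration under stated assumptions, not Bratly's actual API: the `Entity` type, the `entity_level_scores` function, and the example labels are all hypothetical, and the thesis may use a different matching criterion.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    # Hypothetical span representation, chosen for illustration only.
    start: int   # character offset where the span begins
    end: int     # character offset where the span ends (exclusive)
    label: str   # entity type, e.g. "DISORDER"


def entity_level_scores(gold, predicted):
    """Return exact-match (precision, recall, F1) over two entity lists.

    A prediction counts as correct only if its start, end, and label
    all match a gold entity (assumed criterion; partial overlaps score 0).
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return precision, recall, f1


# Made-up example: the second prediction has the right label but a
# wrong end offset, so it does not count as a match.
gold = [Entity(0, 8, "DISORDER"), Entity(15, 22, "DRUG")]
pred = [Entity(0, 8, "DISORDER"), Entity(15, 20, "DRUG")]
print(entity_level_scores(gold, pred))
```

Scoring whole entities rather than individual tokens penalizes boundary errors in full, which is why the truncated second span above contributes nothing to precision or recall.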
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,260 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,116 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,493 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,776 citations
Peeking Inside the Black-Box: A Survey on Explainable Artificial Intelligence (XAI)
2018 · 5,438 citations