This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Language evolution in the era of digital natural language processing: bridging human-machine understanding in medicine

2026 · 0 citations · Archive ouverte UNIGE (University of Geneva) · Open Access

Abstract

As the interactions between human language and Artificial Intelligence (AI) evolve within the clinical domain, several theoretical and practical questions emerge. This thesis addresses two critical areas: first, the nature of human linguistic adaptation to AI systems and its implications for language theory; and second, foundational challenges in medical Natural Language Processing (NLP), ranging from multilingual resource scarcity to the fundamental assumptions underlying model specialization. The first part explores human-machine linguistic adaptation. Using evidence from a large-scale retrospective observational study of clinician interactions with a phrase-prediction tool and a literature review of prompt-engineering techniques for Large Language Models (LLMs), this part demonstrates that users consistently adapt their linguistic inputs to optimize for communicative accuracy. A central finding is the reframing of Zipf's principle of least effort: users invest greater linguistic effort (e.g., more distinctive terms or more verbose prompts) to reduce ambiguity and ensure predictable outcomes, prioritizing cognitive clarity over physical brevity. This behavior provides empirical support for a paradigm shift in NLP, moving beyond the purely distributional hypothesis toward a framework that incorporates communicative intent. The second part addresses practical and foundational challenges in medical NLP. It begins by tackling resource scarcity, introducing a novel cross-lingual annotation projection pipeline as a scalable solution. A notable contribution is the creation of FRASIMED, a large open-source French clinical corpus with annotations semantically grounded in medical knowledge bases, including the Systematized Nomenclature of Medicine – Clinical Terms (SNOMED CT).
Evaluations using this resource led to an unexpected finding: general-purpose language models consistently outperformed their medically specialized counterparts on French clinical Named Entity Recognition (NER). This analysis was facilitated by the development of Bratly, a robust Python library designed to manage annotated datasets and automate entity-level NER evaluation. These results prompted a deeper investigation into the semantic capabilities of these models. Through knowledge-based benchmarks, this thesis reveals that some state-of-the-art generalist encoders possess a powerful latent ability to represent the complex, hierarchical structure of the SNOMED CT ontology, surpassing even models explicitly trained on that medical knowledge base. This challenges the assumption that domain-specific models are always the optimal choice, suggesting that the transferability of representations in generalist models can provide a more robust semantic foundation for certain tasks. Together, the contributions presented in this work advance the field of medical NLP on both theoretical and practical fronts. By providing a tangible solution for resource creation, a new theoretical lens on human-AI interaction, and a re-evaluation of model specialization, this research supports the development of more effective medical AI systems through a more nuanced understanding of both human communicative intent and the latent capabilities of modern language architectures.
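The abstract mentions automated entity-level NER evaluation, which the thesis implements in Bratly. The sketch below illustrates what strict entity-level scoring commonly means; it is a minimal, hypothetical example and does not reproduce Bratly's API. It assumes that each entity is a (start, end, label) character-offset triple; the names `Entity` and `entity_level_scores` and the example labels are illustrative inventions.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Entity(NamedTuple):
    """Hypothetical span annotation: character offsets plus an entity label."""
    start: int
    end: int
    label: str


def entity_level_scores(gold: Iterable[Entity], predicted: Iterable[Entity]) -> dict:
    """Strict entity-level precision, recall and F1 (illustrative sketch).

    A prediction is a true positive only if its start, end and label all
    match a gold entity exactly. Counters act as multisets, so a duplicated
    prediction cannot match the same gold entity twice.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # Multiset intersection keeps the minimum count of each shared entity.
    tp = sum((gold_counts & pred_counts).values())
    n_pred = sum(pred_counts.values())
    n_gold = sum(gold_counts.values())
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"tp": tp, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Hypothetical gold and predicted annotations for one clinical note.
    gold = [Entity(0, 8, "DISORDER"), Entity(15, 24, "DRUG"), Entity(30, 35, "ANATOMY")]
    pred = [Entity(0, 8, "DISORDER"), Entity(15, 22, "DRUG"), Entity(30, 35, "PROCEDURE")]
    print(entity_level_scores(gold, pred))
```

In this example only the first prediction matches exactly: the second has a shifted end offset and the third carries the wrong label, so precision, recall, and F1 each come out at 1/3. Lenient schemes that give credit for partial span overlap are a common alternative; which variants Bratly supports is not stated on this page.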

Authors

Jamil Zaghir

Topics

Artificial Intelligence in Healthcare and Education · Machine Learning in Healthcare · Language and cultural evolution
