This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Validating a strategy for psychosocial phenotyping using a large corpus of clinical text
30
Citations
7
Authors
2013
Year
Abstract
OBJECTIVE: To develop algorithms that improve the efficiency of patient phenotyping using natural language processing (NLP) on text data. Of the large number of note titles available in our database, we sought to determine those with the highest yield and precision for psychosocial concepts.
MATERIALS AND METHODS: From a database of over 1 billion documents from US Department of Veterans Affairs medical facilities, a random sample of 1500 documents was chosen from each of 218 enterprise note titles. Psychosocial concepts were extracted using a UIMA-AS-based NLP pipeline (v3NLP), using a lexicon of relevant concepts with negation and template format annotators. Human reviewers evaluated a subset of documents for false positives and sensitivity. High-yield documents were identified by hit rate and precision. Reasons for false positivity were characterized.
RESULTS: A total of 58 707 psychosocial concepts were identified from 316 355 documents, for an overall hit rate of 0.2 concepts per document (median 0.1, range 0-1.6). Of 6031 concepts reviewed from a high-yield set of note titles, the overall precision across all concept categories was 80%, with variability among note titles and concept categories. Reasons for false positivity included templating, negation, context, and alternate meanings of words. The sensitivity of the NLP system was 49% (95% CI 43% to 55%).
CONCLUSIONS: Phenotyping using NLP need not involve the entire document corpus. Our methods offer a generalizable strategy for scaling NLP pipelines to large free-text corpora with complex linguistic annotations when the goal is to identify patients of a certain phenotype.
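The hit rate and precision used in the abstract to pick high-yield note titles are simple ratios, and a minimal Python sketch may make the selection step concrete. Only the overall totals (58 707 concepts, 316 355 documents) come from the abstract. The `NoteTitleStats` record, the per-title counts, the title labels, and the thresholds below are hypothetical illustrations, not the authors' data or the v3NLP implementation.

```python
from dataclasses import dataclass


@dataclass
class NoteTitleStats:
    """Per-note-title counts (hypothetical structure, for illustration only)."""
    title: str            # placeholder note-title label
    documents: int        # documents sampled for this note title
    concepts: int         # concepts extracted by the NLP pipeline
    true_positives: int   # reviewed concepts that a human judged correct
    reviewed: int         # concepts that a human reviewed


def hit_rate(concepts: int, documents: int) -> float:
    """Concepts found per document; 0.0 when no documents were sampled."""
    return concepts / documents if documents else 0.0


def precision(true_positives: int, reviewed: int) -> float:
    """Share of reviewed concepts that were correct; 0.0 when none reviewed."""
    return true_positives / reviewed if reviewed else 0.0


def high_yield(stats, min_hit_rate, min_precision):
    """Return titles whose hit rate and precision both meet the thresholds."""
    return [
        s.title
        for s in stats
        if hit_rate(s.concepts, s.documents) >= min_hit_rate
        and precision(s.true_positives, s.reviewed) >= min_precision
    ]


# Overall hit rate from the totals reported in the abstract.
overall_hit_rate = hit_rate(58_707, 316_355)

# Hypothetical per-title counts; the thresholds are assumptions as well.
EXAMPLE_STATS = [
    NoteTitleStats("title_a", 1500, 2400, 90, 100),
    NoteTitleStats("title_b", 1500, 150, 40, 100),
    NoteTitleStats("title_c", 1500, 900, 60, 100),
]
selected = high_yield(EXAMPLE_STATS, min_hit_rate=0.5, min_precision=0.8)

print(f"overall hit rate: {overall_hit_rate:.1f}")
print("high-yield titles:", selected)
```

With the reported totals, the overall hit rate rounds to 0.2 concepts per document, which matches the figure in the abstract. How the authors actually combined hit rate and precision to define the high-yield set is not stated on this page, so the joint threshold above is only one plausible reading.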
Similar works
The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods
2009 · 5,729 citations
The Stress Process
1981 · 4,493 citations
Mental health problems and social media exposure during COVID-19 outbreak
2020 · 2,797 citations
Cross-national prevalence and risk factors for suicidal ideation, plans and attempts
2008 · 2,637 citations
Psychological Aspects of Natural Language Use: Our Words, Our Selves
2002 · 2,568 citations