Koleck, Theresa A.

Identifying Symptom Information in Clinical Notes Using Natural Language Processing

Background

Symptoms are a core concept of nursing interest. Large-scale secondary data reuse of notes in electronic health records (EHRs) has the potential to increase the quantity and quality of symptom research. However, the symptom language used in clinical notes is complex. A need exists for methods designed specifically to identify and study symptom information from EHR notes.

Objectives

We aim to describe a method that combines standardized vocabularies, clinical expertise, and natural language processing to generate comprehensive symptom vocabularies and identify symptom information in EHR notes. We piloted this method with five diverse symptom concepts: constipation, depressed mood, disturbed sleep, fatigue, and palpitations.

Methods

First, we obtained synonym lists for each pilot symptom concept from the Unified Medical Language System. Then, we used two large bodies of text (clinical notes from Columbia University Irving Medical Center and PubMed abstracts containing Medical Subject Headings or key words related to the pilot symptoms) to further expand our initial vocabulary of synonyms for each pilot symptom concept. We used NimbleMiner, an open-source natural language processing tool, to accomplish these tasks and evaluated NimbleMiner symptom identification performance by comparison to a manually annotated set of nurse- and physician-authored common EHR note types.

Results

Compared to the baseline Unified Medical Language System synonym lists, we identified up to 11 times more additional synonym words or expressions, including abbreviations, misspellings, and unique multiword combinations, for each symptom concept. Natural language processing system symptom identification performance was excellent.

Discussion

Using our comprehensive symptom vocabularies and NimbleMiner to label symptoms in clinical notes produced excellent performance metrics. The ability to extract symptom information from EHR notes in an accurate and scalable manner has the potential to greatly facilitate symptom science research.



Computational linguistics
Computerized medical records
Constipation
Constipation - diagnosis
Depression - diagnosis
Electronic health records
Electronic Health Records - statistics & numerical data
Fatigue - diagnosis
Health records
Language processing
Medical records
Natural language interfaces
Pattern Recognition, Automated - methods
Professional identity
Sleep problems
Sleep Wake Disorders - diagnosis
Symptom Assessment - nursing
Symptoms
Tachycardia - diagnosis
Natural Language Processing