Leveraging Electronic Healthcare Record
Standards and Semantic Web Technologies for the Identification of Patient
Cohorts
(Paper submitted to the Special Focus Issue
on Electronic Health Records-Driven Phenotyping @ Journal
of the American Medical Informatics Association)
Jesualdo Tomás Fernández-Breis1+*, José Alberto Maldonado2+,
Mar Marcos3+, María del Carmen Legaz-García1, David
Moner2, Joaquín Torres-Sospedra3, Angel Esteban-Gil4,
Begoña Martínez-Salvador3, Monserrat Robles2
+ these authors have contributed equally to this work
1 Departamento de Informática y Sistemas, Universidad de Murcia, 30100
Murcia, Spain
2 Biomedical Informatics Group, ITACA Institute, Universidad
Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
3 Dept. of Computer Engineering and Science, Universitat Jaume I, Av. de
Vicent Sos Baynat s/n, 12071 Castellón, Spain
4 Fundación para la Formación e Investigación Sanitaria, C/ Luis Fontes
Pagán nº 9 - 1ª planta, 30003 Murcia, Spain
Abstract
Introduction
The secondary use of Electronic Healthcare Records (EHRs) often requires the
identification of patient cohorts. In this context, an important problem is
the heterogeneity of clinical data sources, which can be overcome with the
combined use of standardized information models, Virtual Health Records, and
semantic technologies, since each of them contributes to solving aspects
related to the semantic interoperability of EHR data. Our main objective is
to develop methods that allow EHR data to be used directly for the
identification of patient cohorts by leveraging current EHR standards and
semantic web technologies.
Materials and Methods
We propose to combine the strengths of EHR standards and ontologies. Our
proposal builds on our previous results and experience with both
technological infrastructures. Our main principle is to perform each
activity at the abstraction level where the most appropriate technology is
available. This means that part of the processing is performed using
archetypes (i.e., at the data level) and the rest using ontologies (i.e., at
the knowledge level). Our approach starts from EHR data in a proprietary
format, which are first normalized and elaborated using EHR standards and
then transformed into a semantic representation that is exploited by
automated reasoning.
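The staged processing described above can be pictured with a minimal Python sketch. It is an illustration only, not the implementation used in this work: the field names, the triple layout and the classification rule (including its thresholds) are hypothetical placeholders, and plain Python structures stand in for archetype instances, the ontology and the reasoner.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical proprietary record; the field names are illustrative only.
RAW_RECORD = {"pid": "p1", "adenoma_cnt": "3", "max_size_mm": "12"}


def normalize(raw: Dict[str, str]) -> Dict[str, object]:
    """Data level: map proprietary fields to a standardized, archetype-like structure."""
    return {
        "patient_id": raw["pid"],
        "adenoma_count": int(raw["adenoma_cnt"]),
        "max_adenoma_size_mm": float(raw["max_size_mm"]),
    }


def to_triples(record: Dict[str, object]) -> List[Triple]:
    """Transform the normalized record into subject-predicate-object statements."""
    subject = f"patient:{record['patient_id']}"
    return [
        (subject, key, str(value))
        for key, value in record.items()
        if key != "patient_id"
    ]


def classify(triples: List[Triple]) -> Dict[str, str]:
    """Knowledge level: a toy rule standing in for ontology-based reasoning.

    The thresholds below are placeholders chosen for the example; they are
    not the screening protocol rules applied in this work.
    """
    facts: Dict[str, Dict[str, str]] = {}
    for subject, predicate, obj in triples:
        facts.setdefault(subject, {})[predicate] = obj

    result: Dict[str, str] = {}
    for subject, props in facts.items():
        high = (
            int(props["adenoma_count"]) >= 3
            or float(props["max_adenoma_size_mm"]) >= 10.0
        )
        result[subject] = "HighRisk" if high else "LowRisk"
    return result


if __name__ == "__main__":
    print(classify(to_triples(normalize(RAW_RECORD))))
```

Keeping the three functions separate mirrors the principle of performing each activity at its own abstraction level: only `normalize` depends on the proprietary format, and only `classify` encodes domain knowledge.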
Results
We have applied our approach to protocols for colorectal cancer screening.
The results comprise the archetypes, ontologies and datasets developed for
the standardization and semantic analysis of EHR data. Anonymized real data
have been used, and the patients have been successfully classified according
to their risk of developing colorectal cancer.
Conclusion
This work provides new insights into how archetypes and ontologies can be
effectively combined for EHR-driven phenotyping. The methodological approach
can be applied to other problems provided that suitable archetypes,
ontologies and classification rules can be designed.
Archetypes
Ontologies
Datasets