Leveraging Electronic Healthcare Record Standards and Semantic Web Technologies for the identification of patient cohorts

(Paper submitted to the Special Focus Issue on Electronic Health Records-Driven Phenotyping @ Journal of the American Medical Informatics Association)

Jesualdo Tomás Fernández-Breis1+*, José Alberto Maldonado2+, Mar Marcos3+, María del Carmen Legaz-García1, David Moner2, Joaquín Torres-Sospedra3, Angel Esteban-Gil4, Begoña Martínez-Salvador3, Monserrat Robles2

+ these authors have contributed equally to this work

  1. Departamento de Informática y Sistemas, Universidad de Murcia, 30100, Murcia, Spain
  2. Biomedical Informatics Group, ITACA Institute, Universidad Politécnica de Valencia, Camino de Vera s/n, 46022 Valencia, Spain
  3. Dept. of Computer Engineering and Science, Universitat Jaume I, Av. de Vicent Sos Baynat s/n, 12071 Castellón, Spain
  4. Fundación para la Formación e Investigación Sanitaria, C/ Luis Fontes Pagán nº 9 - 1ª planta, 30003 Murcia, Spain



Objective

The secondary use of Electronic Healthcare Records (EHRs) often requires the identification of patient cohorts. In this context, an important problem is the heterogeneity of clinical data sources. This heterogeneity can be overcome through the combined use of standardized information models, Virtual Health Records, and semantic technologies, since each of them helps to solve a different aspect of the semantic interoperability of EHR data. Our main objective is to develop methods that allow EHR data to be used directly for the identification of patient cohorts, leveraging current EHR standards and semantic web technologies.

Materials and Methods

We combine the strengths of EHR standards and ontologies, building on our previous results and experience with both technological infrastructures. Our guiding principle is to perform each activity at the most suitable abstraction level and with the most appropriate technology available: part of the processing is performed using archetypes (i.e., at the data level) and the rest using ontologies (i.e., at the knowledge level). The approach starts from EHR data in proprietary format, which are first normalized and elaborated using EHR standards and then transformed into a semantic representation that is exploited by automated reasoning.
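
The flow described above (proprietary data, normalization at the data level, semantic representation, reasoning at the knowledge level) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration only: the field names (`pid`, `n_findings`), the `ex:` identifiers, the `NormalizedFinding` structure standing in for archetype-conformant data, and above all the threshold rule, which is a placeholder and is not taken from any screening protocol. A plain Python function stands in for the ontology reasoner.

```python
from dataclasses import dataclass

# Hypothetical proprietary rows; field names are invented for illustration.
RAW_ROWS = [
    {"pid": "p1", "n_findings": "4"},
    {"pid": "p2", "n_findings": ""},
]


@dataclass(frozen=True)
class NormalizedFinding:
    """Stand-in for an archetype-conformant record (data level)."""
    patient_id: str
    finding_count: int


def normalize(row):
    """Data level: map a proprietary row onto the normalized structure."""
    return NormalizedFinding(
        patient_id=row["pid"],
        finding_count=int(row["n_findings"] or 0),  # empty string -> 0
    )


def to_triples(record):
    """Transform a normalized record into subject-predicate-object triples."""
    subject = f"ex:{record.patient_id}"
    return {
        (subject, "rdf:type", "ex:Patient"),
        (subject, "ex:findingCount", record.finding_count),
    }


def classify(triples, threshold=3):
    """Knowledge level: placeholder rule, NOT a real clinical criterion."""
    groups = {}
    for subject, predicate, obj in triples:
        if predicate == "ex:findingCount":
            label = "ex:HigherRiskGroup" if obj >= threshold else "ex:LowerRiskGroup"
            groups[subject] = label
    return groups


def run_pipeline(rows):
    """Chain the three stages: normalize, transform, classify."""
    triples = set()
    for row in rows:
        triples |= to_triples(normalize(row))
    return classify(triples)


if __name__ == "__main__":
    print(run_pipeline(RAW_ROWS))
```

The point of the sketch is the separation of concerns: data cleaning happens before the semantic transformation, so the classification step only ever sees normalized values.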


Results

We have applied our approach to protocols for colorectal cancer screening. The results comprise the archetypes, ontologies and datasets developed for the standardization and semantic analysis of EHR data. Anonymized real data have been used, and the patients have been successfully classified according to their risk of developing colorectal cancer.


Conclusions

This work provides new insights into how archetypes and ontologies can be effectively combined for EHR-driven phenotyping. The methodological approach can be applied to other problems, provided that suitable archetypes, ontologies and classification rules can be designed.