Secondary Analysis of Electronic Health Records

This book trains the next generation of scientists representing different disciplines to leverage the data generated during routine patient care. It formulates a more complete lexicon of evidence-based recommendations and supports shared, ethical decision making by doctors with their patients.

Diagnostic and therapeutic technologies continue to evolve rapidly, and both individual practitioners and clinical teams face increasingly complex ethical decisions. Unfortunately, the current state of medical knowledge does not provide the guidance to make the majority of clinical decisions on the basis of evidence.

The present research infrastructure is inefficient and frequently produces unreliable results that cannot be replicated. Even randomized controlled trials (RCTs), the traditional gold standard of the research reliability hierarchy, are not without limitations. They can be costly, labor intensive, and slow, and can return results that are seldom generalizable to every patient population. Furthermore, many pertinent but unresolved clinical and medical systems issues do not seem to have attracted the interest of the research enterprise, which has come to focus instead on cellular and molecular investigations and single-agent (e.g., a drug or device) effects. For clinicians, the end result is a bit of a “data desert” when it comes to making decisions. The new research infrastructure proposed in this book will help the medical profession to make ethically sound and well-informed decisions for their patients.

Table of contents (30 chapters)

Setting the Stage: Rationale Behind and Challenges to Health Data Analysis

Objectives of the Secondary Analysis of Electronic Health Record Data (Open Access)
Review of Clinical Databases (Open Access)
Challenges and Opportunities in Secondary Analyses of Electronic Health Record Data (Open Access)
Pulling It All Together: Envisioning a Data-Driven, Ideal Care System (Open Access)
The Story of MIMIC (Open Access)
Integrating Non-clinical Data with EHRs (Open Access)
Using EHR to Conduct Outcome and Health Services Research (Open Access)
Residual Confounding Lurking in Big Data: A Source of Error (Open Access)

A Cookbook: From Research Question Formulation to Validation of Findings

Formulating the Research Question (Open Access)
Defining the Patient Cohort (Open Access)
Data Preparation (Open Access)
Data Pre-processing (Open Access)
Missing Data (Open Access)
Noise Versus Outliers (Open Access)
Exploratory Data Analysis (Open Access)
Data Analysis (Open Access)
Sensitivity Analysis and Model Validation (Open Access)

Authors and Affiliations

Massachusetts Institute of Technology, Cambridge, USA

About the author

MIT Critical Data

MIT Critical Data consists of data scientists and clinicians from around the globe brought together by a vision to engender a data-driven healthcare system supported by clinical informatics without walls. In this ecosystem, the creation of evidence and clinical decision support tools is initiated, updated, honed and enhanced by scaling the access to and meaningful use of clinical data.

Leo Anthony Celi

Leo has practiced medicine on three continents, giving him a broad perspective on healthcare delivery. His research is on secondary analysis of electronic health records and global health informatics. He founded and co-directs Sana at the Institute for Medical Engineering and Science at the Massachusetts Institute of Technology. He also holds a faculty position at Harvard Medical School as an intensivist at the Beth Israel Deaconess Medical Center and is the clinical research director for the Laboratory of Computational Physiology at MIT. Finally, he is one of the course directors for two MIT courses: HST.936 (innovations in global health informatics) and HST.953 (secondary analysis of electronic health records).

Peter gained the degree of MEng in Engineering Science in 2010 from the University of Oxford. Since then he has held a research position, working jointly with Guy's and St Thomas' NHS Foundation Trust and King's College London. Peter’s research focuses on physiological monitoring of hospital patients, divided into three areas. The first area concerns the development of signal processing techniques to estimate clinical parameters from physiological signals. He has focused on unobtrusive estimation of respiratory rate for use in ambulatory settings, invasive estimation of cardiac output for use in critical care, and novel techniques for analysis of the pulse oximetry (photoplethysmogram) signal. Secondly, he is investigating the effectiveness of technologies for the acquisition of continuous and intermittent physiological measurements in ambulatory and intensive care settings. Thirdly, he is developing techniques to transform continuous monitoring data into measurements that are appropriate for real-time alerting of patient deterioration.

Mohammad is a doctoral candidate at the Massachusetts Institute of Technology. As an undergraduate, he studied Electrical Engineering and graduated as both a Goldwater scholar and the University's “Outstanding Engineer”. In 2011, Mohammad received an MPhil in Information Engineering from the University of Cambridge, where he was also a recipient of the Gates-Cambridge Scholarship. Since arriving at MIT, he has pursued research at the interface of machine learning and medical informatics. Mohammad's doctoral focus is on signal processing and machine learning techniques in the context of multi-modal, multi-scale datasets. He has helped put together the largest collection of post-anoxic coma EEGs in the world. In addition to his thesis work, Mohammad has worked with the Samsung corporation and several entities across campus building “smart devices”, including a multi-sensor wearable that passively monitors the physiological, audio and video activity of a user to estimate a latent emotional state.

Alistair joined the Laboratory for Computational Physiology as a postdoctoral associate in 2015. He received his B.Eng in Biomedical and Electrical Engineering at McMaster University, Canada, and subsequently read for a D.Phil in Healthcare Innovation at the University of Oxford. His thesis was titled “Mortality and acuity assessment in critical care”, and its focus included using machine learning techniques to predict mortality and develop new severity of illness scores for patients admitted to intensive care units. Before joining the LCP, Alistair spent a year as a research assistant at the John Radcliffe Hospital in Oxford, where he worked on building early alerting models for patients post-ICU discharge. Alistair’s research interests revolve around the use of data collected during routine clinical practice to improve patient care.

Matthieu holds board certification in anesthesiology and critical care in both France and the UK. A former medical research fellow at the European Space Agency, he completed a Master of Research in Biomedical Engineering at Imperial College London focusing on machine learning. Dr Komorowski now pursues a PhD at Imperial College and a research fellowship in intensive care at Charing Cross Hospital in London. In his research, he combines his expertise in machine learning and critical care to generate new clinical evidence and build the next generation of clinical tools such as decision support systems, with a particular interest in septic shock, the number one killer in intensive care and the single most expensive condition treated in hospitals.

Dominic is an Academic Foundation doctor in Oxford, United Kingdom. Dominic read Molecular and Cellular biology at the University of Bath and worked at Eli Lilly in their Alzheimer’s disease drug hunting research program. He pursued his medical training at Imperial College London where he was awarded the Santander Undergraduate scholarship for academic performance and ranked first overall in his graduating class. His research interests range from molecular biology to analysis of large clinical data sets and he has received non-industry grant funding to pursue the development of novel antibiotics and chemotherapeutic agents. Alongside clinical training, he is involved in a number of research projects focusing on analysis of electronic health care records.

Tristan Naumann is a PhD candidate in Electrical Engineering and Computer Science at MIT working with Dr. Peter Szolovits in CSAIL’s Clinical Decision Making group. His research includes exploring relationships in complex, unstructured data using data-informed unsupervised learning techniques, and the application of natural language processing techniques in healthcare data. He has been an organizer for workshops and “datathon” events, which bring together participants with diverse backgrounds in order to address biomedical and clinical questions in a manner that is reliable and reproducible.

Kenneth is a clinical informatician driving quality improvement and democratizing access through technology innovation, combining a multidisciplinary background in medicine, artificial intelligence, business management, and technology strategy. He is a research scientist at the MIT Laboratory for Computational Physiology investigating the secondary analysis of health data and building intelligent decision support systems. As the co-director of Sana, he leads programs and projects driving quality improvement and building capacity in global health. He received his MD and MBA degrees from Georgetown University and completed fellowship training in biomedical informatics at Harvard Medical School and the Massachusetts General Hospital Laboratory for Computer Science.

Tom Joseph Pollard

Tom is a Postdoctoral Associate at the MIT Laboratory for Computational Physiology. Most recently he has been working with colleagues to release MIMIC-III, an openly-accessible critical care database. Prior to joining MIT in 2015, Tom completed his PhD at University College London, UK, where he explored models of health in critical care patients in an interdisciplinary project between the Mullard Space Science Laboratory and University College Hospital. Tom has a broad interest in how we can improve the way that critical care data is managed, shared, and analyzed for the benefit of patients. He is a Fellow of the Software Sustainability Institute.

Jesse is a research scientist in the Lab for Computational Physiology at the Massachusetts Institute of Technology in Cambridge, USA. He received his PhD in biostatistics from the University of Waterloo (Canada) in 2013. His primary methodological interests are related to the modeling of complex longitudinal data, latent variable models and reproducible research. In addition to his methodological contributions, he has collaborated with colleagues on more than 20 published academic articles in a diverse set of areas, including infectious diseases, addiction, and critical care. Jesse was the recipient of the distinguished student paper award at the Eastern North American Region International Biometric Society conference in 2013, and the new investigator of the year for the Canadian Association of HIV/AIDS Research in 2004.

Justin is an Academic Foundation doctor in London, United Kingdom. Originally from Toronto, Canada, Justin completed his undergraduate and graduate studies in the United States before pursuing his medical studies at Imperial College London. His research pursuits started as an undergraduate student while completing a biochemistry degree. Subsequently, he worked on clinical trials in emergency medicine and intensive care medicine at Beth Israel Deaconess Medical Center in Boston and completed a Master's degree with his thesis on Vitamin D deficiency in critically ill patients with sepsis. During this time he developed a keen interest in statistical methods and programming, particularly in SAS and R. He has co-authored more than 30 peer-reviewed manuscripts and, in addition to his current clinical training, continues to pursue his research interests in analytical methods for observational and clinical trial data as well as education in analytics for medical students and clinicians.

Bibliographic Information