Dr. Mandar Karhade, MD. PhD.

Summary

MIMIC-III is a comprehensive, publicly accessible database containing de-identified health data from over 40,000 critical care patients, used extensively in research to improve critical care outcomes and medical knowledge.

Abstract

MIMIC-III (Medical Information Mart for Intensive Care) is a vast database comprising detailed medical information from patients admitted to ICUs at the Beth Israel Deaconess Medical Center. It includes a wide range of data such as vital signs, medications, laboratory test results, clinical notes, and more, spanning from 2001 to 2012. This rich dataset is utilized by researchers, clinicians, and students for academic research, quality improvement, and educational purposes. The database is particularly valuable for its rich clinical notes, which offer insights into patient care and trends. Access to MIMIC-III is controlled to protect patient privacy, requiring researchers to complete training and sign data use agreements. The data has been de-identified in accordance with HIPAA standards to prevent re-identification while preserving essential patient characteristics. MIMIC-III has been cited in thousands of scientific publications, underscoring its significance in the medical research community.

Opinions

  • The MIMIC-III database is considered a "must-know" dataset for data scientists due to its extensive applications in machine learning, predictive modeling, and natural language processing.
  • The database's inclusion of narrative clinical notes is highlighted as a unique feature that provides valuable context for the care of critically ill patients.
  • The management and access protocols for MIMIC-III, including the requirement for researchers to complete training and sign data use agreements, are seen as appropriate measures to ensure ethical use of the data while maintaining patient confidentiality.

What is MIMIC-III: A Must-Know Dataset For Data Scientists

Classification, NLP, machine learning, predictive modeling, XGBoost: you can try it all using this data

MIMIC-III (Medical Information Mart for Intensive Care) is a large, publicly available database that contains de-identified health data of patients who were admitted to critical care units at the Beth Israel Deaconess Medical Center in Boston, Massachusetts. The dataset is widely used in research on critical care and has been cited in over 2,500 scientific articles (MIT Lab for Computational Physiology, 2021).

TLDR:

The database contains information on over 40,000 patients (16 years and above) who were treated in the intensive care unit (ICU) between 2001 and 2012. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

Source: MIMIC-III Research Paper

Who is MIMIC-III for?

MIMIC-III is a valuable resource for researchers, clinicians, and students who are interested in studying critical care medicine and improving patient outcomes. It allows researchers to analyze real-world data and test hypotheses about the care of critically ill patients. The database has been used in over 1,000 research papers and has been cited in over 3,500 scientific publications.

One of the unique features of the MIMIC-III database is that it includes rich clinical notes, which are narrative descriptions of patient care written by healthcare providers. These notes provide valuable insights into the care of critically ill patients and can be used to identify trends and patterns in patient care.

Source: MIMIC-III Research Paper

The MIMIC-III database is managed by the Laboratory for Computational Physiology at the Massachusetts Institute of Technology (MIT). It is freely available to researchers, with some limitations on the use of the data to protect patient privacy. Researchers must apply for access to the database and agree to follow the terms of use, which forbid attempts to re-identify patients and require citing the MIMIC-III database in any publications that use the data.

Patient characteristics

MIMIC-III includes data from 53,423 hospital admissions for adult patients (aged 16 years or above) who were treated in critical care units between 2001 and 2012, as well as data for 7,870 neonatal admissions between 2001 and 2008. The database covers 38,597 distinct adult patients and 49,785 hospital admissions. The median age of adult patients in the database is 65.8 years, 55.9% of patients are male, and the in-hospital mortality rate is 11.5%. The median length of stay in the ICU is 2.1 days, and the median hospital stay is 6.9 days. On average, the database includes 4,579 charted observations and 380 laboratory measurements per hospital admission.

Source: MIMIC-III Research Paper

The abbreviations for the critical care units, with the share reported for each, are:

  • CCU: Coronary Care Unit (14.7%)
  • CSRU: Cardiac Surgery Recovery Unit (20.9%)
  • MICU: Medical Intensive Care Unit (35.4%)
  • SICU: Surgical Intensive Care Unit (16.5%)
  • TSICU: Trauma Surgical Intensive Care Unit (12.5%)

Diagnostic codes in MIMIC-III

Given the 2001–2012 timeline, the diagnostic codes are ICD-9. They can be grouped into broader categories:

  • Septicemia and infectious and parasitic diseases (ICD-9 001–139)
  • Neoplasms of digestive organs and intrathoracic organs (ICD-9 140–239)
  • Endocrine, nutritional, metabolic, and immunity (ICD-9 240–279)
  • Diseases of the circulatory system (ICD-9 390–459)
  • Pulmonary diseases, i.e., pneumonia and influenza, chronic obstructive pulmonary disease (ICD-9 460–519)
  • Diseases of the digestive system (ICD-9 520–579)
  • Diseases of the genitourinary system (ICD-9 580–629)
  • Trauma (ICD-9 800–959)
  • Poisoning by drugs and biological substances (ICD-9 960–979)

The distribution of the grouping by the types of intensive care units is shown below.

Source: MIMIC-III Research Paper
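
The ICD-9 grouping described above can be sketched in a few lines of Python. This is only an illustration: the function name `icd9_group` and the shortened labels are my own, and the sketch assumes codes are stored as strings without a decimal point and with a zero-padded three-digit category (for example "0389"). Anything outside the listed ranges, including V and E codes, falls into a catch-all "Other" bucket.

```python
# Ranges taken from the grouping in the text; labels are shortened here.
ICD9_GROUPS = [
    (1, 139, "Infectious and parasitic diseases"),
    (140, 239, "Neoplasms"),
    (240, 279, "Endocrine, nutritional, metabolic, and immunity"),
    (390, 459, "Diseases of the circulatory system"),
    (460, 519, "Pulmonary diseases"),
    (520, 579, "Diseases of the digestive system"),
    (580, 629, "Diseases of the genitourinary system"),
    (800, 959, "Trauma"),
    (960, 979, "Poisoning by drugs and biological substances"),
]


def icd9_group(code):
    """Map an ICD-9 diagnosis code string to one of the broad groups.

    Assumes the first three characters are the zero-padded category.
    """
    code = code.strip().upper()
    # V and E codes (and malformed values) have a non-numeric prefix.
    if not code[:3].isdigit():
        return "Other"
    category = int(code[:3])
    for low, high, label in ICD9_GROUPS:
        if low <= category <= high:
            return label
    return "Other"


for example in ["0389", "4019", "486", "V3001"]:
    print(example, "->", icd9_group(example))
```

A lookup table of ranges keeps the grouping easy to audit against the list in the text, and it is simple to extend if you need the ranges that are not covered here.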

What Types of Data Are Available in MIMIC-III?

Billing data, demographics, interventions, medications, laboratory results, and vital signs validated by the clinical staff are available. A dictionary for cross-referencing concepts with the ICD-9 codes is also included. Free-text data is available as well: patient notes, reports from imaging studies, and ECG reports.

Source: MIMIC-III Research Paper

To give a better idea of what the information looks like, let's take the example of a single patient and look at the information available in each category. Vital signs in the intensive care unit are measured hourly. Some measures, such as intake and output volumes, are cumulative, unlike continuously monitored measures such as heart rate, O2 saturation, and respiratory rate. When using cumulative measures, it therefore makes sense to work with the difference between consecutive measurements.

Source: MIMIC-III Research Paper
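
To make the point about cumulative measures concrete, here is a minimal sketch that turns a running total into per-interval amounts. The helper name `interval_amounts` and the intake values are invented for illustration; they are not taken from the database.

```python
def interval_amounts(cumulative_totals):
    """Return the amount added between each pair of consecutive readings."""
    return [
        later - earlier
        for earlier, later in zip(cumulative_totals, cumulative_totals[1:])
    ]


# Made-up hourly readings of cumulative fluid intake, in mL.
hours = [0, 1, 2, 3, 4]
cumulative_intake_ml = [0, 120, 250, 250, 400]

per_hour = interval_amounts(cumulative_intake_ml)
for hour, amount in zip(hours[1:], per_hour):
    print(f"hour {hour}: {amount} mL")
```

The result has one fewer element than the input, because the first reading has nothing to be compared against.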

How Is MIMIC-III Structured?

Data is available in the form of 26 tables. Broadly speaking, five tables are used to define and track patient stays: ADMISSIONS; PATIENTS; ICUSTAYS; SERVICES; and TRANSFERS. Another five tables are dictionaries for cross-referencing codes against their respective definitions: D_CPT; D_ICD_DIAGNOSES; D_ICD_PROCEDURES; D_ITEMS; and D_LABITEMS. The remaining tables contain data associated with patient care, such as physiological measurements, caregiver observations, and billing information.

Source: MIMIC-III Research Paper
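
The split between patient-tracking tables and dictionary tables can be illustrated with a toy in-memory SQLite database. Treat this as a sketch under assumptions: the rows and titles are invented, the tables carry only a couple of columns each, and the DIAGNOSES_ICD table and the column names (SUBJECT_ID, HADM_ID, ICD9_CODE, SHORT_TITLE) are my reading of the public schema rather than something stated in this article.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Tiny stand-ins for the real tables; all rows are invented.
conn.executescript("""
CREATE TABLE PATIENTS (SUBJECT_ID INTEGER, GENDER TEXT);
CREATE TABLE ADMISSIONS (SUBJECT_ID INTEGER, HADM_ID INTEGER);
CREATE TABLE DIAGNOSES_ICD (HADM_ID INTEGER, ICD9_CODE TEXT);
CREATE TABLE D_ICD_DIAGNOSES (ICD9_CODE TEXT, SHORT_TITLE TEXT);

INSERT INTO PATIENTS VALUES (1, 'F'), (2, 'M');
INSERT INTO ADMISSIONS VALUES (1, 100), (2, 200), (2, 201);
INSERT INTO DIAGNOSES_ICD VALUES (100, '4019'), (200, '0389'), (201, '4019');
INSERT INTO D_ICD_DIAGNOSES VALUES
    ('4019', 'Hypertension NOS'),
    ('0389', 'Septicemia NOS');
""")

# Patient -> admission -> diagnosis code -> human-readable definition.
rows = conn.execute("""
SELECT p.SUBJECT_ID, a.HADM_ID, d.SHORT_TITLE
FROM PATIENTS p
JOIN ADMISSIONS a ON a.SUBJECT_ID = p.SUBJECT_ID
JOIN DIAGNOSES_ICD dx ON dx.HADM_ID = a.HADM_ID
JOIN D_ICD_DIAGNOSES d ON d.ICD9_CODE = dx.ICD9_CODE
ORDER BY p.SUBJECT_ID, a.HADM_ID
""").fetchall()

for row in rows:
    print(row)
```

The pattern is the same for the other dictionary tables: the care data stores a compact code, and the matching D_ table supplies its definition.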

How Was MIMIC-III Deidentified?

Health Insurance Portability and Accountability Act (HIPAA) standards require patient data to be deidentified so that there is no reasonable risk of re-identification. Research, however, needs some patient characteristics to be preserved. The structured data in MIMIC-III was deidentified by removing all eighteen identifying data elements listed in the HIPAA Safe Harbor method (ref), including fields such as patient name, telephone number, address, and dates. Dates were shifted into the future by a random offset for each patient, applied consistently so that intervals are preserved. Time of day, day of the week, and approximate seasonality were conserved during date shifting. Patients older than 89 had their dates of birth shifted so that they appear in the database with ages over 300, which avoids the chance of re-identifying them.
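
One way to see how date shifting can conserve time of day, day of the week, and approximate seasonality is to shift by a whole number of weeks that is close to a whole number of years. The sketch below is my own illustration of that idea, not the procedure actually used for MIMIC-III; the function names and the 120-year offset are hypothetical, and in practice the offset would be drawn at random for each patient.

```python
from datetime import datetime, timedelta


def weekly_offset(years):
    """Offset close to `years` calendar years, rounded to whole weeks.

    Whole weeks keep the weekday and clock time unchanged; staying near a
    whole number of years keeps the season roughly the same.
    """
    weeks = round(years * 365.2425 / 7)
    return timedelta(weeks=weeks)


def shift_patient_dates(timestamps, years):
    """Apply one offset to all of a patient's timestamps."""
    offset = weekly_offset(years)
    return [t + offset for t in timestamps]


# Invented admission and discharge times for one patient.
admit = datetime(2005, 3, 15, 14, 30)
discharge = datetime(2005, 3, 22, 9, 0)

shifted_admit, shifted_discharge = shift_patient_dates(
    [admit, discharge], years=120
)

print(admit, "->", shifted_admit)
print("same weekday:", admit.weekday() == shifted_admit.weekday())
print("same stay length:", shifted_discharge - shifted_admit == discharge - admit)
```

Because every timestamp for a patient gets the same offset, intervals such as length of stay are unchanged even though the absolute dates no longer mean anything.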

The project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). The requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.

The code for this project is available on GitHub.

How can you get access to this data?

There are two key steps that must be completed before access is granted:

  • The researcher must complete a recognized course in protecting human research participants that includes Health Insurance Portability and Accountability Act (HIPAA) requirements.
  • The researcher must sign a data use agreement, which outlines appropriate data usage and security standards and forbids efforts to identify individual patients.

Approval takes at least a week. Once an application has been approved, the researcher receives emails with instructions for downloading the database from PhysioNetWorks, a restricted-access component of PhysioNet.

What Is The Future? MIMIC-IV

The MIMIC-III database is a valuable resource for researchers, clinicians, and students who are interested in studying critical care medicine. It contains a wealth of data on patients treated in the ICU and has been used in numerous research studies to improve patient outcomes. The database is managed by the Laboratory for Computational Physiology at MIT. MIMIC-IV has been released and I will be covering it in the next article. It brings a lot of great updates and improvements.

Find the MIMIC-III paper here
