What is MIMIC-III: A Must-Know Dataset For Data Scientists
Classification, NLP, machine learning, predictive modeling, XGBoost: you can try it all using this data
MIMIC-III (Medical Information Mart for Intensive Care) is a large, publicly available database that contains de-identified health data of patients who were admitted to critical care units at the Beth Israel Deaconess Medical Center in Boston, Massachusetts. The dataset is widely used in research on critical care and has been cited in over 2,500 scientific articles (MIT Lab for Computational Physiology, 2021).
TLDR:
The database contains information on over 40,000 patients who were treated in the intensive care unit (ICU) between 2001 and 2012. Data includes vital signs, medications, laboratory measurements, observations and notes charted by care providers, fluid balance, procedure codes, diagnostic codes, imaging reports, hospital length of stay, survival data, and more. The database supports applications including academic and industrial research, quality improvement initiatives, and higher education coursework.

Who is MIMIC-III for?
MIMIC-III is a valuable resource for researchers, clinicians, and students who are interested in studying critical care medicine and improving patient outcomes. It allows researchers to analyze real-world data and test hypotheses about the care of critically ill patients. The database has been used and cited in thousands of scientific publications.
One of the unique features of the MIMIC-III database is that it includes rich clinical notes, which are narrative descriptions of patient care written by healthcare providers. These notes provide valuable insights into the care of critically ill patients and can be used to identify trends and patterns in patient care.

The MIMIC-III database is managed by the Laboratory for Computational Physiology at the Massachusetts Institute of Technology (MIT). It is freely available to researchers, with some limitations on the use of the data to protect patient privacy. Researchers must apply for access to the database and agree to follow the terms of use, which forbid attempts to identify individual patients and require citing the MIMIC-III database in any publications that use the data.
Patient characteristics
MIMIC-III includes data from 53,423 hospital admissions for adult patients (aged 16 years or above) who were treated in critical care units between 2001 and 2012, as well as data for 7,870 neonates admitted between 2001 and 2008. The data cover 38,597 distinct adult patients and 49,785 hospital admissions. The median age of adult patients is 65.8 years, 55.9% of patients are male, and in-hospital mortality is 11.5%. The median length of an ICU stay is 2.1 days, and the median length of a hospital stay is 6.9 days. On average, 4,579 charted observations and 380 laboratory measurements are available for each hospital admission.

Adult patients are distributed across the following critical care units:
- CCU: Coronary Care Unit (14.7%)
- CSRU: Cardiac Surgery Recovery Unit (20.9%)
- MICU: Medical Intensive Care Unit (35.4%)
- SICU: Surgical Intensive Care Unit (16.5%)
- TSICU: Trauma Surgical Intensive Care Unit (12.5%)
Diagnostic codes in MIMIC-III
Because the data were collected between 2001 and 2012, the diagnostic codes are ICD-9. They can be grouped into broader categories:
- Septicemia and infectious and parasitic diseases (ICD-9 001–139)
- Neoplasms of digestive organs and intrathoracic organs (ICD-9 140–239)
- Endocrine, nutritional, metabolic, and immunity (ICD-9 240–279)
- Diseases of the circulatory system (ICD-9 390–459)
- Pulmonary diseases, i.e., pneumonia and influenza, chronic obstructive pulmonary disease (ICD-9 460–519)
- Diseases of the digestive system (ICD-9 520–579)
- Diseases of the genitourinary system (ICD-9 580–629)
- Trauma (ICD-9 800–959)
- Poisoning by drugs and biological substances (ICD-9 960–979)
The distribution of the grouping by the types of intensive care units is shown below.
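The grouping above can be sketched in a few lines of Python. This is only an illustration: the function name `icd9_group` and the shortened labels are my own, and the code assumes the ICD-9 codes arrive as strings without a decimal point and with leading zeros kept (for example "0389"). Codes outside the listed ranges, including V and E codes, fall into "Other".

```python
# Category ranges as listed in the text; labels are shortened for readability.
ICD9_GROUPS = [
    (1, 139, "Infectious and parasitic diseases"),
    (140, 239, "Neoplasms"),
    (240, 279, "Endocrine, nutritional, metabolic, and immunity"),
    (390, 459, "Diseases of the circulatory system"),
    (460, 519, "Pulmonary diseases"),
    (520, 579, "Diseases of the digestive system"),
    (580, 629, "Diseases of the genitourinary system"),
    (800, 959, "Trauma"),
    (960, 979, "Poisoning by drugs and biological substances"),
]


def icd9_group(code: str) -> str:
    """Map an ICD-9 diagnosis code to one of the broad groups above.

    Assumes the code is a string whose first three characters are the
    numeric ICD-9 category (e.g. "4019" -> category 401).
    """
    code = code.strip().upper()
    # V and E codes are supplementary classifications outside these ranges.
    if not code or code[0] in ("V", "E"):
        return "Other"
    prefix = code[:3]
    if not prefix.isdigit():
        return "Other"
    category = int(prefix)
    for low, high, label in ICD9_GROUPS:
        if low <= category <= high:
            return label
    return "Other"


for example in ["0389", "4019", "486", "V3000"]:
    print(example, "->", icd9_group(example))
```

Once every diagnosis has a group label, counting labels per care unit gives the kind of distribution described above.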

What Types of Data Are Available in MIMIC-III?
Billing, demographics, interventions, medications, labs, and vital signs confirmed by the clinical staff are available. A dictionary for cross-referencing concepts with the ICD-9 codes is also available. Free-text data are available as well, including patient notes, reports from imaging studies, and ECG reports.

To give a better idea of what the information looks like, let's take the example of a single patient and the data available in each category. Vital signs in the intensive care unit are measured hourly. Some measures, such as intake and output volumes, are cumulative, unlike continuously monitored measures such as heart rate, O2 saturation, and respiratory rate. When using cumulative measures, it therefore makes sense to use the difference between consecutive measurements.
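As a rough sketch of that differencing step, the snippet below turns a cumulative series into per-interval amounts. The function name and the intake values are made up for illustration; with pandas, `Series.diff()` does the same thing.

```python
def cumulative_to_increments(values):
    """Convert a cumulative series into per-interval amounts.

    The first entry has no earlier reading to compare against,
    so its increment is reported as None.
    """
    increments = []
    previous = None
    for value in values:
        if previous is None:
            increments.append(None)
        else:
            increments.append(value - previous)
        previous = value
    return increments


# Hypothetical cumulative intake volumes (mL), one reading per hour.
hourly_intake_ml = [100, 250, 250, 400]
print(cumulative_to_increments(hourly_intake_ml))  # [None, 150, 0, 150]
```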

How Is MIMIC-III Structured?
Data is available in the form of 26 tables. Broadly speaking, five tables are used to define and track patient stays: ADMISSIONS; PATIENTS; ICUSTAYS; SERVICES; and TRANSFERS. Another five tables are dictionaries for cross-referencing codes against their respective definitions: D_CPT; D_ICD_DIAGNOSES; D_ICD_PROCEDURES; D_ITEMS; and D_LABITEMS. The remaining tables contain data associated with patient care, such as physiological measurements, caregiver observations, and billing information.
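To see how the tables fit together, here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The table and column names (`SUBJECT_ID`, `HADM_ID`, `ICD9_CODE`, and the `DIAGNOSES_ICD` table) follow the MIMIC-III schema as I understand it, but only a few columns are included, and every row is synthetic and invented for this example. In a real setup you would query your own copy of the database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified versions of four MIMIC-III tables (a subset of columns only).
CREATE TABLE PATIENTS (SUBJECT_ID INTEGER, GENDER TEXT);
CREATE TABLE ADMISSIONS (SUBJECT_ID INTEGER, HADM_ID INTEGER, ADMITTIME TEXT);
CREATE TABLE DIAGNOSES_ICD (SUBJECT_ID INTEGER, HADM_ID INTEGER,
                            SEQ_NUM INTEGER, ICD9_CODE TEXT);
CREATE TABLE D_ICD_DIAGNOSES (ICD9_CODE TEXT, SHORT_TITLE TEXT);

-- Synthetic rows for illustration only; these are NOT real MIMIC-III records.
INSERT INTO PATIENTS VALUES (1, 'F'), (2, 'M');
INSERT INTO ADMISSIONS VALUES
    (1, 100, '2150-01-01 08:00:00'),
    (2, 200, '2160-05-10 21:30:00');
INSERT INTO DIAGNOSES_ICD VALUES
    (1, 100, 1, '0389'), (1, 100, 2, '4019'), (2, 200, 1, '486');
INSERT INTO D_ICD_DIAGNOSES VALUES
    ('0389', 'Septicemia NOS'),
    ('4019', 'Hypertension NOS'),
    ('486', 'Pneumonia, organism NOS');
""")

# Patient -> admission -> diagnosis code -> dictionary definition.
rows = conn.execute("""
    SELECT p.SUBJECT_ID, a.HADM_ID, p.GENDER, d.SEQ_NUM, dd.SHORT_TITLE
    FROM PATIENTS p
    JOIN ADMISSIONS a ON a.SUBJECT_ID = p.SUBJECT_ID
    JOIN DIAGNOSES_ICD d ON d.HADM_ID = a.HADM_ID
    JOIN D_ICD_DIAGNOSES dd ON dd.ICD9_CODE = d.ICD9_CODE
    ORDER BY p.SUBJECT_ID, d.SEQ_NUM
""").fetchall()

for row in rows:
    print(row)
```

The pattern is the same for most analyses: a stay-tracking table supplies the patient or admission identifier, a care table supplies coded events, and a `D_` dictionary table turns the codes into readable definitions.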

How Was MIMIC-III Deidentified?
Health Insurance Portability and Accountability Act (HIPAA) standards require patient data to be deidentified so that there is no reasonable risk of re-identification. Research, however, requires some patient characteristics to be preserved. The structured data in MIMIC-III were deidentified by removing all eighteen of the identifying data elements listed in the HIPAA Safe Harbor method (ref), including fields such as patient name, telephone number, address, and dates. Dates were shifted into the future by a random offset for each patient, applied consistently so that intervals are preserved. Time of day, day of the week, and approximate seasonality were conserved during date shifting. Patients older than 89 had their dates of birth shifted so that they appear in the database with ages above 300, which avoids the chance of re-identification.
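The snippet below is a toy illustration of those two ideas, not the procedure the MIMIC-III team actually used. It assumes the offset is a whole number of weeks close to a whole number of years, which is one simple way to keep time of day, day of the week, and rough seasonality intact. The function names and the placeholder age of 90 are my own choices.

```python
from datetime import datetime, timedelta


def shift_datetime(ts, offset_years):
    """Shift a timestamp by roughly `offset_years`, in whole weeks.

    A whole-week offset keeps the weekday and time of day unchanged;
    staying close to whole years keeps the season approximately the same.
    """
    weeks = round(offset_years * 365.25 / 7)
    return ts + timedelta(weeks=weeks)


def age_at(dob, when):
    """Age in completed years at the time `when`."""
    years = when.year - dob.year
    if (when.month, when.day) < (dob.month, dob.day):
        years -= 1
    return years


def usable_age(age, placeholder=90):
    """Replace the masked ages (above 89, stored as 300+) with a placeholder.

    The placeholder value is an arbitrary assumption for this sketch.
    """
    return placeholder if age > 89 else age


original = datetime(2008, 6, 15, 14, 30)
shifted = shift_datetime(original, offset_years=120)
print(original.strftime("%A %H:%M"), "->", shifted.strftime("%A %H:%M"))

# A masked patient: the date of birth sits about 300 years before admission.
masked_age = age_at(datetime(1850, 3, 1), datetime(2150, 3, 1))
print(masked_age, "->", usable_age(masked_age))
```

In practice this means that intervals within one patient's record can be trusted, while absolute dates and ages above 89 should not be taken literally.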
The project was approved by the Institutional Review Boards of Beth Israel Deaconess Medical Center (Boston, MA) and the Massachusetts Institute of Technology (Cambridge, MA). The requirement for individual patient consent was waived because the project did not impact clinical care and all protected health information was deidentified.
The code for this project is available on GitHub.
How can you get access to this data?
There are two key steps that must be completed before access is granted:
- The researcher must complete a recognized course in protecting human research participants that includes Health Insurance Portability and Accountability Act (HIPAA) requirements.
- The researcher must sign a data use agreement, which outlines appropriate data usage and security standards, and forbids efforts to identify individual patients.
Approval requires at least a week. Once an application has been approved the researcher will receive emails containing instructions for downloading the database from PhysioNetWorks, a restricted access component of PhysioNet.
What Is The Future? MIMIC-IV
The MIMIC-III database is a valuable resource for researchers, clinicians, and students who are interested in studying critical care medicine. It contains a wealth of data on patients treated in the ICU and has been used in numerous research studies to improve patient outcomes. The database is managed by the Laboratory for Computational Physiology at MIT. MIMIC-IV has been released and I will be covering it in the next article. It brings a lot of great updates and improvements.
Find the MIMIC-III paper here
