Dive into the MIMIC-III dataset + EDA


The next two weeks focus on setting up the MIMIC-III dataset to run locally and exploring the data. Get comfortable with the different tables and perform some light exploratory data analysis (EDA).
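
A first pass at EDA might look like the sketch below. It is a minimal example under stated assumptions: the column names (SUBJECT_ID, HADM_ID, ADMITTIME, DISCHTIME, ADMISSION_TYPE, HOSPITAL_EXPIRE_FLAG) follow the MIMIC-III ADMISSIONS table, but every row is invented so that the snippet runs without access to the data. The function name `summarize_admissions` is hypothetical, and only the Python standard library is used; for the real table sizes a dataframe library such as pandas would likely be more convenient.

```python
import csv
import io
from collections import Counter
from datetime import datetime
from statistics import median

# Synthetic stand-in for ADMISSIONS.csv: column names follow the MIMIC-III
# schema, but every value below is invented for illustration.
SAMPLE_ADMISSIONS = """\
SUBJECT_ID,HADM_ID,ADMITTIME,DISCHTIME,ADMISSION_TYPE,HOSPITAL_EXPIRE_FLAG
1,100,2150-01-01 08:00:00,2150-01-05 08:00:00,EMERGENCY,0
1,101,2150-03-10 12:00:00,2150-03-12 12:00:00,ELECTIVE,0
2,102,2151-06-01 00:00:00,2151-06-02 12:00:00,EMERGENCY,1
3,103,2152-09-15 06:00:00,2152-09-25 06:00:00,URGENT,0
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def summarize_admissions(csv_file):
    """Return basic descriptive statistics for an ADMISSIONS-like CSV."""
    rows = list(csv.DictReader(csv_file))
    los_days = []
    for row in rows:
        admit = datetime.strptime(row["ADMITTIME"], TIME_FORMAT)
        discharge = datetime.strptime(row["DISCHTIME"], TIME_FORMAT)
        # Length of stay in days, allowing fractional days.
        los_days.append((discharge - admit).total_seconds() / 86400)
    deaths = sum(int(row["HOSPITAL_EXPIRE_FLAG"]) for row in rows)
    return {
        "n_patients": len({row["SUBJECT_ID"] for row in rows}),
        "n_admissions": len(rows),
        "median_los_days": median(los_days),
        "in_hospital_mortality": deaths / len(rows),
        "admission_types": Counter(row["ADMISSION_TYPE"] for row in rows),
    }


summary = summarize_admissions(io.StringIO(SAMPLE_ADMISSIONS))
for key, value in summary.items():
    print(key, value)
```

To run this on your local copy, pass an open file handle for your own ADMISSIONS.csv in place of the `io.StringIO` object.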


You will touch on the following topics:

  • Learning about EHR data
  • How EHR data is processed into a DB
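
To make the second topic concrete, here is a minimal sketch of loading one CSV table into a relational database and querying it. It uses Python's built-in `sqlite3` module with an in-memory database, which is an assumption made for portability and not necessarily the database engine you will use for your local MIMIC-III setup. The column names follow the MIMIC-III PATIENTS table, the rows are invented, and `load_patients` is a hypothetical helper name.

```python
import csv
import io
import sqlite3

# Synthetic stand-in for PATIENTS.csv; every value is invented.
SAMPLE_PATIENTS = """\
SUBJECT_ID,GENDER,DOB,EXPIRE_FLAG
1,F,2080-05-01 00:00:00,0
2,M,2075-11-23 00:00:00,1
3,F,2101-02-14 00:00:00,0
"""


def load_patients(conn, csv_file):
    """Create a patients table and bulk-insert rows from a CSV file object."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS patients (
            subject_id  INTEGER PRIMARY KEY,
            gender      TEXT NOT NULL,
            dob         TEXT NOT NULL,
            expire_flag INTEGER NOT NULL
        )
        """
    )
    reader = csv.DictReader(csv_file)
    rows = [
        (int(r["SUBJECT_ID"]), r["GENDER"], r["DOB"], int(r["EXPIRE_FLAG"]))
        for r in reader
    ]
    # Parameterized bulk insert; one tuple per CSV row.
    conn.executemany("INSERT INTO patients VALUES (?, ?, ?, ?)", rows)
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")  # pass a file path instead to persist the DB
n_loaded = load_patients(conn, io.StringIO(SAMPLE_PATIENTS))
gender_counts = dict(
    conn.execute("SELECT gender, COUNT(*) FROM patients GROUP BY gender")
)
print(n_loaded, gender_counts)
```

Once a table is loaded this way, the same connection can serve ad hoc SQL queries during EDA.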


  • Each team member should focus on performing their own EDA
  • Complete all assigned readings
  • Continue to familiarize yourself with the domain by thumbing through the Domain Expertise page (critical)
  • Submit question answers to the online form (linked below)
  • Try to calculate an illness severity score (e.g. SOFA) for a few patients to see how the process might work
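
For the illness score task, the sketch below shows how part of a SOFA calculation could be structured. It is deliberately partial: it covers only four of the six SOFA components (coagulation, liver, central nervous system, and renal by creatinine alone) and omits respiration, cardiovascular, and urine output, which need ventilation and vasopressor dose data. The function names are hypothetical and the input values are invented. With real data, the inputs would typically be the worst value per patient in a 24-hour window, drawn from tables such as LABEVENTS and CHARTEVENTS; finding the right items in those tables is part of the exercise.

```python
from bisect import bisect_right

# Cutoffs at which each sub-score steps up by one point (1, 2, 3, 4).
PLATELETS_K_PER_UL = [150, 100, 50, 20]   # lower is worse
GCS_POINTS = [15, 13, 10, 6]              # lower is worse
BILIRUBIN_MG_DL = [1.2, 2.0, 6.0, 12.0]   # higher is worse
CREATININE_MG_DL = [1.2, 2.0, 3.5, 5.0]   # higher is worse


def score_falling(value, cutoffs):
    """Score 0-4 for markers where lower values are worse (value < cutoff)."""
    return sum(value < c for c in cutoffs)


def score_rising(value, cutoffs):
    """Score 0-4 for markers where higher values are worse (value >= cutoff)."""
    return bisect_right(cutoffs, value)


def partial_sofa(platelets, bilirubin, gcs, creatinine):
    """Return four SOFA sub-scores and their sum (not a full SOFA score)."""
    components = {
        "coagulation": score_falling(platelets, PLATELETS_K_PER_UL),
        "liver": score_rising(bilirubin, BILIRUBIN_MG_DL),
        "cns": score_falling(gcs, GCS_POINTS),
        "renal": score_rising(creatinine, CREATININE_MG_DL),
    }
    components["partial_total"] = sum(components.values())
    return components


# Invented example values, not taken from any patient record.
example = partial_sofa(platelets=90, bilirubin=2.5, gcs=13, creatinine=1.0)
print(example)
```

Keeping the cutoffs in plain lists makes it easy to check them against the score definition in your readings before trusting any results.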

Readings & Videos

  • Again, read the capstone primary paper by Zador et al. linked on the home page. Take good notes, and begin to explore some of the key papers in the references.
  • Watch this video presentation on the MIMIC-III database (link).
  • Watch this lecture on deep diving into clinical data with MIMIC (link). You can also try to reproduce some of the EDA and analysis from this video using the MIMIC dataset.


Answer the following questions using this Google Form (link):

  1. From the MIT OpenCourseWare lecture, what key insights and knowledge did you gain regarding clinical data?
  2. Describe any challenges you encountered while setting up the MIMIC-III dataset on your local machine. How did you address and resolve these challenges?
  3. Please provide an overview of the steps and methodologies involved in transforming raw EHR data into the structured format of the MIMIC dataset. Note that you may need to conduct additional research through online sources for comprehensive information.
  4. Were you successful in calculating illness scores for any of the patients in the dataset? Elaborate on the process you followed and discuss any challenges faced along the way.
  5. How does your understanding of the domain influence your approach to data exploration?