Sepsis - Using Clinical Healthcare Data Science to Identify and Combat an Infectious Killer

Data Science Capstone Domain - DSC 180AB

Fall & Winter 2023 - Prof. Kyle Shannon

Thursday @ 1:30 PM, Location: TBD, Section: A05

Introduction to Topic

Students will explore the world of inpatient ICU care by examining severe infection management and detection using the MIMIC dataset, a comprehensive, publicly available database of de-identified ICU patient data. This project will familiarize participants with the nuances of healthcare data and the critical role EHRs play in clinical decision-making. Through this experience, students will gain insight into the broader context of clinical decision-making and public health, learning to leverage EHRs and clinical data science to develop potential products, reports, and/or health policies. They will better understand the US healthcare system, ICU operations, and the decision-making process for complex infectious cases like sepsis. By studying the work of multidisciplinary teams, students will gain a deeper understanding of intricate ICU cases and the patients' journeys through this challenging healthcare landscape. Additionally, they will appreciate the complexities of conducting data science in a demanding environment.

Phase I - Replication

The aim of reproducing a paper's results is to affirm the original authors' findings and methodologies. This process is vital in science to ensure results are robust and reliable, not merely due to chance or error. A successful reproduction would reinforce the evidence for the identified critical care patient subgroups and their risks, potentially impacting patient care. It also provides a deeper understanding of the methods applied, such as latent class analysis and k-means clustering. Ultimately, this endeavor seeks to enrich critical care knowledge and potentially influence future research and clinical practice.

The paper we will be working with, linked here, is:

Zador, Z., Landry, A., Cusimano, M.D. et al. Multimorbidity states associated with higher mortality rates in organ dysfunction and sepsis: a data-driven analysis in critical care. Crit Care 23, 247 (2019).
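
To make the clustering idea mentioned above more concrete, here is a minimal sketch of k-means on binary comorbidity indicators. It is an illustration only, not the paper's pipeline: the patient rows are made up, the four comorbidity columns are an assumed example, and the deterministic "farthest-first" initialization is a simplification chosen so the result is repeatable without a random seed.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic start: take the first point, then repeatedly add the
    point that is farthest from every centroid chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, n_iter=20):
    """Plain k-means (Lloyd's algorithm) for a fixed number of iterations."""
    centroids = farthest_first(points, k)
    labels = []
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Made-up patients (NOT MIMIC data). Assumed columns:
# [hypertension, diabetes, chronic kidney disease, COPD]
patients = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]

labels, centroids = kmeans(patients, k=2)
print(labels)
```

Each centroid can be read as the share of patients in that cluster who carry each comorbidity. Latent class analysis, which the paper also involves, is a model-based alternative for this kind of binary data and is not shown here.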

Accessing the MIMIC Dataset

Gaining access to MIMIC data via PhysioNet can be a somewhat complex process. It's advisable to initiate this process a few weeks before your classes start, as it can take up to a month to complete verification. The process includes account creation, submitting online forms, and a two-hour online training course. Please adhere to the following steps meticulously for a smooth experience.

  1. Create an account on PhysioNet using your UCSD email.
  2. Follow these instructions to complete your CITI training course. You will need to create an account with CITI Program.
  3. Search for the "CITI Data or Specimens Only Research" course as per the instructions from step 2. This course is affiliated with MIT and requires answering some questions.
  4. After achieving a pass rate of 90% or more in the training (you can retake section quizzes if needed), upload the "report" (not the "certificate") to PhysioNet on your account under the training tab.
  5. Apply for credentialing on PhysioNet. Fill out the form, add Kyle Shannon (Department: HDSI, role: Supervisor) to your application, and submit it.
  6. Wait for approval. This can take anywhere from 1 day to 3 weeks. You will receive two emails with the following subjects:
    • Your application for PhysioNet credentialing
    • Your application for PhysioNet training
     If both emails confirm your authorization, proceed to the next step. If not, contact Kyle Shannon immediately.
  7. Return to PhysioNet and visit the specific dataset pages linked below. Sign the "Data Use Agreement" for the MIMIC-III and MIMIC-IV datasets:
  8. Both datasets will be available for download at the bottom of their respective pages.
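
Once the downloads are available, a first look at the data might resemble the sketch below. It assumes the MIMIC-III diagnoses table (DIAGNOSES_ICD) with the columns ROW_ID, SUBJECT_ID, HADM_ID, SEQ_NUM, and ICD9_CODE; verify the layout against the dataset documentation before relying on it. The rows here are made up so the example runs without access to the real data, and flagging admissions by the explicit ICD-9 codes for sepsis, severe sepsis, and septic shock is only one illustrative choice, not the definition used in the paper.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the assumed DIAGNOSES_ICD layout (NOT real patient data).
SAMPLE_CSV = """ROW_ID,SUBJECT_ID,HADM_ID,SEQ_NUM,ICD9_CODE
1,10,100,1,99592
2,10,100,2,4019
3,11,101,1,25000
4,12,102,1,78552
5,12,102,2,99591
"""

# ICD-9 codes that explicitly name sepsis (995.91), severe sepsis (995.92),
# and septic shock (785.52). An illustrative choice, not a full definition.
EXPLICIT_SEPSIS_CODES = {"99591", "99592", "78552"}


def codes_by_admission(csv_file):
    """Group diagnosis codes by hospital admission ID (HADM_ID)."""
    by_adm = defaultdict(set)
    for row in csv.DictReader(csv_file):
        by_adm[row["HADM_ID"]].add(row["ICD9_CODE"])
    return dict(by_adm)


def sepsis_admissions(by_adm):
    """Return the sorted admission IDs that have at least one explicit sepsis code."""
    return sorted(h for h, codes in by_adm.items()
                  if codes & EXPLICIT_SEPSIS_CODES)


admissions = codes_by_admission(io.StringIO(SAMPLE_CSV))
flagged = sepsis_admissions(admissions)
print(flagged)
```

For a real download, the same function should work on a file handle such as `gzip.open(path, "rt", newline="")`, where `path` is a hypothetical location of the compressed diagnoses file on your machine. Keep the files on secure storage, as required by the Data Use Agreement.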


Click the "topic" links below for details regarding the readings, questions, and tasks for that week.

Domain Expertise & Supplementary Resources

Check out the domain expertise page I have set up. This page provides guidance and resources to help you better understand the non-data-science elements of this project, e.g. EHRs, illness scores, etc. It is only a starting point, and you may need to do further research and reading, and to interview domain experts, to gain the knowledge and answers you need. I will still assist you in these pursuits, but it is important as a data scientist that you learn how to track down information and do research.

Also take a look at my resources drive folder, which has relevant papers and books that will be useful when working on this project.

Phase II


An Initial Note on Ethics

The MIMIC dataset, a critical care database, embodies the intersection of medical research and ethical considerations. It provides a wealth of de-identified patient data, enabling researchers to study critical care conditions while respecting patient privacy. However, the use of this dataset still requires careful ethical considerations.

The principle of autonomy is supported in the MIMIC dataset through de-identification, which protects patient privacy. However, researchers must remain vigilant to avoid any potential re-identification of individuals, which could infringe on that privacy. The principle of beneficence is reflected in the use of the MIMIC dataset for research aimed at improving critical care outcomes. However, researchers must ensure that their studies are designed to maximize potential benefits, such as enhancing understanding of critical care conditions or developing new treatment strategies.

Reproducibility is a key aspect of research using the MIMIC dataset. It promotes transparency and trust in the findings derived from the dataset. However, reproducing research must be balanced with maintaining data security. Researchers must ensure that they use secure methods to access and analyze the MIMIC dataset, protecting the data from unauthorized access or misuse. Remember, while the MIMIC dataset provides a valuable resource for critical care research, it also underscores the importance of ethical considerations in medical research. Researchers must navigate these ethical considerations carefully to ensure they respect patient rights and interests while advancing critical care research.

Section & Group Participation

Participation in the weekly discussion section is mandatory. Each week during Phase I, you are responsible for doing the reading/task assigned in the schedule and submitting answers to the listed questions before the discussion section begins.

Weekly assigned questions help me see how you are all doing on the project, focus your work for the week, and prepare you for discussion. If you have questions about your work, please ask them in section, on Discord, or in office hours (I will rarely comment on your submitted answers).

Office Hours

Tuesday @ 2:30 PM

CSB 255 (Cognitive Science Building, top floor facing Geisel Library)