Sepsis - Using Clinical Healthcare Data Science to Identify and Combat an Infectious Killer

Data Science Capstone Domain - DSC 180AB

Fall & Winter 2024 - Prof. Kyle Shannon

Thursday @ 2:30 PM HDSI 226 section: A02


Introduction to Topic

Students will explore the world of inpatient ICU care by examining severe infection management and detection using the MIMIC dataset, a comprehensive, publicly available database of de-identified ICU patient data. This project will familiarize participants with healthcare data nuances and the critical role electronic health records (EHRs) play in clinical decision-making. Through this experience, students will gain insights into the broader context of clinical decision-making and public health, learning to leverage EHRs and clinical data science for developing potential products, reports, and/or health policies. They will better understand the US healthcare system, ICU operations, and the decision-making process for complex infectious cases like sepsis. By studying the work of multidisciplinary teams, students will gain a deeper understanding of intricate ICU cases and the patients' journeys through this challenging healthcare landscape. Additionally, they will appreciate the complexities of conducting data science in a demanding environment.


Phase I - Replication

The aim of reproducing a paper's results is to affirm the original authors' findings and methodologies. This process is vital in science to ensure results are robust and reliable, not merely due to chance or error. Reproduction would reinforce the evidence for the identified critical care patient subgroups and their risks, potentially impacting patient care. It also provides a deeper understanding of the applied methods like latent class analysis and k-means clustering. Ultimately, this endeavor seeks to enrich critical care knowledge and potentially influence future research and clinical practice.

The paper we will be working with is:

Zador, Z., Landry, A., Cusimano, M.D. et al. Multimorbidity states associated with higher mortality rates in organ dysfunction and sepsis: a data-driven analysis in critical care. Crit Care 23, 247 (2019). https://doi.org/10.1186/s13054-019-2486-6

Accessing the MIMIC Dataset

Gaining access to MIMIC data via PhysioNet can be a somewhat complex process. It's advisable to initiate this process a few weeks before your classes start, as it can take up to a month to complete verification. The process includes account creation, submitting online forms, and a two-hour online training course. Please adhere to the following steps meticulously for a smooth experience.

  1. Create an account on PhysioNet using your UCSD email.
  2. Follow these instructions to complete your CITI training course. You will need to create an account with CITI Program.
  3. Search for the "CITI Data or Specimens Only Research" course as per the instructions from step 2. This course is affiliated with MIT and requires answering some questions.
  4. After achieving a pass rate of 90% or more in the training (you can retake section quizzes if needed), upload the "report" (not the "certificate") to PhysioNet on your account under the training tab.
  5. Apply for credentialing on PhysioNet. Fill out the form, add Kyle Shannon (kshannon@ucsd.edu, Department: HDSI, role: Supervisor) to your application, and submit it.
  6. Wait for approval. This can take anywhere from 1 day to 3 weeks. You will receive two emails with the following subjects:
    • Your application for PhysioNet credentialing
    • Your application for PhysioNet training
    If both emails confirm your authorization, proceed to the next step. If not, contact Kyle Shannon immediately.
  7. Return to PhysioNet and visit the specific dataset pages linked below. Sign the "Data Use Agreement" for the MIMIC-III and MIMIC-IV datasets:
  8. Both datasets will be available for download at the bottom of their respective pages.
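
Once the files are downloaded, a first sanity check is to load two small tables and link them. The snippet below is a minimal sketch of that step, not an official loader. It assumes the MIMIC-III files sit in one local folder as gzip-compressed CSVs named ADMISSIONS.csv.gz and PATIENTS.csv.gz, and that both share a SUBJECT_ID column; verify the file names and column casing against your own download. The function names (read_table, join_on, join_admissions_to_patients) are hypothetical helpers written for this example.

```python
import csv
import gzip
from pathlib import Path


def read_table(path):
    """Read a plain or gzip-compressed CSV file into a list of dict rows."""
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, mode="rt", newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def join_on(left_rows, right_rows, key):
    """Inner-join two lists of dict rows on a shared key column.

    Assumes `key` is unique in right_rows (for example, one row per patient).
    Left rows with no match on the right are dropped.
    """
    lookup = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = lookup.get(row[key])
        if match is not None:
            # Columns from the left row win if both sides share a name.
            joined.append({**match, **row})
    return joined


def join_admissions_to_patients(data_dir):
    """Attach patient-level columns to every admission row.

    File names and the SUBJECT_ID key are assumptions about the local download.
    """
    data_dir = Path(data_dir)
    admissions = read_table(data_dir / "ADMISSIONS.csv.gz")
    patients = read_table(data_dir / "PATIENTS.csv.gz")
    return join_on(admissions, patients, "SUBJECT_ID")
```

This reads whole files into memory and keeps every value as a string, which is fine for small tables but not for the largest event tables; for those, a database or a chunked reader is a better fit. Keep the downloaded files in a secure location and never commit them to a repository, as required by the Data Use Agreement.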

Schedule

Click the "topic" links below for details regarding the readings, questions, and tasks for that week.

Domain Expertise & Supplementary Resources

Check out the domain expertise page I have set up. This page provides guidance and resources to help you better understand the non-data-science elements of this project, e.g. EHRs, illness scores, etc. This is only a starting point, and you may need to do further research, reading, and interviewing with domain experts to gain the knowledge and answers you need. I will still assist you in these pursuits, but it is important as a data scientist that you learn how to track down information and do research.

Also take a look at my resources drive folder which has relevant papers and books which will be useful when working on this project.


Phase II

We will explore the creation of a real-world reporting component for sepsis, along with an advanced analysis, machine learning model, or data pipeline/warehousing feature.


An Initial Note on Ethics

The MIMIC dataset, a critical care database, embodies the intersection of medical research and ethical considerations. It provides a wealth of de-identified patient data, enabling researchers to study critical care conditions while respecting patient privacy. However, the use of this dataset still requires careful ethical considerations.

The principle of autonomy is upheld in the MIMIC dataset as it consists of de-identified patient data, ensuring patient consent and privacy. However, researchers must remain vigilant to avoid any potential re-identification of individuals, which could infringe on patient privacy. The principle of beneficence is reflected in the use of the MIMIC dataset for research aimed at improving critical care outcomes. However, researchers must ensure that their studies are designed to maximize potential benefits, such as enhancing understanding of critical care conditions or developing new treatment strategies.

Reproducibility is a key aspect of research using the MIMIC dataset. It promotes transparency and trust in the findings derived from the dataset. However, reproducing research must be balanced with maintaining data security. Researchers must ensure that they use secure methods to access and analyze the MIMIC dataset, protecting the data from unauthorized access or misuse. Remember, while the MIMIC dataset provides a valuable resource for critical care research, it also underscores the importance of ethical considerations in medical research. Researchers must navigate these ethical considerations carefully to ensure they respect patient rights and interests while advancing critical care research.


Section & Group Participation

Participation in the weekly discussion section is mandatory. Each week during Phase I, you are responsible for doing the reading/task assigned in the schedule and submitting answers to the listed questions before the discussion section begins.

Weekly assigned questions help me observe how you are all doing on the project, focus your work for the week, and prepare you for discussion. If you have questions about your work, please ask them in section, on Discord, or in office hours (I will rarely comment on your submitted answers).


Advice from Previous Students

  • Hey all! Congrats on making it to DSC 180AB. This capstone will have a steep learning curve but it will be very rewarding to both you and potential stakeholders who may be watching your project. I have 3 main pieces of advice:
    1. Ask questions! - This project has a very steep learning curve with lots of terminology that most will probably not understand walking into it. Don’t let that discourage you, as everyone goes through that learning curve; instead, use it as an opportunity to learn how to adapt and onboard to new projects. Act like you’ve just joined a new company and Kyle is your manager. You should be asking many questions so you don’t walk into meetings completely blind. Again, one of the most important skills in any field is learning how to learn, and part of that involves asking questions!
    2. Focus on impact or why you are doing this. - In most data science classes we typically just assign you a project and you do it to complete it and earn a grade. That is not the case for this capstone. You’re working with real data that other professional researchers are currently working with. So when you’re choosing a project path during WI quarter, think of a project that will have an impact on the community (sometimes that means focusing on saving money/cutting costs, saving lives, or making processes easier/faster).
    3. Think creatively and have fun! - Remember this is your senior project and you only have 2 quarters. While it may seem like a long time, it will fly by. There are many researchers still researching sepsis (including at UCSD), so get creative with your project. What are things people have already done? What are things that would better other people’s research/projects? I personally know two different research groups at UCSD focusing on sepsis detection, and their additions to current research include looking at continuous vital signs rather than hourly ones, creating LLMs to read doctors’ notes, using imaging data, and more. So use Kyle’s connections to learn about sepsis, but focus on what their processes lack or could use more of. Remember that most of the people you will talk to will be doctors and nurses, so how can you bridge the gap between their skills and experience as doctors and your skills as a data scientist? Most of all, have fun and learn! Kyle is a great mentor, so do take his advice into consideration! Good luck, and I’m sure you all will enjoy it and be completely fine :)

    - Diego Zavalza

  • This project will likely be the most rewarding one you’ve worked on during your time at HDSI. However, it’s important to remember that what you get out of it is directly related to the effort you put in. Kyle will help you stay on track, but with a project this significant happening over the course of a quarter, it might feel like it’s over before you even start. Therefore, keeping your goals organized and being proactive is essential. Some advice to keep in mind:
    1. When approaching this project, working with the MIMIC-III dataset may feel intimidating, but that’s a testament to the dataset's potential. I recommend keeping a detailed record of your knowledge as you learn about the dataset. For instance, when you read about the admissions table, note down its general use, its connections to other tables, its inputs, and possible applications. This may seem simple and unnecessary for one table, but it will help you retain information much better when you start handling multiple tables simultaneously. Building a thorough foundation of knowledge about MIMIC-III is crucial for progressing in the project, as you will already have a foundational understanding of other parts from your classes. This is something Kyle definitely told us to do, but its significance should be reiterated.
    2. Additionally, consider what you personally want to gain from this experience before you start. This is important because projects like this have many moving parts, and you may get flustered or lose stamina if you don’t keep your personal goals in mind. Understanding your personal goals will also help you develop an original idea for the second half of your capstone journey, making the experience much more rewarding because you have a personal stake in it. It’s also a great way to gauge which objectives you’ve met, what unexpected lessons you’ve learned, and what you still want to achieve in your next experience.
    3. Overall, my final piece of advice is to view moments of feeling overwhelmed as a signal to keep pushing and striving for more. This unique experience is one you’ll never get again, and it will undoubtedly bring you valuable opportunities. Whether or not it helps you gain work experience, it will shape you for future success more than any other experience at HDSI.

    - Vibha Sastry

  • Hello Everyone, here is some advice for the next 2 quarters!
    1. Stay connected with your teammates and supervisor as much as you can.
    2. Clear and efficient communication is always important.
    3. Advocate for yourself while also being open-minded and respectful to your fellow students.
    4. We went through the hardships of DSC 20, 30, and 80. We can succeed in this class as we did at UCSD. Be proud of what you are doing!
    5. The more time and effort you put in, the more you will take away from the course.
    6. Do not hesitate to ask for help! Everyone is there to help you!
    7. Good luck, everyone!

    - Jiyeon Song


Office Hours

Impromptu - Send me a message on Slack!