Data Science Capstone Domain - DSC 180AB
Fall & Winter 2024 - Prof. Kyle Shannon
Students will explore inpatient ICU care by examining the detection and management of severe infections using the MIMIC dataset, a comprehensive, publicly available database of de-identified ICU patient data. This project will familiarize participants with the nuances of healthcare data and the critical role electronic health records (EHRs) play in clinical decision-making. Through this experience, students will gain insight into the broader context of clinical care and public health, learning to leverage EHRs and clinical data science to develop potential products, reports, and/or health policies. They will also better understand the US healthcare system, ICU operations, and the decision-making process for complex infectious cases such as sepsis. By studying the work of multidisciplinary teams, students will gain a deeper understanding of intricate ICU cases and of patients' journeys through this challenging healthcare landscape. Finally, they will appreciate the complexities of doing data science in a demanding environment.
The aim of reproducing a paper's results is to verify the original authors' findings and methodology. Reproduction is vital in science because it shows whether results are robust and reliable rather than products of chance or error. A successful reproduction would strengthen the evidence for the critical care patient subgroups the paper identifies and their associated risks, which could ultimately affect patient care. It also builds a deeper understanding of the applied methods, such as latent class analysis (LCA) and k-means clustering. Ultimately, this work aims to enrich critical care knowledge and potentially influence future research and clinical practice.
The paper we will be working with (linked here) is:
Gaining access to MIMIC data through PhysioNet can be a somewhat involved process. Start it a few weeks before your classes begin, since verification can take up to a month to complete. The process includes creating an account, submitting online forms, and completing a two-hour online training course. Please follow the steps below carefully to ensure a smooth experience.
Click the "topic" links below for details on the readings, questions, and tasks for each week.
Week | Topic |
---|---|
Summer | Summer pre-game: get access to MIMIC data and do the domain reading |
0-1 | Introduction to the topic, domain, and paper |
2-3 | Dive into the MIMIC-III dataset and perform EDA |
4-5 | Begin data preprocessing and learn about the Elixhauser Comorbidity Index |
6-7 | Start implementing k-means and LCA |
8-9 | Perform network analysis and create visualizations |
10 | Project wrap-up and debrief |
Winter Break | Brainstorm and prepare for Phase II |
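Before weeks 6-7, it may help to see the core loop of k-means in isolation. The sketch below is a minimal, illustrative implementation of plain Lloyd's algorithm using NumPy, run on a purely synthetic feature matrix (rows stand in for hypothetical patients, columns for standardized numeric features). It is an assumption-laden teaching example, not the paper's exact pipeline, and it does not cover LCA.

```python
import numpy as np


def standardize(X: np.ndarray) -> np.ndarray:
    """Z-score each column so features on different scales contribute comparably."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def kmeans(X: np.ndarray, k: int, n_iter: int = 100, seed: int = 0):
    """Plain Lloyd's k-means. Returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct rows at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Assignment step: squared Euclidean distance from every point to every centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points,
        # keeping the old centroid if a cluster ends up empty.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        # Stop once the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical synthetic data (NOT MIMIC): two well-separated groups
    # of 50 "patients" with 3 numeric features each.
    rng = np.random.default_rng(42)
    group_a = rng.normal(loc=0.0, scale=1.0, size=(50, 3))
    group_b = rng.normal(loc=6.0, scale=1.0, size=(50, 3))
    X = standardize(np.vstack([group_a, group_b]))
    labels, centroids = kmeans(X, k=2)
    print("cluster sizes:", np.bincount(labels))
```

In practice you would more likely use a library implementation such as scikit-learn's `KMeans`, which also runs several random initializations and keeps the best one; the hand-written version is only meant to make the assign/update loop visible.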
Check out the domain expertise page I have set up. It provides guidance and resources to help you better understand the non-data-science elements of this project, e.g., EHRs, illness scores, etc. This is only a starting point, and you may need to do further research and reading, and interview domain experts, to gain the knowledge and answers you need. I will still assist you in these pursuits, but as a data scientist it is important that you learn how to track down information and do research.
Also take a look at my resources Drive folder, which contains relevant papers and books that will be useful when working on this project.
We will explore building a real-world reporting component for sepsis, along with an advanced analysis, a machine learning model, or a data pipeline/warehousing feature.
The MIMIC dataset, a critical care database, sits at the intersection of medical research and ethics. It provides a wealth of de-identified patient data, enabling researchers to study critical care conditions while respecting patient privacy. Even so, using this dataset still requires careful ethical consideration.
The principle of autonomy is addressed in the MIMIC dataset through de-identification, which protects patient privacy. However, researchers must remain vigilant to avoid any re-identification of individuals, which would infringe on patient privacy.
The principle of beneficence is reflected in the use of the MIMIC dataset for research aimed at improving critical care outcomes. Researchers must therefore design their studies to maximize potential benefits, such as a better understanding of critical care conditions or the development of new treatment strategies.
Reproducibility is a key aspect of research using the MIMIC dataset: it promotes transparency and trust in the findings derived from it. However, reproducibility must be balanced against data security. Researchers must use secure methods to access and analyze the MIMIC dataset, protecting the data from unauthorized access or misuse.
In short, while the MIMIC dataset is a valuable resource for critical care research, it also underscores the importance of ethics in medical research. Researchers must navigate these considerations carefully to respect patients' rights and interests while advancing critical care research.
Participation in the weekly discussion section is mandatory. Each week during Phase I, you are responsible for completing the reading/task assigned in the schedule and submitting answers to the listed questions before the discussion section begins.
The weekly questions help me see how you are all progressing on the project, focus your work for the week, and prepare you for discussion. If you have questions about your work, please ask them in section, on Discord, or in office hours (I will rarely comment on your submitted answers).
Impromptu - Send me a message on Slack!