Syllabus

PM 566: Introduction to Health Data Science

Term: Fall 2023

Time: Friday 9am - 12:55pm

Location: SSB 114

Units: 4

Course Overview

This course serves as an introduction to data science with a focus on the acquisition and analysis of real-life data. Students will learn the toolsets needed to 1) create workable and reproducible datasets by accessing, scraping, sampling and cleaning data; 2) conduct exploratory data analysis and data visualization; 3) apply statistical tools to learn from data; and 4) build functions and basic apps. Coding languages R and Python will be used.

Learning Objectives

Through this course, students will become familiar with the techniques used in data science, applied to health-related datasets. Students will learn:

  • Programming in R and the associated tools Markdown and Git
  • Data visualization – summarizing data through interpretable summaries
  • Data collection – data scraping, wrangling, cleaning, and sampling
  • Exploratory data analysis – generating hypotheses and building intuition
  • Basic statistical algorithms
  • Building software packages and apps

Prerequisite(s): None

Recommended Preparation: Undergraduate course in statistics and programming

Course Notes

Lecture notes presented in class will be posted on GitHub.

Technological Proficiency and Hardware/Software Required

R (available from http://cran.r-project.org) and development tools including Git, GitHub (https://github.com/), and Markdown will be used throughout the semester.

Readings and Supplementary Materials

  1. R Programming for Data Science, 2019. Roger Peng. https://bookdown.org/rdpeng/rprogdatascience/

Supplementary References

  1. R for Data Science, 2017. Garrett Grolemund and Hadley Wickham. http://r4ds.had.co.nz/
  2. Exploratory Data Analysis with R, 2020. Roger Peng. https://bookdown.org/rdpeng/exdata/
  3. Mastering Software Development in R, 2017. Roger Peng, Sean Kross, Brooke Anderson. https://bookdown.org/rdpeng/RProgDA/
  4. R Packages, 2023. Hadley Wickham and Jennifer Bryan. https://r-pkgs.org/
  5. Modern Data Science with R, 2023. Benjamin S. Baumer, Daniel T. Kaplan, and Nicholas J. Horton. https://mdsr-book.github.io/mdsr3e/

Description and Assessment of Assignments

Assignments: There will be 5 assignments given throughout the semester, approximately 1 every 2 weeks. Students may discuss the problems with one another; however, individual solutions must be submitted, and copying will not be tolerated. All assignments must be completed in Quarto or R Markdown and submitted through the course's GitHub Classroom portal. Late assignments will be penalized by 20% for each day past the due date.
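As a rough illustration of the late penalty arithmetic, here is a minimal Python sketch. The syllabus does not say whether the 20% is taken from the maximum score or from the earned score, nor how partial days are counted. This sketch assumes a deduction of 20% of the maximum score per day, counts a partial day as a full day, and floors the result at zero. The function name and parameters are hypothetical; confirm the actual rule with the instructor.

```python
import math


def late_adjusted_score(raw_score, days_late, max_score=100.0, penalty_per_day=0.20):
    """Apply a per-day late penalty under this sketch's assumptions.

    Assumptions (not stated in the syllabus): the penalty is a fraction of
    max_score, a partial day counts as a full day, and scores stop at zero.
    """
    if days_late <= 0:
        return raw_score
    whole_days = math.ceil(days_late)  # partial days round up (assumption)
    deduction = penalty_per_day * max_score * whole_days
    return max(0.0, raw_score - deduction)


print(late_adjusted_score(90, 0))    # submitted on time, no deduction
print(late_adjusted_score(90, 1.5))  # treated as 2 days late
print(late_adjusted_score(90, 5))    # deduction exceeds the score, floored at zero
```

Under these assumptions, an assignment submitted five or more days late receives no credit.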

Final Project: The final project will be to write a report for an analysis applied to a real-world dataset and to create a website that includes interactive visualizations to display data/results. The source code, website files, and PDF report will be uploaded to GitHub.

Labs: Lab attendance is mandatory. Participation in the lab is required and counts toward the overall lab grade.

Grading Breakdown

Assignment % of Grade
Labs 20%
Homework (5) 30%
Midterm Exam 20%
Final Project 30%
TOTAL 100%
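
To show how the weights above combine into a final course percentage, here is a minimal Python sketch. The weights come from the table; the component names, the function name, and the example scores are hypothetical and used only for illustration. It assumes each component is scored on a 0 to 100 scale.

```python
# Weights taken from the Grading Breakdown table (fractions of the final grade).
WEIGHTS = {
    "labs": 0.20,
    "homework": 0.30,
    "midterm": 0.20,
    "final_project": 0.30,
}


def course_grade(scores):
    """Return the weighted course percentage.

    `scores` maps each component name to a percentage in [0, 100].
    """
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical component scores, for illustration only.
example = {"labs": 95, "homework": 88, "midterm": 80, "final_project": 90}
print(round(course_grade(example), 1))  # 88.4
```

Here the result is 0.20 × 95 + 0.30 × 88 + 0.20 × 80 + 0.30 × 90 = 88.4.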

Assignment Submission Policy

Assignments shall be submitted through the course's GitHub Classroom portal. Late homework assignments will not be accepted without penalty, except when verifiable extenuating circumstances can be demonstrated.

Schedule

Week 1 (8/25) Introduction to Data Science tools: R, Python, Markdown, Git, command line tools
Week 2 (9/1) Version Control & Reproducible Research
Week 3 (9/8) Exploratory data analysis
Week 4 (9/15) Data visualization
Week 5 (9/22) Data cleaning and wrangling
Week 6 (9/29) Text Mining
Week 7 (10/6) Scraping, APIs, and Regular Expressions
Week 8 (10/13) Fall Recess (No class). Midterm Exam
Week 9 (10/20) (Midterm Exam due) High performance computing, cloud computing
Week 10 (10/27) Managing big data, SQL and non-SQL languages, Google BigQuery
Week 11 (11/3) Interactive visualization and effective data communication I
Week 12 (11/10) Veterans Day (No class)
Week 13 (11/17) Interactive visualization and effective data communication II
Week 14 (11/24) Thanksgiving Holiday
Week 15 (12/1) Final project workshop: review progress
Week 16 (12/8) Final Project

As the weeks go by, consult the Schedule Page for more information on weekly topics, problem sets, readings, and other materials. The schedule is likely to change as we go. Links to readings, assignments, and other materials from class will be posted on that page.

Academic Integrity

The University of Southern California is foremost a learning community committed to fostering successful scholars and researchers dedicated to the pursuit of knowledge and the transmission of ideas. Academic misconduct is in contrast to the university’s mission to educate students through a broad array of first-rank academic, professional, and extracurricular programs and includes any act of dishonesty in the submission of academic work (either in draft or final form).

This course will follow the expectations for academic integrity as stated in the USC Student Handbook. All students are expected to submit assignments that are original work and prepared specifically for the course/section in this academic term. You may not submit work written by others or “recycle” work prepared for other courses without obtaining written permission from the instructor(s). Students suspected of engaging in academic misconduct will be reported to the Office of Academic Integrity.

Other violations of academic integrity include, but are not limited to, cheating, plagiarism, fabrication (e.g., falsifying data), knowingly assisting others in acts of academic dishonesty, and any act that gains or is intended to gain an unfair academic advantage.

The impact of academic dishonesty is far-reaching and is considered a serious offense against the university and could result in outcomes such as failure on the assignment, failure in the course, suspension, or even expulsion from the university.

For more information about academic integrity see the student handbook or the Office of Academic Integrity’s website, and university policies on Research and Scholarship Misconduct.

Statement on the use of Artificial Intelligence

Generative artificial intelligence (AI) may be used under the direction and rules specified by the course instructor in specific circumstances as outlined in the syllabus. The student is responsible for the quality and content of all written assignments. Unless otherwise indicated by the course instructor, generative AI may be used to create an initial literature review or document outline, to organize material toward a first draft of a class paper, or to help with proofreading and grammatical accuracy; however, the final content of the written document and the critical thinking behind the ideas presented in it must represent the student’s individual work and ideas learned through course content and/or research conducted from sources outside of the generative AI system. The student must include an annotation on all materials submitted that explicitly documents how AI was used to generate the document, and must properly reference both the sources and the AI tools used, such as ChatGPT (OpenAI, 2023). The student must review the information in the document, edit it for accuracy, completeness, and proper grammar, and demonstrate that the wording accurately reflects the student’s understanding and purpose in writing the text. Students should be aware that text generated solely from AI generators may include factual errors and bias, may contain incomplete or inaccurate reference information, and may further the appropriation of knowledge produced by historically marginalized scholars without proper credit. If you have any questions on whether a specific AI tool is allowed for any aspect of your work in this class, please ask your instructor for guidance. Failure to ensure agreement with your instructor on use of AI, prior to doing so, may result in a zero score. (NOTE: instructors have sophisticated tools to determine AI plagiarism.)

Students and Disability Accommodations:

USC welcomes students with disabilities into all of the University’s educational programs. The Office of Student Accessibility Services (OSAS) is responsible for the determination of appropriate accommodations for students who encounter disability-related barriers. Once a student has completed the OSAS process (registration, initial appointment, and submitted documentation) and accommodations are determined to be reasonable and appropriate, a Letter of Accommodation (LOA) will be available to generate for each course. The LOA must be given to each course instructor by the student and followed up with a discussion. This should be done as early in the semester as possible as accommodations are not retroactive. More information can be found at http://osas.usc.edu. You may contact OSAS at (213) 740-0776 or via email.

Support Systems:

Counseling and Mental Health - (213) 740-9355 – 24/7 on call
https://studenthealth.usc.edu/counseling/
Free and confidential mental health treatment for students, including short-term psychotherapy, group counseling, stress fitness workshops, and crisis intervention.

National Suicide Prevention Lifeline - dial 988 – 24/7 on call
http://www.suicidepreventionlifeline.org
Provides free and confidential emotional support to people in suicidal crisis or emotional distress 24 hours a day, 7 days a week.

Relationship and Sexual Violence Prevention Services (RSVP) - (213) 740-9355(WELL), press “0” after hours – 24/7 on call
https://studenthealth.usc.edu/sexual-assault
Free and confidential therapy services, workshops, and training for situations related to gender-based harm.

Office for Equity, Equal Opportunity, and Title IX (EEO-TIX) - (213) 740-5086
http://eeotix.usc.edu
Information about how to get help or help someone affected by harassment or discrimination, rights of protected classes, reporting options, and additional resources for students, faculty, staff, visitors, and applicants.

Reporting Incidents of Bias or Harassment - (213) 740-5086 or (213) 821-8298
http://usc-advocate.symplicity.com/care_report
Avenue to report incidents of bias, hate crimes, and microaggressions to the Office for Equity, Equal Opportunity, and Title IX for appropriate investigation, supportive measures, and response.

The Office of Student Accessibility Services (OSAS) - (213) 740-0776
http://osas.usc.edu
OSAS ensures equal access for students with disabilities through providing academic accommodations and auxiliary aids in accordance with federal laws and university policy.

USC Campus Support and Intervention - (213) 821-4710
http://campussupport.usc.edu
Assists students and families in resolving complex personal, financial, and academic issues adversely affecting their success as a student.

Diversity, Equity and Inclusion - (213) 740-2101
http://diversity.usc.edu
Information on events, programs and training, the Provost’s Diversity and Inclusion Council, Diversity Liaisons for each academic school, chronology, participation, and various resources for students.

USC Emergency - UPC: (213) 740-4321, HSC: (323) 442-1000 – 24/7 on call
http://dps.usc.edu, http://emergency.usc.edu
Emergency assistance and avenue to report a crime. Latest updates regarding safety, including ways in which instruction will be continued if an officially declared emergency makes travel to campus infeasible.

USC Department of Public Safety - UPC: (213) 740-6000, HSC: (323) 442-1200 – 24/7 on call
http://dps.usc.edu
Non-emergency assistance or information.

Office of the Ombuds - (213) 821-9556 (UPC) / (323) 442-0382 (HSC)
http://ombuds.usc.edu
A safe and confidential place to share your USC-related issues with a University Ombuds who will work with you to explore options or paths to manage your concern.

Occupational Therapy Faculty Practice - (323) 442-3340
http://chan.usc.edu/otfp
Confidential Lifestyle Redesign services for USC students to support health promoting habits and routines that enhance quality of life and academic performance.