Welcome!

PM 566: Introduction to Health Data Science

Instructor

Kelly Street

  • Assistant Professor of Biostatistics
  • Department of Population and Public Health Sciences
  • Office hours: Wednesday 2pm, SSB 202V

Brightspace + Website

https://github.com/USCbiostats/PM566
Official class website Syllabus, reading materials, slides, labs, and assignments

https://blackboard.usc.edu/
Announcements + Grading

USC Software Development Help Page

https://uscbiostats.github.io/software-dev-site/

collection of knowledge about

  • computing
  • standards
  • research practices

What is data science?

  • Data science is an exciting discipline that allows you to turn raw data into understanding, insight, and knowledge.

(Also see here, and here, and here, and here )

What is this course?

  • This course is a introduction to the world of data science with a focus on application in the health sciences.

  • The course will teach data science skills that are easily transferable, with examples done in R.

  • You can use any language/tool you prefer. But we can only guarantee help if you are using R and RStudio.

What ISN’T this course?

This is not a formal statistics class. You will not be expected to know or use:

  • parametric distributions
  • hypothesis tests
  • p values

What ISN’T this course?

Data does not exist in a vacuum. In order to gain new insights from data, you must start with a baseline understanding of the subject. “Domain knowledge” or “subject matter expertise” is critical, but it is not the purpose of this class.

This course will focus on applications in Public Health, but the skills you learn will be widely transferable.

What is a computer?

File Structure

File Structure

Before computers had graphics and mice, there were only text-based interfaces, called command lines, that let you interact with the directories and files on the computer.

The modern “Desktop” is actually just a directory on your computer!

  • MacOS/Linux: /Users/<username>/Desktop
  • Windows: C:\Users\<username>\Desktop

The route from the root directory to any specific file or directory is called the “path”.

File Structure

Whenever you run a program on your computer, you are running it in a specific location (directory). If you want to access another file on your computer, you’ll need to know the path to that file. Paths can be either relative or absolute.

How to get from my Desktop directory to my Documents directory via:

  • absolute path: /Users/kstreet/Desktop
  • relative path: ../Desktop/

File Structure

Special symbols:

  • . Current directory
  • .. Parent directory (one step up the hierarchy)
  • ~ Home directory

We won’t have to use the command line too much in this class, but understanding file paths will be very important!

CARC

At USC, the Center for Advanced Research Computing (CARC) provides students and faculty with high-performance computing capabilities. The interface for working on CARC is entirely text based, meaning it operates via a command line.

https://www.carc.usc.edu/

What is R?

R logo

R is a language and environment for statistical computing and graphics: https://r-project.org

Created by statisticians for statisticians.

Over 16,000 packages added to CRAN

What is RStudio?

RStudio logo

RStudio is an integrated development environment (IDE) for R: https://www.rstudio.com/products/rstudio/

R in the terminal

R + RStudio

Break time

Following break we will run Lab 1

Dataset

Dataset

First Lab

The lab exercises can be found at:

Website -> Schedule -> -> Lab Exercise

https://uscbiostats.github.io/PM566/labs/lab-01/01-lab.html

Related Github Issue

https://github.com/USCbiostats/PM566/issues/54