PM 566: Introduction to Health Data Science
Kelly Street
https://github.com/USCbiostats/PM566
Official class website: syllabus, reading materials, slides, labs, and assignments
https://blackboard.usc.edu/
Announcements + Grading
https://uscbiostats.github.io/software-dev-site/
collection of knowledge about
This course is an introduction to the world of data science, with a focus on applications in the health sciences.
The course will teach data science skills that are easily transferable, with examples done in R.
You can use any language or tool you prefer, but we can only guarantee help if you are using R and RStudio.
This is not a formal statistics class. You will not be expected to know or use:
Data does not exist in a vacuum. In order to gain new insights from data, you must start with a baseline understanding of the subject. “Domain knowledge” or “subject matter expertise” is critical, but it is not the purpose of this class.
This course will focus on applications in Public Health, but the skills you learn will be widely transferable.
Before computers had graphics and mice, there were only text-based interfaces, called command lines, that let you interact with the directories and files on the computer.
The modern “Desktop” is actually just a directory on your computer!
macOS: /Users/<username>/Desktop
Windows: C:\Users\<username>\Desktop
The route from the root directory to any specific file or directory is called the “path”.
Whenever you run a program on your computer, you are running it in a specific location (directory). If you want to access another file on your computer, you’ll need to know the path to that file. Paths can be either relative or absolute.
How to get from my Desktop directory to my Documents directory via:
Absolute path: /Users/kstreet/Documents
Relative path (starting from Desktop): ../Documents/
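A minimal R sketch of the same idea. It assumes a throwaway directory tree built under R's temporary directory; the `kstreet` folder is hypothetical and only mimics the layout above. Starting from inside Desktop, the absolute path and the relative path to Documents should resolve to the same directory.

```r
# Build a small, hypothetical home directory inside R's temp directory
home <- file.path(tempdir(), "kstreet")
dir.create(file.path(home, "Desktop"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(home, "Documents"), recursive = TRUE, showWarnings = FALSE)

# Pretend we are working "in" the Desktop directory; setwd() returns the old location
old_wd <- setwd(file.path(home, "Desktop"))

# Absolute path: spelled out from the root, works no matter where we are
abs_path <- file.path(home, "Documents")

# Relative path: interpreted starting from the current working directory
rel_path <- file.path("..", "Documents")   # "../Documents"

# normalizePath() resolves both to full paths; they should point to the same place
same <- normalizePath(abs_path) == normalizePath(rel_path)
print(same)   # [1] TRUE

# Go back to wherever we started
setwd(old_wd)
```

The relative path only works because of where the code is running; run the same `rel_path` from a different working directory and it points somewhere else, while `abs_path` does not change.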
Special symbols:
.   Current directory
..  Parent directory (one step up the hierarchy)
~   Home directory
We won’t have to use the command line too much in this class, but understanding file paths will be very important!
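The same special symbols can be used from inside R. A short sketch using base R's `normalizePath()` and `path.expand()`; the printed paths depend on where R is running, so no output is shown. On Windows, R often expands `~` to the user's Documents folder rather than the user folder itself.

```r
# "." means the current working directory; normalizePath() turns it into a full path
here <- normalizePath(".")

# ".." means the parent directory, one step up the hierarchy
parent <- normalizePath("..")

# "~" means the home directory; path.expand() replaces it with the full path
home <- path.expand("~")

print(here)    # the same directory that getwd() reports
print(parent)
print(home)

# Going up one level with ".." matches taking the directory part of the current path
print(identical(parent, normalizePath(dirname(here))))
```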
At USC, the Center for Advanced Research Computing (CARC) provides students and faculty with high-performance computing capabilities. The interface for working on CARC is entirely text based, meaning it operates via a command line.
R is a language and environment for statistical computing and graphics: https://r-project.org
Created by statisticians for statisticians.
Over 16,000 packages available on CRAN
RStudio is an integrated development environment (IDE) for R: https://www.rstudio.com/products/rstudio/
After the break, we will run Lab 1.
The lab exercises can be found at:
Website -> Schedule -> -> Lab Exercise
https://uscbiostats.github.io/PM566/labs/lab-01/01-lab.html
Related Github Issue
https://github.com/USCbiostats/PM566/issues/54