References
R
- R. A platform for statistical computing.
- RStudio. An IDE for R. The most straightforward way to get into using R and Quarto.
- R Graphics Cookbook. Complete guide to plotting data with ggplot.
- R Style Guide. Write readable code.
- RStudio Cheatsheets. Other quick guides, including information about using RStudio’s IDE and some of the main tools in R.
Quarto
- Quarto. An integrated, open-source publishing system. Generalizes and expands upon much of the functionality of RMarkdown.
- Quarto Guide. Comprehensive guide to creating a wide range of documents and presentations with Quarto.
Git / GitHub
- Git. Version control system. Installs with Apple’s Developer Tools, or get the latest version via Homebrew.
- GitHub. Host public Git repositories for free. Pay to host private ones. Also a source for publicly available code (e.g. R packages and utilities) written by other people.
- GitHub Docs. Tutorials for performing various actions using Git and GitHub, from beginner to advanced.
Markdown / R Markdown
- Markdown tutorial: An interactive tutorial to practice using Markdown.
- Markdown cheatsheet: Useful one-page reminder of Markdown syntax.
- R Markdown Cheatsheet. An overview of Markdown and RMarkdown conventions.
- R Markdown documentation from the makers of RStudio. Lots of good examples.
Data Science
- Veridical Data Science. Great book on the practice of data science by Bin Yu and Rebecca Barter.
- R Programming for Data Science. Book on using R for data science, by Roger Peng.
- Jenny Bryan’s Stat 545. Notes and tutorials for a Data Analysis course taught by Jennifer Bryan at the University of British Columbia. Lots of useful material.
- The Plain Person’s Guide to Plain Text Social Science: Why you should write data-based reports using plain-text tools.
- Karl Broman’s Tutorials and Guides. Accurate and concise guides to many of the tools and topics described here, including getting started with reproducible research, using git and GitHub, and working with knitr.
- Makefiles for OCR and converting Shapefiles. Some further examples of Makefiles in the data-analysis pipeline, by Lincoln Mullen.
Tools
- Apple’s Developer Tools. Unix toolchain. Install directly with xcode-select --install, or just try to use a tool such as git from the terminal and have macOS prompt you to install the tools.
- Homebrew package manager. A convenient way to install several of the tools here, including Emacs and Pandoc.
- R. A platform for statistical computing.
- Python and SciPy. Python is a general-purpose programming language increasingly used in data manipulation and analysis.
- RStudio. An IDE for R. The most straightforward way to get into using R and RMarkdown.
- TeX and LaTeX. A typesetting and document preparation system. You can write files in .tex format directly, but it is more useful to just have it available in the background for other tools to use. The MacTeX Distribution is the one to install for macOS.
- Pandoc. Converts plain-text documents to and from a wide variety of formats. Can be installed with Homebrew. Be sure to also install pandoc-citeproc for processing citations and bibliographies, and pandoc-crossref for producing cross-references and labels.
- Git. Version control system. Installs with Apple’s Developer Tools, or get the latest version via Homebrew.
- GitHub. Host public Git repositories for free. Pay to host private ones. Also a source for publicly available code (e.g. R packages and utilities) written by other people.
- GNU Make. You tell make what the steps are to create the pieces of a document or program. As you edit and change the various pieces, it automatically figures out which pieces need to be updated and recompiled, and issues the commands to do that. See Karl Broman’s Minimal Make for a short introduction. Make will be installed automatically with Apple’s developer tools.
- lintr and flycheck. Tools that nudge you to write neater code.
- Zotero. A citation manager that incorporates PDF storage, annotation, and other features. Zotero is free to use and can export to BibTeX/BibLaTeX files.
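
The GNU Make entry above says that make works out which pieces need to be updated. The core of that decision can be sketched in a few lines of Python: a target is rebuilt when it does not exist yet or when any file it depends on has changed more recently. This is only an illustration of the idea, not how make itself is implemented, and the function name needs_rebuild and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def needs_rebuild(target, dependencies):
    """Return True if `target` is missing or older than any dependency.

    Simplified sketch of make's timestamp rule (an assumption for
    illustration): it ignores phony targets, pattern rules, and chains
    of intermediate files.
    """
    target = Path(target)
    if not target.exists():
        # Nothing has been built yet, so the recipe must run.
        return True
    target_mtime = target.stat().st_mtime
    # Rebuild if any dependency was modified after the target was made.
    return any(Path(dep).stat().st_mtime > target_mtime for dep in dependencies)
```

For example, needs_rebuild("report.html", ["report.qmd", "data.csv"]) would return True right after report.qmd is edited, and False once the report has been rendered again. A Makefile expresses the same relationship declaratively, and make runs the rendering command only when this kind of check says the output is out of date.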
Paid Applications and Services
- Backblaze. Secure off-site backup.
- Marked 2. Live HTML previewing of Markdown documents. Mac OS X only.
- Sublime Text. Python-based text editor.
- Mendeley and Papers are additional citation managers that incorporate PDF storage, annotation, and other features. Mendeley has a premium tier. Papers is a paid application after a trial period. I haven’t used either of these, so I can’t confirm whether they export to BibTeX/BibLaTeX files. Papers can supposedly output citation keys in pandoc’s format, among several others.
Data
Many of these websites offer publicly available datasets that can be used for research or class projects.
Health and Biological data
- CDC National Center for Health Statistics
- NIH Cancer Surveillance
- World Health Organization WHO data
- UniProt data
- The Gene Ontology Project
- Gene Expression Omnibus Data
- US Centers for Disease Control and Prevention Data
- California Health and Human Services Open Data Portal
- Covid Data CovidTracker
- USC Sustainability Data
- Bureau of Transportation Statistics
Government data
- US Open Data Initiative DATA.GOV
- Census Data Explorer and National Historical Geographic Information System (NHGIS)
- Bureau of Economic Analysis
- Bureau of Labor Statistics
- Housing data Zillow
- Bureau of Justice Statistics
- National Center for Education Statistics: The Nation’s Report Card
- Los Angeles city data
- Los Angeles crime data
Other data
- World Bank open data
- Inter-university Consortium for Political and Social Research (ICPSR)
- FiveThirtyEight open data
- Kaggle datasets
- Literally all of Wikipedia
Social Networks