Workshop: Introduction to R
(for HPC users)

George G. Vega Yon
University of Southern California
Department of Preventive Medicine


USC Integrative Methods of Analysis for Genomic Epidemiology (IMAGE)
Department of Preventive Medicine
July 7th, 2018

Agenda

  1. Part 1: Talk about the basics: What is R? How to get help?

  2. Part 2: R language fundamentals.

  3. Part 3: An extended example using HPCC

Part 1: Talk about the basics: What is R? How to get help?

Before we start

We will be using:

The R programming language

A little bit of history

Picture by the New York Times https://nyti.ms/2GC3ruP
From left to right: John Chambers, Robert Gentleman, and Ross Ihaka.

(Source: Wikipedia)

The first lesson: Getting help

in R

In R, if you want to:

On the web

Books that I recommend

The first lesson: Getting help (How to read it?)

The first lesson: Getting help (a mental model)

My personal approach to finding R-based solutions to my problems (in science… of course)

Questions

  1. Using the stats package, how can you estimate a generalized linear model in R?

  2. What is the command to transpose a matrix in R? What about the command for inverting a matrix?

  3. Looking at CRAN task Views, what R packages are available for convex optimization? What about stochastic optimization?

  4. Create a list of R packages that provide wrappers for working with Slurm.

  5. What does the function for fitting nonlinear least squares in the stats package return?

Part 2: R language fundamentals.

Creating objects
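
A minimal sketch of how objects are created in R (illustrative; the variable names here are made up):

```r
# Objects are created with the assignment operator <-
x <- 5                  # a numeric scalar (really a length-1 vector)
y <- c(1, 2, 3)         # a numeric vector built with c()
z <- "hello"            # a character string

# Inspect and clean up the workspace
ls()                    # list objects in the current environment
rm(z)                   # remove an object
```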

Data types
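
As a quick illustration of R's atomic data types (a sketch, not the original slide content):

```r
# R's basic atomic types, as reported by mode() and typeof()
mode(TRUE)     # "logical"
mode(1L)       # "numeric"  -- integers have mode numeric...
typeof(1L)     # "integer"  -- ...but integer storage type
typeof(1.5)    # "double"
mode("a")      # "character"
mode(1 + 2i)   # "complex"
```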

Attributes and Structure

Missing values

R has several types of missing values:

str(c(NA, 1L))        # Integers can have NAs
##  int [1:2] NA 1
str(c(NaN, 1L, Inf))  # But not NaN or Inf (automatic coercion)
##  num [1:3] NaN 1 Inf
str(c(-Inf, 1, NULL, +Inf)) # And Nulls are... of length 0!
##  num [1:3] -Inf 1 Inf

Questions

  1. What is the mode of the following vector myvector <- c(NA, NaN, Inf)? (try not to use the mode() function in R)

  2. The c() function can be used with other vectors, for example

    my_integer_vector <- c(1L, 2L, 3L)    
    my_string_vector  <- c("hello", "world")

    What is the mode of the vector c(my_integer_vector, my_string_vector)?

  3. What does each of the functions is.na(), is.null(), is.infinite(), is.finite(), and is.nan() return on the vector myvector?

  4. What are the attributes of the following object mymat <- matrix(runif(12), ncol=4)?

Linear Algebra
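
A brief sketch of the basic linear-algebra operators (relevant to the questions that follow; the matrix here is made up):

```r
A <- matrix(c(2, 0, 1, 3), ncol = 2)

t(A)            # transpose
solve(A)        # matrix inverse
A %*% solve(A)  # matrix product; should be ~ the identity
det(A)          # determinant
crossprod(A)    # t(A) %*% A, computed more efficiently
```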

Questions

Other fundamental types

Other fundamental types of objects in R are:
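
The slide's own list is not reproduced here; as an illustrative sketch, other fundamental object types commonly covered at this point include lists, factors, and data frames:

```r
# Lists: heterogeneous containers
l <- list(numbers = 1:3, label = "a")

# Factors: categorical data with a fixed set of levels
f <- factor(c("low", "high", "low"))
levels(f)              # levels are sorted: "high" "low"

# Data frames: tabular data (a list of equal-length vectors)
d <- data.frame(id = 1:2, name = c("a", "b"))
str(d)
```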

Statistical Functions

set.seed(12)
op <- par(mfrow = c(2,2))
hist(rnorm(1e5))
curve(qnorm)
curve(pnorm, xlim=c(-3, 3))
curve(dnorm, xlim=c(-3, 3))

par(op)
set.seed(12)
op <- par(mfrow = c(2,2))
hist(rexp(1e5))
curve(qexp)
curve(pexp, xlim=c(0, 6))
curve(dexp, xlim=c(0, 6))

par(op)

Questions

  1. Draw 1e5 samples from a \(\chi^2\) distribution with 2 degrees of freedom (hint: check ?Distributions).

  2. Draw 1e5 samples from a \(\chi^2\) distribution with 2 degrees of freedom using rnorm (hint: recall that if \(X\sim N(0,1)\), then \(X^2\sim\chi^2_1\), and if \(X, Y\sim N(0,1)\) independently, then \(X^2 + Y^2\sim\chi^2_2\)).

Control-flow statements
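
A minimal sketch of R's control-flow statements (the values used are illustrative):

```r
# if/else
x <- 5
if (x > 0) message("positive") else message("non-positive")

# for loop: sum the numbers 1 through 10
total <- 0
for (i in 1:10) total <- total + i
total   # 55

# while loop: double n until it reaches at least 100
n <- 1
while (n < 100) n <- n * 2
n       # 128
```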

Functions
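
A short sketch of how functions are defined (the function names here are made up for illustration):

```r
# Functions are created with function(); the value of the last
# evaluated expression is returned (or use return() explicitly)
square <- function(x) x^2
square(4)   # 16

# Default arguments and the ... (dots) pass-through
mymean <- function(x, na.rm = TRUE, ...) mean(x, na.rm = na.rm, ...)
mymean(c(1, 2, NA))   # 1.5
```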

Part 3: An extended example using HPCC

Agenda

  1. High-Performance Computing: An overview

  2. Parallel computing in R

  3. Extended examples

First: How to use R on HPC

There are two things you need to do to use R on HPC:

  1. Source the corresponding R version: For example, if you want to work with version 3.4, you could just type

    source /usr/usc/R/3.4.0/setup.sh

    You can also include that line in your ~/.bash_profile file so that it is done automatically when you log in.

  2. Specify the R library path: In order to use R packages that were installed in your user library while running Slurm jobs (for example), you have to specify the library path. There are a few ways of doing it:

    1. Use the .libPaths() command at the beginning of your R script
    2. Use the lib.loc option when calling library()
    3. Use the .Renviron file and set the R_LIBS value (see ?Renviron)

We have examples at the end of the presentation.
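
For instance, the three options above can be sketched as follows (the library path shown is just an example; adjust it to your own setup):

```r
# Option 1: point R at your user library at the top of the script
.libPaths("~/R/x86_64-pc-linux-gnu-library/3.4/")

# Option 2: give the path directly when loading a package
library(ABCoptim, lib.loc = "~/R/x86_64-pc-linux-gnu-library/3.4/")

# Option 3: set R_LIBS in your ~/.Renviron file instead, e.g.
# R_LIBS=~/R/x86_64-pc-linux-gnu-library/3.4/
```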

High-Performance Computing: An overview

Loosely, from R’s perspective, we can think of HPC in terms of two, maybe three things:

  1. Big data: How to work with data that doesn’t fit your computer

  2. Parallel computing: How to take advantage of multiple core systems

  3. Compiled code: Write your own low-level code (if R doesn’t have it yet…)

(Checkout CRAN Task View on HPC)

Big Data

Parallel computing

Flynn’s Classical Taxonomy (Introduction to Parallel Computing, Blaise Barney, Lawrence Livermore National Laboratory, https://computing.llnl.gov/tutorials/parallel_comp/#Whatis)

GPU vs CPU

NVIDIA Blog (http://www.nvidia.com/object/what-is-gpu-computing.html)

When is it a good idea?

Ask yourself these questions before jumping into HPC!

Ask yourself these questions before jumping into HPC!

Parallel computing in R

While there are several alternatives (just take a look at the High-Performance Computing Task View), we’ll focus on the following R-packages for explicit parallelism:

Implicit parallelism, on the other hand, refers to out-of-the-box tools that allow the programmer not to worry about parallelization, such as gpuR for matrix manipulation using the GPU.

Parallel workflow

  1. Create a cluster:

    1. PSOCK Cluster: makePSOCKcluster: Creates brand-new R sessions (so nothing is inherited from the master), even on other computers!

    2. Fork Cluster: makeForkCluster: Using OS Forking, copies the current R session locally (so everything is inherited from the master up to that point). Not available on Windows.

    3. Other: makeCluster, which is passed on to the snow package

  2. Copy/prepare each R session:

    1. Copy objects with clusterExport

    2. Pass expressions with clusterEvalQ

    3. Set a seed

  3. Do your call:

    1. mclapply, mcmapply if you are using Fork

    2. parApply, parLapply, etc. if you are using PSOCK

  4. Stop the cluster with stopCluster

parallel example 1: Parallel RNG

# 1. CREATING A CLUSTER
library(parallel)
cl <- makePSOCKcluster(2)    

# 2. PREPARING THE CLUSTER
clusterSetRNGStream(cl, 123) # Equivalent to `set.seed(123)`

# 3. DO YOUR CALL
ans <- parSapply(cl, 1:2, function(x) runif(1e3))
(ans0 <- var(ans))
#               [,1]          [,2]
# [1,]  0.0861888293 -0.0001633431
# [2,] -0.0001633431  0.0853841838
# I want to get the same!
clusterSetRNGStream(cl, 123)
ans1 <- var(parSapply(cl, 1:2, function(x) runif(1e3)))

all.equal(ans0, ans1) # All equal!
# [1] TRUE
# 4. STOP THE CLUSTER
stopCluster(cl)

parallel example 1: Parallel RNG (cont.)

In the case of makeForkCluster

# 1. CREATING A CLUSTER
library(parallel)

# The fork cluster will copy the -nsims- object
nsims <- 1e3
cl    <- makeForkCluster(2)    

# 2. PREPARING THE CLUSTER
RNGkind("L'Ecuyer-CMRG")
set.seed(123) 

# 3. DO YOUR CALL
ans <- do.call(cbind, mclapply(1:2, function(x) {
  runif(nsims) # Look! we use the nsims object!
               # This would have failed with makePSOCKcluster
               # if we hadn't copied -nsims- first.
  }))
(ans0 <- var(ans))

# Same sequence with same seed
set.seed(123) 
ans1 <- var(do.call(cbind, mclapply(1:2, function(x) runif(nsims))))

ans0 - ans1 # A matrix of zeros

# 4. STOP THE CLUSTER
stopCluster(cl)

parallel example 2: Simulating \(\pi\)

The R code to do this

pisim <- function(i, nsim) {  # Notice we don't use the -i-
  # Random points
  ans  <- matrix(runif(nsim*2), ncol=2)
  
  # Distance to the origin
  ans  <- sqrt(rowSums(ans^2))
  
  # Estimated pi
  (sum(ans <= 1)*4)/nsim
}

parallel example 2: Simulating \(\pi\) (cont.)

# Setup
cl <- makePSOCKcluster(10)
clusterSetRNGStream(cl, 123)

# Number of simulations we want each time to run
nsim <- 1e5

# We need to make -nsim- and -pisim- available to the
# cluster
clusterExport(cl, c("nsim", "pisim"))

# Benchmarking: parSapply and sapply will run this simulation
# a hundred times each, so at the end we have 1e5*100 points
# to approximate pi
rbenchmark::benchmark(
  parallel = parSapply(cl, 1:100, pisim, nsim=nsim),
  serial   = sapply(1:100, pisim, nsim=nsim), replications = 1
)[,1:4]
#       test replications elapsed relative
# 1 parallel            1   0.650    1.000
# 2   serial            1   1.166    1.794
ans_par <- parSapply(cl, 1:100, pisim, nsim=nsim)
ans_ser <- sapply(1:100, pisim, nsim=nsim)
stopCluster(cl)
#      par      ser        R 
# 3.141561 3.141677 3.141593

Slurm Example 1

# Include this to tell where everything will be living at
.libPaths("~/R/x86_64-pc-linux-gnu-library/3.4/")

# Default CRAN mirror from where to download R packages
options(repos = c(CRAN = "https://cloud.r-project.org/"))

# You need to have the ABCoptim R package
library(ABCoptim)

fun <- function(x) {
  -cos(x[1])*cos(x[2])*exp(-((x[1] - pi)^2 + (x[2] - pi)^2))
}

ans <- abc_optim(rep(0,2), fun, lb=-10, ub=10, criter=50)

saveRDS(
   ans,
   file = paste0(
      "~/hpc-with-r/examples/01-slurm-abcoptim-",
      Sys.getenv("SLURM_JOB_ID"),                 # SLURM ENV VAR
      "-",
      Sys.getenv("SLURM_ARRAY_TASK_ID"),          # SLURM ENV VAR
      ".rds"
))

Now you try it!

RcppArmadillo + OpenMP + Slurm: Using the rslurm package

Now you try it!

Thanks!

# R version 3.4.4 (2018-03-15)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 14.04.5 LTS
# 
# Matrix products: default
# BLAS: /usr/lib/libblas/libblas.so.3.0
# LAPACK: /usr/lib/lapack/liblapack.so.3.0
# 
# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
# 
# attached base packages:
# [1] parallel  stats     graphics  grDevices utils     datasets  methods  
# [8] base     
# 
# loaded via a namespace (and not attached):
#  [1] compiler_3.4.4  backports_1.1.2 magrittr_1.5    rprojroot_1.3-2
#  [5] tools_3.4.4     htmltools_0.3.6 yaml_2.1.19     Rcpp_0.12.17   
#  [9] stringi_1.2.3   rmarkdown_1.10  jpeg_0.1-8      highr_0.7      
# [13] knitr_1.20      stringr_1.3.1   digest_0.6.15   evaluate_0.10.1

See also

For more, checkout the CRAN Task View on HPC

References

Jones, O., R. Maillardet, and A. Robinson. 2009. Introduction to Scientific Programming and Simulation Using R. Chapman & Hall/Crc the R Series. CRC Press. https://books.google.com/books?id=gnZC525wnzIC.

Marchand, Philippe. 2017. Rslurm: Submit R Calculations to a Slurm Cluster. https://CRAN.R-project.org/package=rslurm.

Matloff, N. 2011. The Art of R Programming: A Tour of Statistical Software Design. No Starch Press Series. No Starch Press. https://books.google.com/books?id=o2aLBAAAQBAJ.

Peng, R. 2012. R Programming for Data Science. Lulu.com. https://books.google.com/books?id=GSePDAEACAAJ.

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.

Vega Yon, George, and Enyelbert Muñoz. 2017. ABCoptim: An Implementation of the Artificial Bee Colony (ABC) Algorithm. https://github.com/gvegayon/ABCoptim.

Wickham, H. 2015. Advanced R. Chapman & Hall/Crc the R Series. CRC Press. https://books.google.com/books?id=FfsYCwAAQBAJ.

Wickham, H., and G. Grolemund. 2016. R for Data Science: Import, Tidy, Transform, Visualize, and Model Data. O’Reilly Media. https://books.google.com/books?id=vfi3DQAAQBAJ.