General resources

The Center for Advanced Research Computing (formerly HPCC) has tons of resources online.

Data Pointers

IMHO, these are the most important things to know about data management at USC’s HPC:

  1. Do your data transfers using the transfer nodes (they are faster).

  2. Never use your home directory as a storage space (use your project’s allotted space instead).

  3. Use the scratch filesystem for temp data only, i.e., never save important files in scratch.

  4. Finally, besides the Secure Copy Protocol (scp), if you are like me, try setting up a GUI client for moving your data (see this). A command-line sketch follows this list.
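
To illustrate points 1 and 4 from the command line, here is a minimal sketch; the transfer-node hostname and the destination path are placeholders, so check CARC's documentation for the actual ones:

# Copy a local file to your project's space through a transfer node
scp data.tar.gz [you]@hpc-transfer.usc.edu:/path/to/your/project/

# rsync works too, and it can resume interrupted transfers
rsync -avP data.tar.gz [you]@hpc-transfer.usc.edu:/path/to/your/project/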

The Slurm options they forgot to tell you about…

First of all, you have to be aware that the only thing Slurm does is allocate resources. Whether your application uses parallel computing or not is another story.

Here are some options that you need to be aware of:
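
As a sketch, these are some of the standard Slurm flags for requesting resources (the flag names are standard Slurm; the values are only illustrative):

# Number of tasks (processes) Slurm will launch
#SBATCH --ntasks=1
# Cores available to each task (e.g., for multicore R)
#SBATCH --cpus-per-task=4
# Memory per allocated core
#SBATCH --mem-per-cpu=2G
# Which partition (queue) to submit the job to
#SBATCH --partition=main
# Which allocation to charge the job to
#SBATCH --account=[account]

Remember: requesting --cpus-per-task=4 does not parallelize anything by itself; your code still has to use those cores (e.g., via R's parallel package).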

Good practices (recommendations)

This is what you should use as a minimum:

#SBATCH --output=simulation.out
#SBATCH --job-name=simulation
#SBATCH --time=04:00:00
#SBATCH --mail-user=[you]@usc.edu
#SBATCH --mail-type=END,FAIL

Also, in your R code, write your results to disk explicitly so the job leaves something behind when it finishes (or gets killed). A minimal sketch, where the object and file names are illustrative:
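
# Your analysis (illustrative: 1,000 simulated sample means)
ans <- replicate(1000, mean(rnorm(100)))

# Save the results explicitly; anything not written to disk is lost
# if the job hits its time limit
saveRDS(ans, "simulation.rds")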

Running R interactively

  1. The HPC has several pieces of software pre-installed, and R is one of them.

  2. To access the pre-installed software, we use the Lmod module system (more information here).

  3. It has multiple versions of R installed. Use your favorite one by running

    module load usc r/[version number]

    where [version number] can be anywhere from 3.5.6 up to 4.0.3 (the latest update). The usc module automatically loads gcc/8.3.0, openblas/0.3.8, openmpi/4.0.2, and pmix/3.1.3.

  4. It is never a good idea to use your home directory to install R packages; instead, replace the default ~/R folder with a symbolic link that points to your project's space, like this

    cd ~                                              # start from your home directory
    mkdir -p /path/to/a/project/with/lots/of/space/R  # create the library folder in project space
    ln -s /path/to/a/project/with/lots/of/space/R R   # make ~/R point to it

    This way, whenever you install R packages, R will default to that location.

  5. You can run interactive sessions on the HPC, but this should be done through Slurm's salloc command; in other words, NEVER EVER USE R (OR ANY SOFTWARE) TO DO DATA ANALYSIS ON THE HEAD NODES! The options passed to salloc are the same ones that can be passed to sbatch (see the next section), and a fuller sketch of the whole workflow follows this list. For example, if I needed to do some analyses in the thomas partition (which is private, and to which I have access), I would type something like

    salloc --account=lc_pdt --partition=thomas --time=02:00:00 --mem-per-cpu=2G

    This would put me on a single node, with 2 gigs of memory per CPU, for a maximum of 2 hours.
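
Putting items 3 and 5 together, a full interactive session looks roughly like this (a sketch: the account, partition, and R version come from the examples above, and module spider is Lmod's command for listing what is available):

# Ask Slurm for an interactive allocation (never compute on the head nodes)
salloc --account=lc_pdt --partition=thomas --time=02:00:00 --mem-per-cpu=2G

# Once the allocation starts, check which R versions are available...
module spider r

# ...then load the software stack and start R on the compute node
module load usc r/4.0.3
R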

NoNos when using R