This document describes the current configuration of the bioghost.usc.edu server. Be advised that this document is permanently work-in-progress.
The following instructions have been written for a server running CentOS Linux release 7.3.1611 (Core) that has been registered to have an static IP/www address (see here).
What is it?
Valgrind is an instrumentation framework for building dynamic analysis tools. There are Valgrind tools that can automatically detect many memory management and threading bugs, and profile your programs in detail. (valgrind.org)
We use it mostly to check memory leaks in R packages when running R CMD check --use-valgrind
How to install it
$ sudo yum install valgrind
What is it?
GNU Wget is a free software package for retrieving files using HTTP, HTTPS and FTP, the most widely-used Internet protocols. It is a non-interactive commandline tool, so it may easily be called from scripts, cron jobs, terminals without X-Windows support, etc. (GNU Operating System)
How to install it
$ sudo yum install wget
What is it?
Perl is a family of high-level, general-purpose, interpreted, dynamic programming languages. The languages in this family include Perl 5 and Perl 6. (wiki)
How to install it
For now, we use perl to install texlive
$ sudo yum install perl perl-Digest-MD5
What is it?
TeX Live is a free software distribution for the TeX typesetting system that includes major TeX-related programs, macro packages, and fonts. (wiki)
How to install it
Download and execute the installer as follows:
$ wget http://mirror.ctan.org/systems/texlive/tlnet/install-tl-unx.tar.gz
$ tar xzf install-tl-unx.tar.gz
$ cd ./install-tl-20170209/
$ sudo perl install-tl
Setup process, make sure you activate the option “create symlinks to standard directories”. This will make available pdflatex
and friends system wide.
What is it?
Pandoc is a free and open-source software document converter, widely used as a writing tool (especially by scholars) and as a basis for publishing workflows. It was originally created by John MacFarlane, a philosophy professor at the University of California, Berkeley. (wiki)
How to install it
Not right now, we just use the RStudio default
What is it?
How to install it
$ sudo yum install mlocate
What is it?
QPDF is a command-line program that does structural, content-preserving transformations on PDF files. project’s website
How to install it
$ sudo yum install qpdf
What is it?
R is a programming language and software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. (wiki)
How to install it
Add the following repository
$ # Info from https://cran.rstudio.com/bin/linux/redhat/README
$ sudo rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
Install R
$ sudo yum install R
For personalizing system-wide R-sessions, type ?Rprofile
in the R terminal. The following R packages are installed (besides the default ones):
For more information, visit the “Ubuntu Packages for R” section in the R-project website: https://cran.r-project.org/bin/linux/ubuntu/README.html
What is it?
RStudio is a free and open-source integrated development environment (IDE) for R, a programming language for statistical computing and graphics. (wiki))
How to install it
Installation is as easy as typing the following commands
wget sudo yum install --nogpgcheck rstudio-server-rhel-1.0.136-x86_64.rpm
For security, we change the access Port by editing the following file /etc/rstudio/rserver.conf
. Just add/replace the following line
www-port=3624
This will make RStudio server to accept connections only from port 3624. This way, to login to RStudio server, you must type the following address in your web-browser URL bar http://bioghost.usc.edu:3624, but first, we need to open port 3624
To open a port in CentOS 7, we do the following
$ sudo firewall-cmd --zone=public --add-port=3624/tcp --permanent
$ sudo firewall-cmd --reload
Finally, in order to make sure the login is smooth, we need to copy the pam login profile of the system to rstudio’s
$ sudo cp /etc/pam.d/login /etc/pam.d/rstudio
$ sudo rstudio-server restart
This solution was posted here, in the RStudio blog.
For more information, visit the RStudio server website https://www.rstudio.com/products/rstudio/download-server/
Troubleshooting
What to do when RStudio server says
rserver[23961]: ERROR system error 98 (Address already in use); OCCURRED AT: rstudio::core::Error rstudio::core::http::initTcpIpAcceptor(rstudio::core::http::SocketAcce...
$ sudo rstudio-server stop
$ ps -aux | grep rserver
$ sudo kill -9 [pid] # Or you can kill them all with sudo killall -9 rserver
$ sudo rstudio-server start
devtools
R packageWhat is it?
The aim of devtools is to make package development easier by providing R functions that simplify common tasks. (GitHub website of the project)
How to install it
For the curl
package, run libcurl-devel sudo yum install libcurl-devel
For the openssl
and git2r
packages, run sudo yum install openssl-devel
Once the dependencies are install, each user can install it’s own copy of `devtools by typing
roxygen2
packageWhat is it?
How to install it
It requires the xml2
R-package, which requires the libxml2-devel
library sudo yum install libxml2-devel
Once the dependencies are install, each user can install it’s own copy of `devtools by typing
rgl
packageWhat is it?
How to install it
Currently, while the package’s installation process is successful, creating rgl devices is not possible at the time due to issues with X11 (needs to be resolved). For now, here are the instructions to install rgl.
To install rgl, we need some extra libraries:
$ sudo yum install xauth
Once the dependencies are install, each user can install it’s own copy of `devtools by typing
http://linuxtoolkit.blogspot.com/2015/10/libgl-error-unable-to-load-driver.html
$ sudo yum install mesa-libGL-devel mesa-libGLU-devel libpng-devel
More information go to the package’s README file
What is it?
I use it to check the ports usage: netstat -anp | grep 3624
will list the connections that are using port 3624.
How to install it
$ sudo yum install net-tools
This needs BiocManager (which can be install via install.packages) and, for some indirect dependency, the libjpeg library, which was installed using sudo yum install libjpeg-*
. Once the dependencies are done (there could be a few more), you can install the package using BiocManager::install("DMRcate")
.
rjags
R packageWhat is it?
Interface to the JAGS MCMC library. (rjags at CRAN)
How to install it (if you have sudo)
Install the jags and jags-dev packages in the OS:
$ sudo rpm -Uhv http://download.opensuse.org/repositories/home:/cornell_vrdc/CentOS_7/x86_64/jags4-4.1.0-65.2.x86_64.rpm
$ sudo rpm -Uhv http://download.opensuse.org/repositories/home:/cornell_vrdc/CentOS_7/x86_64/jags4-devel-4.1.0-65.2.x86_64.rpm
$ sudo rpm -Uhv http://download.opensuse.org/repositories/home:/cornell_vrdc/CentOS_7/x86_64/jags4-debuginfo-4.1.0-65.2.x86_64.rpm
Then, install the R package using install.packages("rjags")
.
How to install it (if you don’t have sudo)
Get JAGS http://mcmc-jags.sourceforge.net/, in particular from https://sourceforge.net/projects/mcmc-jags/files/JAGS/4.x/Source/
tar xf JAGS*.tar.gz
Need to get LAPACK https://github.com/Reference-LAPACK/lapack/archive/v3.9.0.tar.gz
What is it?
is free and open-source cross-platform web server software, released under the terms of Apache License 2.0. Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation (Apache at Wiki)
How to install it
Install httpd
$ sudo yum -y install httpd
Start and enable the service
$ sudo systemctl start httpd
$ sudo systemctl enable httpd.service
Setting up the firewall (adding port 80, http)
$ sudo firewall-cmd --permanent --add-port=80
The webserver should show a default website when following http://bioghost.usc.edu. The default can be modified in /var/www/html/
What is it?
RStudio Connect is a new publishing platform for the work your teams create in R. Share Shiny applications, R Markdown reports, Plumber APIs, dashboards, plots, and more in one convenient place. Use push-button publishing from the RStudio IDE, scheduled execution of reports, and flexible security policies to bring the power of data science to your entire enterprise. (RStudio connect website)
How to install it
RStudio grants full licenses for educational purposes. In our case, Prof. Marjoram got us a 1 year license to try it out (usually is for 45 days-only). To install and active RStudio connect you need to contact RStudio directly. Once it has been set up, one important bit is to configure the login capabilities:
Change the file /etc/rstudio-connect/rstudio-connect.gcfg
and set the following option: Password = pam
[Authentication]
Provider = pam
This will cause RStudio connect to use the same login system that we use in Bioghost.
Edit the file /etc/pam.d/rstudio-connect
to be as the following:
#%PAM-1.0
auth [user_unknown=ignore success=ok ignore=ignore default=bad] pam_securetty.so
auth substack system-auth
auth include postlogin
account required pam_nologin.so
account include system-auth
password include system-auth
# pam_selinux.so close should be the first session rule
session required pam_selinux.so close
session required pam_loginuid.so
session optional pam_console.so
# pam_selinux.so open should only be followed by sessions to be executed in the user context
session required pam_selinux.so open
session required pam_namespace.so
session optional pam_keyinit.so force revoke
session include system-auth
session include postlogin
-session optional pam_ck_connector.so
These very same rules are the ones we use for RStudio Server.
Stop and start the service as follows:
sudo service rstudio-connect stop
sudo service rstudio-connect start
And you should be good to go.
Currently RStudio connect is set with the default configuration for http access. So to access the dashboard you need to go to http://bioghost.usc.edu:3939. This address is also the one you need to use in order to setup RStudio connect in your RStudio server machine.
Adding RStudio connect to RStudio
You can use this directly with RStudio. The only requirement is to have a prevmed account (which all of you have), and be under the USC network (VPN if you are outside USC). To publish either a shiny app or any other markdown document (html output of course), you need to do the following:
To add the new publishing tool:
To publish new content, open any Rmarkdown/Shiny app and:
Here you can find an example app that I just published: http://bioghost.usc.edu:3939/rstudio-connect-example/ You need to login to be able to look at it!
What is it?
Shiny is an R package that makes it easy to build interactive web apps straight from R. You can host standalone apps on a webpage or embed them in R Markdown documents or build dashboards. You can also extend your Shiny apps with CSS themes, htmlwidgets, and JavaScript actions. (shiny website)
How to install it
Make sure you have shiny
and rmarkdown
available system-wide, from R:
To install
$ wget https://download3.rstudio.org/centos5.9/x86_64/shiny-server-1.5.5.872-rh5-x86_64.rpm
$ sudo yum install --nogpgcheck shiny-server-1.5.5.872-rh5-x86_64.rpm
You need to open the default port (which can be modified /etc/shiny-server/shiny-server.conf
) as follows: shell $ sudo firewall-cmd --zone=public --add-port=3838/tcp --permanent $ sudo firewall-cmd --reload
To start/stop (more details here) shell $ sudo /sbin/service shiny-server start1
To add apps, you need to add them to the folder /srv/shiny-server
, and you can access them using http://bioghost.usc.edu:3838 (the default website).
What is it?
How to install it?
Follow the instructions here https://docs.docker.com/engine/installation/linux/docker-ce/centos/#set-up-the-repository
To allow group usage, follow these instructions: https://askubuntu.com/questions/477551/how-can-i-use-docker-without-sudo
What is it?
Autoconf is an extensible package of M4 macros that produce shell scripts to automatically configure software source code packages. These scripts can adapt the packages to many kinds of UNIX-like systems without manual user intervention. Autoconf creates a configuration script for a package from a template file that lists the operating system features that the package can use, in the form of M4 macro calls. —-autoconf website
How to install it
$ sudo yum install autoconf
What is it?
How to install it
$ sudo yum install mysql-devel
What is it?
How to install it
We got one of the LICENSES that George Martinez has, from there:
$ sudo -s
$ cd /tmp/
$ mkdir statafiles
$ cd statafiles
$ tar -zxf /home/you/Downloads/Stata15Linux64.tar.gz
$ cd /usr/local
$ mkdir stata15
$ cd stata15
$ /tmp/statafiles/install
And follow the instructions afterwards. To make it system wide available:
$ cd ..
$ sudo chmod -R 755 stata15
What is it
How to install it
Following these instructions:
$ sudo yum install bash-completion
What is it
GNU Multiple Precision Arithmetic Library is a free library for arbitrary-precision arithmetic, operating on signed integers, rational numbers, and floating point numbers. (wiki)
It is required by the R package gmp.
How to install it
$ sudo yum install gmp-devel
Before this configuration works, we need to have the DNS config file pointing to our USC’s prevmed machine (a windows machine). To do so we have to have the folling settings in the /etc/resolv.conf
file:
search usc.edu
nameserver 10.148.80.211
Changing this does not need a restart or anything (see here).
Run on CentOS server as root
yum install realmd sssd ntpdate ntp adcli
(installs the need packages)
systemctl enable ntpd.service
(enables ntp service - required to keep times synced)
ntpdate prevmed.usc.edu
(or name of domain to join - syncs time to domin server)
systemctl start ntpd.service
(starts the ntp service)
realm join -U odin prevmed.usc.edu
(or administrator account and domain if not prevmed.usc.edu)
(should be prompted for password for domain admin account)
No message means it worked.
Can run the follow command to verify the installation. realm list
.
What is it
Syrupy is a Python script that regularly takes snapshots of the memory and CPU load of one or more running processes, so as to dynamically build up a profile of their usage of system resources. (repo)
How to install it
Download the repository using git: git clone https://github.com/jeetsukumaran/Syrupy
Run the installation file sudo python setup.py install
To execute it you just need to type, for example,
$ syrupy.py Rscript -e "matrix(NA, 1000, 1000)"
And it will get the memory snapshots of the command out to a log file that you can later analyze.
You can send batch jobs to R via command line in two ways:
Using the R CMD BATCH
. Its syntax is
Where myjob.Rout
is a plain text file which dumps the output from myjob.R
.
Using Rscript
. Its syntax is
Follows the same logic as the previous method, with the difference that not everything is printed
# Add user
adduser [username]
# Add group
addgroup [groupname]
# Add user to a group
sudo usermod -G[groupname] -a [username]
In the case of Prevmed accounts
sudo realm permit <username@PREVMED.USC.EDU>
If the process does not work, it is usually due to a change in the DNS server address. To solve this, follow these instructions (directly extracted from this website):
You would face this issue after a reboot or a network service restart. This usually happens as the scripts /etc/sysconfig/network-scripts/ifup-post and /etc/sysconfig/network-scripts/ifdown-post checks for the parameters “RESOLV_MODS=no” or “PEERDNS=no” in the network interface configuration file such as /etc/sysconfig/network-scripts/ifcfg-*. If these either of these parameters are not present, it will replace the contents of /etc/resolv.conf with /etc/resolv.conf.save. By default, PEERDNS and RESOLV_MODS are null.
The /etc/resolv.conf file will be overwritten if any network interfaces use DHCP for activation. To prevent this, ensure such interfaces have PEERDNS=no set in their ifcfg file, for example
# cat /etc/sysconfig/network-scripts/ifcfg-eth0
TYPE=Ethernet
DEVICE=eth0
BOOTPROTO=dhcp
PEERDNS=no
The ifcfg-file directives DNS1 and DNS2 can also lead to modification of resolv.conf. To prevent this, either remove said directives or use chattr(1) to make resolv.conf immutable to changes, i.e.:
# chattr +i /etc/resolv.conf
sudo chown -R www-data:www-data /srv/www sudo chmod -R g+w /srv/www
Setup
Use ssh-keygen
to create private and public keys in the client,
Use ssh-copy-id
to send a copy of the public key to the server
Will prompt you asking for password. After that you won’t be needing it since from this machine you’ll be able to connect simply using ssh -p 2436 vegayon@vega2.usc.edu
In ufw add the following rules needed for R to create a con with localhost:
From R run
and voila!
Further, we can make this shorter. Changing the file /etc/ssh/ssh_config
, we can add the line Port 2436
to make the default outgoing connection to
In the case that you use your desktop or other computer as a server, it is wise to pass the argument master
to the function makePSOCKcluster
. This because usually, while you may have your computer connected to the network working as a server, R uses the name of your computer as default for the master
, hence, instead of passing the argument MASTER=vegayon.usc.edu
(for example), it passes MASTER=george-computer
(which is the name of my computer). A more concrete example:
That works, while the following may not
The index webpage is located as /var/www/html/index.html
Request a certificate to itservices.usc.edu by providing them with the corresponding request: openssl req -nodes -newkey rsa:2048 -keyout bioghost.key -out bioghost.csr
Once approved, download your certificate and the intermediate certificate, e.g. bioghost_usc_edu_cert.cer and bioghost_usc_edu_iterm.cer
and, together with the key generated in step 1, copy them to the following paths:
cp bioghost_usc_edu_cert.cer /etc/pki/tls/certs
cp bioghost.key /etc/pki/tls/private
cp bioghost_usc_edu_interm.csr /etc/pki/tls/private
Update the apache conf file located at /etc/httpd/conf.d/ssl.conf
with the following information:
Directives | Path to Enter |
---|---|
SSLCertificateFile | Certificate file path |
SSLCertificateKeyFile | Key file path |
SSLCertificateChainFile | Intermediate bundle path |
Update the virtual hosts at /etc/httpd/conf/httpd.conf
by setting:
<VirtualHost *:80>
ServerName bioghost.usc.edu
# ServerAlias
Redirect "/" "https://bioghost.usc.edu"
# DocumentRoot "/var/www/html/workshop"
</VirtualHost>
And the file /etc/httpd/conf/ssl.conf
by setting
DocumentRoot "/var/www/html/workshop"
ServerName bioghost.usc.edu:443
Add the https protocol to firewall-cmd sudo firewall-cmd --add-service=https --permanent
Restart the apache server by typing sudo apachectl restart
.
More info at:
A shared folder is created to allow users to transfer files on the bioghost, i.e., without downloading these files to one’s local computer and uploading later. The folder is only for transferring purpose so please do not leave any files unattended or use the /home/shared
folder as storage.
Go to the Terminal under your account on the bioghost (next to the Console in Rstudio).
Copy the file to the shared folder.
Ask the other user to move that file to his/her own directory.
Example | Problem | Details |
---|---|---|
cat /etc/centos-release |
Checkout the version of the OS | |
sudo yum clean all |
Cleans cached repos | Neat when trying to install something fails |
sudo firewall-cmd --zone=public --add-port=3624/tcp --permanent |
Open port 3624 in zone “public” | |
firewall-cmd --get-active-zones |
List active zones in the firewall (more here) | |
sudo cat /var/log/messages |
Show system log |