JASMIN tips

This page is meant to supplement the How-to-install page.

Activating conda R environment

On some Linux systems Conda does not run or activate environments properly because the environment paths are not set correctly. The following has been found to work on JASMIN:

export conda_base=`conda info --base`
source ${conda_base}/etc/profile.d/conda.sh
conda activate <your_env>

where `<your_env>` is the name of the environment you wish to activate.
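The three steps above can be wrapped in a small shell function so that a missing Conda installation is reported clearly instead of failing part-way through. This is only a sketch: the function name `activate_conda_env` is hypothetical and not part of Conda.

```shell
# Hypothetical helper (not part of conda): activate an environment by name,
# failing early with a clear message if conda itself cannot be found.
activate_conda_env() {
    env_name="$1"
    if ! command -v conda >/dev/null 2>&1; then
        echo "conda not found on PATH" >&2
        return 1
    fi
    # Same steps as above: locate the base install, load conda's shell
    # functions, then activate the requested environment.
    conda_base="$(conda info --base)" || return 1
    . "${conda_base}/etc/profile.d/conda.sh" || return 1
    conda activate "${env_name}"
}
```

The function has to be defined in the current shell (for example by sourcing it from a startup file) rather than run as a separate script, because activation changes the environment of the shell that calls it. It would then be used as `activate_conda_env <your_env>`.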

Installing R packages not on conda-forge or available through the `conda install` command

Many, but not all, R packages can be installed via conda-forge, e.g.

conda install -c conda-forge <required_package>
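On conda-forge, CRAN packages are usually published under the name `r-` followed by the package name in lower case (ggplot2 becomes `r-ggplot2`). The sketch below derives that name from a CRAN package name. The function name `conda_forge_r_name` is hypothetical, and the convention is not guaranteed to hold for every package, so it is worth checking the result with `conda search` first.

```shell
# Hypothetical helper: map a CRAN package name to its usual conda-forge name,
# assuming the common "r-<lowercase name>" convention applies.
conda_forge_r_name() {
    printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}
```

For example, `conda install -c conda-forge "$(conda_forge_r_name ggplot2)"` would request the package `r-ggplot2`.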

Packages that are not available through conda-forge can still be installed from CRAN by downloading the source tarball and installing from source. For this to work, pkg-config must be configured correctly, with the PKG_CONFIG_PATH environment variable pointing at the pkgconfig directory of the current environment, e.g.

export PKG_CONFIG_PATH="/home/users/random_user/.conda/envs/r_env0/lib/pkgconfig"

for user random_user and environment r_env0. Package ggplot2, for example, could then be installed via:

wget https://cran.r-project.org/src/contrib/ggplot2_3.3.1.tar.gz
R CMD INSTALL ggplot2_3.3.1.tar.gz

When installing a package this way, the first attempt is likely to fail because of missing dependencies. If this happens, install those dependencies and then try again.
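Rather than hard-coding the user and environment names, PKG_CONFIG_PATH can be derived from the CONDA_PREFIX variable, which Conda sets to the root of the active environment. The following is a minimal sketch that assumes the target environment has already been activated; the function name `set_pkg_config_path` is hypothetical.

```shell
# Hypothetical helper: point pkg-config at the active conda environment.
# Assumes "conda activate" has already run, so CONDA_PREFIX is set.
set_pkg_config_path() {
    if [ -z "${CONDA_PREFIX:-}" ]; then
        echo "CONDA_PREFIX is not set; activate an environment first" >&2
        return 1
    fi
    export PKG_CONFIG_PATH="${CONDA_PREFIX}/lib/pkgconfig"
}
```

Calling `set_pkg_config_path` before `R CMD INSTALL` should give the same result as the explicit export shown earlier, as long as the environment lives in the usual location.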

Running new_merge_ids_year.R interactively

The script new_merge_ids_year.R is typically run in batch mode, e.g.

Rscript new_merge_ids_year.R <year1> <year2>

where `<year1>` and `<year2>` are the first and last years to run, respectively. For testing purposes the script can also be run interactively by starting R, setting the variable args2 to the first and last year, and then sourcing the script:

R
args2 <- c(1980,1980)
source('new_merge_ids_year.R')

As part of the script, the number of command-line arguments is checked; if it is zero, as is the case when the script is run interactively, the script uses the values in the variable args2 instead.
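For batch runs, it can be useful to check the two year arguments in the shell before starting R. The sketch below is an illustration only: the function name `check_years` is hypothetical, and it assumes that years are four-digit numbers and that the first year must not be later than the last. Neither assumption is stated by the script itself.

```shell
# Hypothetical pre-flight check for the <year1> <year2> arguments.
# Assumes four-digit years with year1 <= year2.
check_years() {
    if [ "$#" -ne 2 ]; then
        echo "usage: check_years <year1> <year2>" >&2
        return 2
    fi
    for y in "$1" "$2"; do
        case "$y" in
            [0-9][0-9][0-9][0-9]) ;;   # looks like a four-digit year
            *) echo "not a four-digit year: $y" >&2; return 1 ;;
        esac
    done
    if [ "$1" -gt "$2" ]; then
        echo "first year $1 is later than last year $2" >&2
        return 1
    fi
}
```

A batch run could then be written as `check_years 1980 1985 && Rscript new_merge_ids_year.R 1980 1985`, so that R is only started when the arguments look sensible.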

Running background jobs without hanging up

Long jobs on JASMIN, and on other Linux systems, can be run in the background with the nohup command, which prevents the job from being terminated if the connection is lost. For example, the R script myscript.R with arguments arg1 and arg2 would be run using

nohup Rscript myscript.R arg1 arg2 >& myscript_log.txt &

It should then be safe to log out and return later to check the status of the job. The command `tail -f myscript_log.txt` can be used to monitor the log file in real time.
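To check later whether the job is still alive, its process ID can be saved when it is started. The following is a sketch under stated assumptions: the function names `run_in_background` and `is_running` and the `.pid` file naming are hypothetical, and the check only works when you log back in to the same machine the job was started on.

```shell
# Hypothetical helper: start a command under nohup, send stdout and stderr
# to a log file, and record the process ID next to the log.
run_in_background() {
    log_file="$1"
    shift
    nohup "$@" > "${log_file}" 2>&1 &
    echo "$!" > "${log_file}.pid"
}

# Hypothetical helper: succeed if the process recorded in the given PID file
# still exists. "kill -0" sends no signal; it only tests for the process.
is_running() {
    kill -0 "$(cat "$1")" 2>/dev/null
}
```

For example, `run_in_background myscript_log.txt Rscript myscript.R arg1 arg2` starts the job, and `is_running myscript_log.txt.pid && echo "still running"` reports on it after logging back in.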

SLURM tips

To gather information about your SLURM runs, you can use the following command, which appends job statistics to an output file:

sacct --starttime <for-the-last-x-hours> -u <username> --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist >> path_to_log_output/job_stats.out

The --starttime value should be given in the format YYYY-MM-DDTHH:MM:SS, for example 2020-07-01T13:00:00.
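A start time for "the last x hours" can be generated in this format rather than typed by hand. This sketch assumes GNU date, as found on most Linux systems, since the `-d` option is not portable to every platform. The function name `hours_ago` is hypothetical.

```shell
# Hypothetical helper: print the timestamp N hours before now in the
# YYYY-MM-DDTHH:MM:SS form shown above. Relies on GNU date's -d option.
hours_ago() {
    date -d "$1 hours ago" '+%Y-%m-%dT%H:%M:%S'
}
```

It could then be used as `sacct --starttime "$(hours_ago 6)" -u <username> ...` to cover the last six hours.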

It is also possible to gather job statistics for specific job IDs:

sacct -j <JOB_ID> --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist >> path_to_log_output/job_stats.out