Last edited by bearecinos Aug 21, 2020

JASMIN tips

This page is meant to supplement the How-to-install page.

Activating a conda R environment

On some Linux systems conda does not run or activate environments properly because the environment paths are not set correctly. The following has been found to work on JASMIN:

# Locate the conda installation, load its shell functions, then activate
export conda_base=$(conda info --base)
source "${conda_base}/etc/profile.d/conda.sh"
conda activate <your_env>

where <your_env> is the name of the environment you wish to activate.
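If you activate environments often, the activation commands can be wrapped in a shell function, for example in ~/.bashrc. This is a minimal sketch: the function name conda_activate_env and its optional second argument (an explicit conda base directory) are hypothetical and not part of conda itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper wrapping the JASMIN activation steps.
# Usage: conda_activate_env <env_name> [conda_base_dir]
conda_activate_env() {
  local env_name="$1"
  # Fall back to `conda info --base` when no base directory is given
  local base="${2:-$(conda info --base 2>/dev/null)}"
  local hook="${base}/etc/profile.d/conda.sh"

  if [ ! -f "$hook" ]; then
    echo "conda shell hook not found: ${hook}" >&2
    return 1
  fi
  # Load conda's shell functions so that `conda activate` works
  source "$hook"
  conda activate "$env_name"
}
```

Usage: conda_activate_env r_env0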

Installing R packages that are not available through conda-forge

Many, but not all, R packages can be installed via conda-forge, e.g.

conda install -c conda-forge <required_package>
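On conda-forge, R packages from CRAN are usually published as r- followed by the lower-case CRAN name (ggplot2 is r-ggplot2, for example), so that is normally what to use for <required_package>. The sketch below assumes that naming convention; the helper names are hypothetical. It checks conda-forge with conda search before installing, and reports when the package has to come from CRAN instead.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes the usual conda-forge convention that CRAN
# packages are named "r-" + lower-case CRAN name.
conda_forge_name() {
  echo "r-$(echo "$1" | tr '[:upper:]' '[:lower:]')"
}

# Install from conda-forge if the package is there; return 1 otherwise
install_r_pkg_conda() {
  local name
  name="$(conda_forge_name "$1")"
  if conda search -c conda-forge "$name" > /dev/null 2>&1; then
    conda install -y -c conda-forge "$name"
  else
    echo "${name} not found on conda-forge; install $1 from CRAN source" >&2
    return 1
  fi
}
```

Usage: install_r_pkg_conda ggplot2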

Packages not available through conda-forge can still be installed from CRAN by downloading the source code and installing from source. In this case pkg-config must be configured properly: the PKG_CONFIG_PATH environment variable should point to the lib/pkgconfig directory of the current environment, e.g.

export PKG_CONFIG_PATH="/home/users/random_user/.conda/envs/r_env0/lib/pkgconfig"

for user random_user and environment r_env0. The ggplot2 package, for example, could then be installed with:

wget https://cran.r-project.org/src/contrib/ggplot2_3.3.1.tar.gz
R CMD INSTALL ggplot2_3.3.1.tar.gz

When installing a package this way, the first attempt will likely fail because of missing dependencies. If this happens, install those dependencies and then try again.
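The CRAN-source steps can be collected into a short script. This is a sketch with hypothetical function names. Instead of hard-coding the user and environment names, it derives PKG_CONFIG_PATH from CONDA_PREFIX, which conda sets to the active environment's directory on activation. CRAN keeps only the current version of each package in src/contrib; superseded versions are moved to src/contrib/Archive/<package>/, so the script tries both locations.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for installing an R package from CRAN source
# into the currently active conda environment.

# Point pkg-config at the active environment's lib/pkgconfig directory
set_pkg_config_path() {
  if [ -z "${CONDA_PREFIX:-}" ]; then
    echo "No conda environment is active" >&2
    return 1
  fi
  export PKG_CONFIG_PATH="${CONDA_PREFIX}/lib/pkgconfig"
}

# URL of a CRAN source tarball; pass "yes" as 3rd argument for the archive
cran_source_url() {
  local pkg="$1" version="$2" archived="${3:-no}"
  local base="https://cran.r-project.org/src/contrib"
  if [ "$archived" = "yes" ]; then
    base="${base}/Archive/${pkg}"
  fi
  echo "${base}/${pkg}_${version}.tar.gz"
}

install_from_cran_source() {
  local pkg="$1" version="$2"
  local tarball="${pkg}_${version}.tar.gz"
  set_pkg_config_path || return 1
  # Try the current-release location first, then the archive
  wget -q "$(cran_source_url "$pkg" "$version")" \
    || wget -q "$(cran_source_url "$pkg" "$version" yes)" \
    || return 1
  R CMD INSTALL "$tarball"
}
```

Usage: install_from_cran_source ggplot2 3.3.1. If R CMD INSTALL reports a missing dependency, install it the same way and rerun.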

Running new_merge_ids_year.R interactively

The script new_merge_ids_year.R is typically run in batch mode, e.g.

Rscript new_merge_ids_year.R <year1> <year2>

where <year1> and <year2> are the first and last years to process, respectively. For testing, the script can also be run interactively: start R, set the variable args2 to the first and last year, and then source the script:

R
args2 <- c(1980, 1980)  # first and last year
source('new_merge_ids_year.R')

The script checks the number of command-line arguments and, if there are none (as when it is run interactively), uses the values in args2 instead.
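For batch runs, a small wrapper can check the year arguments before a long job is started. This is a hypothetical sketch that relies only on what is described above, namely that new_merge_ids_year.R takes the first and last year as its two arguments; the function names are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the batch call to new_merge_ids_year.R.

# Succeed only if both arguments are 4-digit years and first <= last
valid_year_range() {
  [[ "$1" =~ ^[0-9]{4}$ && "$2" =~ ^[0-9]{4}$ ]] && [ "$1" -le "$2" ]
}

run_merge_ids() {
  local year1="$1" year2="$2"
  if ! valid_year_range "$year1" "$year2"; then
    echo "usage: run_merge_ids <year1> <year2> (year1 <= year2)" >&2
    return 1
  fi
  Rscript new_merge_ids_year.R "$year1" "$year2"
}
```

Usage: run_merge_ids 1980 1980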

Running background jobs without hanging up

Long jobs on JASMIN and other Linux systems can be run in the background with nohup, which prevents the job from being terminated if the connection is lost. For example, to run the R script myscript.R with arguments arg1 and arg2, use:

nohup Rscript myscript.R arg1 arg2 >& myscript_log.txt &

It should then be safe to log out and return later to check the status of the job. To monitor the log file in real time, use tail -f myscript_log.txt.
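To check later whether the job is still running, it helps to record its process ID when it starts. A sketch follows; the helper names and the .pid file are hypothetical. It uses the POSIX redirection > file 2>&1, which in bash is equivalent to the >& file form above (both stdout and stderr go to the log), and kill -0, which only tests whether the process exists.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: start a command with nohup and record its PID.
# Usage: start_job <log_file> <command> [args...]
start_job() {
  local log="$1"; shift
  nohup "$@" > "$log" 2>&1 &
  echo $! > "${log%.txt}.pid"   # $! is the PID of the last background job
}

# Succeed if the process whose PID is stored in <pid_file> still exists
job_running() {
  local pidfile="$1"
  [ -f "$pidfile" ] && kill -0 "$(cat "$pidfile")" 2>/dev/null
}
```

Usage: start_job myscript_log.txt Rscript myscript.R arg1 arg2, then after logging back in, job_running myscript_log.pid && echo running || echo finished.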

SLURM tips

To gather information about your SLURM jobs, run the following command, which appends job statistics to an output file:

sacct --starttime <start_time> -u <username> --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist >> path_to_log_output/job_stats.out

The start time should be given in the format 2020-07-01T13:00:00 (YYYY-MM-DDTHH:MM:SS); to report on the last x hours, use the time x hours before now.

It is also possible to gather job statistics for specific job IDs:

sacct -j <JOB_ID> --format=User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist >> path_to_log_output/job_stats.out
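Rather than typing the start time by hand, it can be computed with GNU date (assumed available, as on standard Linux hosts). A sketch with hypothetical function names, reusing the --format list above and taking the username from $USER:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: collect sacct statistics for the last N hours.
SACCT_FORMAT="User,JobID,Jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist"

# Local time N hours ago, in the YYYY-MM-DDTHH:MM:SS form used above
hours_ago() {
  date -d "$1 hours ago" +%Y-%m-%dT%H:%M:%S
}

# Usage: job_stats_last_hours <hours> <output_file>
job_stats_last_hours() {
  local hours="$1" outfile="$2"
  sacct --starttime "$(hours_ago "$hours")" -u "$USER" \
        --format="$SACCT_FORMAT" >> "$outfile"
}
```

Usage: job_stats_last_hours 24 path_to_log_output/job_stats.out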