The scripts in this repository form a pure R package, but they have several dependencies, which can be installed by following the instructions below.
All the required packages should work on any platform, including Linux-based systems. The code has been tested with R v3.5.1 and R v3.6.3.
Dependencies
The following packages are needed to run the code. The code has been tested with the most recent versions of these packages:
Development tools:
- devtools
- pryr
- config
Data processing tools:
- stringdist
- geosphere
- jsonlite
- lubridate
- igraph
External R software:
- imma
- icoads.utils
- maptools
- reticulate
- chron
Install dependencies with conda (all platforms)
This is the recommended way to install all the dependencies: whether the code runs on a laptop or on a cluster, you do not have to re-install the R packages for each new session.
Prerequisites
You should have a recent version of the conda package manager.
You can get conda by installing Miniconda, which is what we recommend here to keep track of your R environment.
For more information, see the blog post "Using the R language with Anaconda".
Conda environment
Once conda is installed on your system, you can create a fixed R environment to use in every run with:
conda create -n r_env r-essentials r-base
Then activate it:
conda activate r_env
To install the code dependencies, your environment must be activated. You will know it is active when the name of the environment (e.g. r_env) appears in parentheses at the beginning of your bash prompt:
(r_env) [brecinos@jasmin-sci2 ~]$
To install a dependency, run:
conda install -c r r-"package_name"
For example:
conda install -c r r-config
Note: Always look up the install command for each package, since some packages must be installed from the conda-forge channel instead:
conda install -c conda-forge r-"package_name"
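The per-package commands above can be collected into one loop. The following is a minimal sketch, not part of the repository: the function name install_r_pkg and the DRY_RUN switch are hypothetical, and it assumes that each CRAN dependency listed earlier is published as r-<name> on either the r or the conda-forge channel, which should be checked for each package. By default it only prints the commands it would run.

```shell
#!/usr/bin/env bash
# Sketch: install the CRAN dependencies listed above as conda packages.
# DRY_RUN defaults to 1, so commands are only printed; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"

# imma and icoads.utils are left out on purpose: they are installed manually.
PACKAGES="devtools pryr config stringdist geosphere jsonlite lubridate igraph maptools reticulate chron"

# install_r_pkg NAME: try the "r" channel first, then fall back to conda-forge.
install_r_pkg() {
    pkg="r-$1"
    if [ "$DRY_RUN" = "1" ]; then
        echo "conda install -y -c r $pkg || conda install -y -c conda-forge $pkg"
        return 0
    fi
    conda install -y -c r "$pkg" || conda install -y -c conda-forge "$pkg"
}

for name in $PACKAGES; do
    install_r_pkg "$name"
done
```

Run it once with the default settings to review the commands, then again with DRY_RUN=0 inside the activated r_env environment.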
IMMA toolbox
The IMMA data format is used for the dissemination of the ICOADS marine data. The R package imma provides functions to apply quality control to the ICOADS data.
It must be installed manually: download the .tar.gz file from the package repository and run the following command once your conda environment has been activated:
conda install ./imma-master.tar.gz
If the above does not work, install the tarball as an R source package instead:
R CMD INSTALL imma-master.tar.gz
ICOADS-utils toolbox
The icoads.utils package is a collection of utility functions to assist with the homogenization of platform identifier information and the identification of duplicated records in the processing tasks carried out by the scripts in rscripts.
It must be installed manually: download the .tar.gz file from the package repository and run the following command once your conda environment has been activated:
conda install ./icoads.utils-master.tar.gz
If the above does not work, install the tarball as an R source package instead:
R CMD INSTALL icoads.utils-master.tar.gz
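After both manual installs, one way to confirm that R can actually find the packages is to try loading their namespaces. This is a minimal sketch rather than a script shipped with the repository: check_r_pkg is a hypothetical helper, and it assumes Rscript from the activated conda environment is on your PATH.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the manually installed packages can be loaded by R.
# Run this inside the activated conda environment.

# check_r_pkg NAME: print "OK NAME" if R can load the package namespace,
# "MISSING NAME" otherwise (also when Rscript itself is not available).
check_r_pkg() {
    if Rscript -e "if (!requireNamespace('$1', quietly = TRUE)) quit(status = 1)" >/dev/null 2>&1; then
        echo "OK $1"
    else
        echo "MISSING $1"
        return 1
    fi
}

status=0
for name in imma icoads.utils; do
    check_r_pkg "$name" || status=1
done
[ "$status" -eq 0 ] || echo "At least one package is missing; repeat its R CMD INSTALL step."
```

The same helper can be pointed at any of the conda-installed dependencies, for example check_r_pkg lubridate.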
Install the repository itself
For this to work, you need git installed on your system. Then clone the latest version of the repository:
git clone git@git.noc.ac.uk:brecinosrivas/icoads-r-hostace.git
If you are inside JASMIN, you might have to configure your GitLab SSH keys (under GitLab Profile > Settings > SSH Keys) and add a JASMIN public key to your profile (generated from within one of JASMIN's sci servers). This enables access from the JASMIN sci servers to the NOC GitLab platform, so that the repository can be cloned within one of them.
For more information, see the GitLab documentation on SSH keys.
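The key step can be sketched as follows. This is only an illustration under assumptions: the helper name show_or_suggest_key and the default key path are hypothetical, and the ed25519 key type should be checked against what your GitLab instance accepts. The script never creates a key itself; it either prints an existing public key or the ssh-keygen command you could run on a JASMIN sci server.

```shell
#!/usr/bin/env bash
# Sketch: print an existing SSH public key so it can be pasted into GitLab,
# or print the command that would create one.
# The key path below is a hypothetical default; use whatever name you prefer.
KEY_FILE="${KEY_FILE:-$HOME/.ssh/id_ed25519}"

# show_or_suggest_key PATH: print PATH.pub if it exists, otherwise print
# a suggested ssh-keygen command (the command is not executed).
show_or_suggest_key() {
    if [ -f "$1.pub" ]; then
        cat "$1.pub"
    else
        echo "No public key at $1.pub. To create one, run:"
        echo "ssh-keygen -t ed25519 -f $1"
    fi
}

show_or_suggest_key "$KEY_FILE"
```

The printed public key is what goes into the SSH Keys page of your GitLab profile. Once it is accepted, GitLab instances normally answer ssh -T git@git.noc.ac.uk with a welcome message.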
Now you can move into the repository with:
cd icoads-r-hostace
The output of ls inside the repository should look like this:
~/icoads-r-hostace$ ls
config.yml README.md rscripts rutils scr
Modify config.yml according to where you want your input/output data to reside. For example, I have added a new folder called output_data.
So my local copy of the repository looks like this:
~/icoads-r-hostace$ ls
config.yml output_data README.md rutils OPFILES rscripts scr
OPFILES is my logging directory.
Each script in rscripts (e.g. simple_dup.R) takes its input data from output_data and writes its output back to the same folder:
~/icoads-r-hostace/output_data$ ls
CROSS_COAST MFILES_MOORED MFILES_SHIP NEW_PAIRFILES
CROSS_DRIFT MFILES_NOTSHIP MFILES_SHIP_FINAL NEW_TRACK_INPUT
CROSS_MOORED MFILES_PLAT MFILES_SHIP_IDPROC SHIP_CLEAN
MFILES_COAST MFILES_REJECT MFILES_SHIP_PROC
MFILES_DRIFT MFILES_RESEARCH NEW_DUP_FILES
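If you need to reproduce this layout by hand, the listing above can be turned into a small script. This is a sketch under an assumption: it supposes the sub-directories are not created by the R scripts themselves, which should be checked before relying on it. The helper make_output_tree and the OUTPUT_DIR variable are hypothetical names, and the base folder should match the location set in config.yml.

```shell
#!/usr/bin/env bash
# Sketch: create the output_data layout shown above.
# Assumption: the sub-directories are not created by the R scripts themselves.
SUBDIRS="CROSS_COAST CROSS_DRIFT CROSS_MOORED
MFILES_COAST MFILES_DRIFT MFILES_MOORED MFILES_NOTSHIP MFILES_PLAT
MFILES_REJECT MFILES_RESEARCH MFILES_SHIP MFILES_SHIP_FINAL
MFILES_SHIP_IDPROC MFILES_SHIP_PROC
NEW_DUP_FILES NEW_PAIRFILES NEW_TRACK_INPUT SHIP_CLEAN"

# make_output_tree BASE: create BASE and every sub-directory listed in SUBDIRS.
make_output_tree() {
    for dir in $SUBDIRS; do
        mkdir -p "$1/$dir" || return 1
    done
}

# Defaults to ./output_data; override with OUTPUT_DIR=/some/other/path.
make_output_tree "${OUTPUT_DIR:-output_data}"
```

Run it from the repository root. Because mkdir -p is used, running it again leaves existing folders and their contents untouched.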