Configuration file
After installing the environment on JASMIN, make sure to specify the correct paths to the code repository and to the input and output data by making your own changes to the configuration file.
Note: Check with Liz for the latest path to the drifting-buoy data, since we use her ICOADS data output.
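As a rough illustration, config.ini might look something like the sketch below. The variables time_series and combined_data are referenced later in this guide; the section name, the remaining keys and all of the paths are placeholders, so check your own copy of config.ini for the real names.
[paths]
# Placeholder paths: replace them with your own locations on JASMIN
repository = /home/users/<user>/orchestra-sst
input_data = /home/users/<user>/orchestra-sst/input_data
# Output path used by the time-series differences script (see below)
time_series = /home/users/<user>/orchestra-sst/output/time_series
# Output path used by combined_data_to_netcdf.py (see below)
combined_data = /home/users/<user>/orchestra-sst/output/combined_data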
Gridding process
The re-gridding of the SST data and the interpolation of the data to the buoy coordinates are done by a single script under:
~/scripts/coarse_cci_sst.py
The corresponding SLURM script to submit this job on JASMIN can be found under ~/orchestra-sst/src/run_gridding.slurm.
Several things need to be set up before running that script:
- Make the SLURM log output and input data directories:
cd ~/orchestra-sst/
mkdir slurm_log_output
mkdir input_data
You should have the following directory configuration under ~/orchestra-sst/
input_data
scripts
slurm_log_output
config.ini
src
sst_tools
- Modify the paths and the job array setup in the file ~/orchestra-sst/src/run_gridding.slurm:
#!/bin/bash
#SBATCH --partition=short-serial-4hr
#SBATCH --array=3001-6574
#SBATCH --job-name=sst_buoy_regrid
#SBATCH --output=../slurm_log_output/sst_buoy_regrid_%A_%a.out
#SBATCH --error=../slurm_log_output/sst_buoy_regrid_%A_%a.err
#SBATCH --mem=4000
#SBATCH --time=01:00:00

# Activate the conda environment created during installation
source activate ~/miniconda3/envs/sst

# Reference date of the time series; each array task processes one day,
# offset from (start date minus a day) by $SLURM_ARRAY_TASK_ID days
start_year=1993
start_month=1
start_day=1

echo "Analysing data for $SLURM_ARRAY_TASK_ID days"
echo "since $start_year $start_month $start_day minus a day"
python ~/orchestra-sst/scripts/coarse_cci_sst.py $start_year $start_month $start_day $SLURM_ARRAY_TASK_ID
echo "Done slurm task ID = $SLURM_ARRAY_TASK_ID"
Make sure you modify the #SBATCH --array line. At the moment, to process the entire time series you need to run the job in two parts so that it fits in the short-serial-4hr queue: first with #SBATCH --array=1-3000 (days 1 to 3000), and then with #SBATCH --array=3001-6574 (as in the listing above).
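To make the day-offset convention concrete, here is a minimal Python sketch of how an array task ID maps to a calendar date, assuming the "start date minus a day" convention echoed by the script (the actual argument handling inside coarse_cci_sst.py is not shown here):
from datetime import date, timedelta

# Task ID n corresponds to (start date - 1 day) + n days,
# so task 1 processes the start date itself (1993-01-01) and,
# if this reading is right, task 6574 lands on 2010-12-31.
start = date(1993, 1, 1)
task_id = 3001  # stands in for $SLURM_ARRAY_TASK_ID
processed_day = start - timedelta(days=1) + timedelta(days=task_id)
print(processed_day)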
- After configuring the SLURM script you can run it with:
cd ~/orchestra-sst/src/
sbatch run_gridding.slurm
For more information on the processing workflow of this script, read the following Jupyter notebook.
After the re-gridding, the data can be downloaded to your local PC, or you can run the rest of the scripts on a JASMIN sci server. There is no need for a cluster; the remaining scripts can be run locally. However, you always need to activate the same environment.
Computing time series differences
We now want to compare the two data sets produced in the previous section and calculate the following:
- Moving averages over time from each data set (monthly, yearly or quarterly averages)
- The cell-by-cell differences between the time-averaged SST values of both data sets
Both tasks happen in the script ~/scripts/sst_differences.py
To run it you should follow these steps:
- Make sure you have activated your environment and changed the paths to the re-gridded data output in the configuration file (see the Configuration file section above), and assign an output path for this script under the variable time_series:
vim ~/orchestra-sst/config.ini
- Modify in sst_diferences.py the type of data to process and the time average that you want (lines 25-27):
# Input specified resolution in space and time
# Here we will process 1x1 degree resolution data
res = 1
# We will calculate a time series based on a monthly average.
alias = '1M'
res: the resolution of the data set to be analysed (e.g. 1 or 2 degrees)
alias: the time offset alias used by xarray's resample function to compute the average over time. The format of the alias has to be the same as the ones used by xarray and pandas; more information about offsets can be found in the pandas documentation. For this project we only use monthly averages (1M), yearly averages (1Y) and quarterly averages starting in January (QS-JAN). A short resample sketch follows these steps.
- The script can now be run by:
(sst_env)$ python sst_diferences.py
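As a rough sketch of what the two computations involve (the file and variable names below are illustrative and not taken from the actual script), both steps can be expressed with xarray's resample:
import xarray as xr

# Illustrative only: open the two re-gridded products (placeholder file names)
coarse = xr.open_dataset("cci_sst_coarse.nc")["sst"]
buoys = xr.open_dataset("cci_sst_at_buoys.nc")["sst"]

# Moving averages over time, using the same alias conventions as the script
# ('1M', '1Y' or 'QS-JAN')
coarse_avg = coarse.resample(time="1M").mean()
buoys_avg = buoys.resample(time="1M").mean()

# Cell-by-cell differences between the time-averaged fields
diff = coarse_avg - buoys_avg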
The output of this script is a .csv file with the SST differences over the Southern Ocean.
A detailed explanation of the processing can be found in this Jupyter notebook and in the workflow Python module.
Saving output as netCDF
We don't just want a time series average, but sometimes also a gridded product of the time series, so that we can later explore the differences between the two data sets. A gridded time series data set is produced by the script ~/scripts/combined_data_to_netcdf.py.
As with the script above, you need to specify an output path in the config.ini file under the variable combined_data, and you also need to specify in combined_data_to_netcdf.py the alias and resolution to process.
The output is a netCDF file with the time series averages and bin statistics of the two SST data sets (the coarse CCI SST, and the CCI SST interpolated to the buoy coordinates and averaged onto the same coarse grid).
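For readers unfamiliar with writing such files, below is a minimal sketch of saving a gridded monthly product with xarray; the dimension sizes, variable name and file name are made up for illustration and do not come from the project scripts:
import numpy as np
import pandas as pd
import xarray as xr

# Build a toy 1x1 degree monthly SST field (random values stand in for real data)
time = pd.date_range("1993-01-01", periods=12, freq="MS")
lat = np.arange(-89.5, 90.0, 1.0)
lon = np.arange(-179.5, 180.0, 1.0)
sst = xr.DataArray(
    np.random.rand(len(time), len(lat), len(lon)),
    coords={"time": time, "lat": lat, "lon": lon},
    dims=("time", "lat", "lon"),
    name="sst",
)

# Write the gridded time series to a netCDF file
sst.to_dataset().to_netcdf("combined_sst_example.nc")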
The analysis of the time series and the main conclusions are explained in the following Jupyter notebook.