Data input
The following data is required by the scripts of this repository:
- ICOADS v3.0 (Freeman et al., 2017).
- Metadata from WMO Publication 47 (Kent et al., 2007).
- CLIWOC logbook IDs. (needs a link)
- Inventory of ship names in the US Maury Collection
- generate_id (needs description)
- Precision criteria file. An estimate of the precision of each key variable (e.g. sst, lat, lon) per dck, yr and/or sid. These precision criteria are required in order to set tolerances when comparing variables from ICOADS (see the list of ICOADS variables used in this repository). Comparing variables allows reports to be matched in the duplicate identification procedure.
- JSON files containing ITU callsign prefixes associated with a country.
- seq IDS. (needs description)
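
To illustrate how a precision estimate can turn into a comparison tolerance, here is a minimal Python sketch. It is not the repository's R implementation. The `PRECISION` table, its placeholder values, the report field names and the rule of using the coarser of the two precisions are all assumptions made for this example.

```python
# Hypothetical precision table keyed by (dck, yr). The values are placeholders,
# not taken from the repository's precision criteria file.
PRECISION = {
    (1, 1900): {"sst": 1.0, "lat": 1.0, "lon": 1.0},
    (2, 1900): {"sst": 0.1, "lat": 0.1, "lon": 0.1},
}


def tolerance(report_a, report_b, variable):
    """Assumed rule: use the coarser (larger) precision of the two reports."""
    prec_a = PRECISION[(report_a["dck"], report_a["yr"])][variable]
    prec_b = PRECISION[(report_b["dck"], report_b["yr"])][variable]
    return max(prec_a, prec_b)


def values_match(report_a, report_b, variables=("sst", "lat", "lon")):
    """Return True if every variable present in both reports agrees within tolerance."""
    for var in variables:
        a, b = report_a.get(var), report_b.get(var)
        if a is None or b is None:
            continue  # a missing value cannot contradict a match
        if abs(a - b) > tolerance(report_a, report_b, var):
            return False
    return True


coarse = {"dck": 1, "yr": 1900, "sst": 15.0, "lat": 50.0, "lon": -10.0}
fine_a = {"dck": 2, "yr": 1900, "sst": 15.4, "lat": 50.2, "lon": -10.3}
fine_b = {"dck": 2, "yr": 1900, "sst": 15.0, "lat": 50.2, "lon": -10.3}

print(values_match(coarse, fine_a))  # compared at the coarse precision
print(values_match(fine_a, fine_b))  # compared at the fine precision
```

In this sketch the first pair matches because the coarser deck sets a tolerance of 1.0, while the second pair does not because both reports carry a precision of 0.1 and their sst values differ by 0.4. Longitude wrap-around at ±180° is not handled here.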
Processing stages
The diagram below is a summary of the data processing workflow followed by the shell scripts defined in scr. Each block represents a main task done by one script in rscripts. The corresponding .R file name has been added in grey between each block. For more information on each .R script, please look into the API reference page.
- Orange blocks represent pre-processing tasks done to the ICOADS data, in order to:
  - Select data taken only by commercial ships, excluding specialist ship data sources such as research vessels (for more information see the selection criteria).
  - Pre-process IDs to improve duplicate identification and the linking of IDs between each pair of duplicate reports.
  - Perform quality control on the data to point out the best duplicate.
- The rest of the blocks represent processing scripts that concentrate on duplicate identification and the matching of report IDs.
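
As a rough picture of what linking IDs between pairs of duplicate reports can look like, the Python sketch below groups IDs into classes with a union-find structure. This is an assumed technique chosen for illustration, not a description of the repository's R scripts, and the function name `link_ids` and the example IDs are hypothetical.

```python
def link_ids(pairs):
    """Group IDs into classes: IDs joined by any chain of duplicate pairs share a class."""
    parent = {}

    def find(x):
        # Register unseen IDs as their own root, then walk up to the root.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two classes

    classes = {}
    for x in list(parent):
        classes.setdefault(find(x), set()).add(x)
    return sorted(sorted(members) for members in classes.values())


# Hypothetical ID pairs taken from reports that were matched as duplicates.
pairs = [("SHIP1", "SHIP 1"), ("SHIP 1", "SHP1"), ("ABCD", "ABCD9")]
print(link_ids(pairs))
```

Here the three spellings of the first ship end up in one class even though "SHIP1" and "SHP1" were never paired directly, which is the point of linking through chains of pairs.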
graph TB
A1[rscripts]
id1[(ICOADS v3.0)] --> |split_by_type.R|id2[Separate records according <br> to the different platform types.]
id2 --> |simple_dup.R|id3[First duplicate identification between <br> ship data and the different platform types. <br> Considers the records as duplicates if they <br> show matching date, time & position.]
id3 --> |ship2plat.R|id4[Exclude non-ship data.]
id4 --> id5[(ICOADS SHIP data)]
id5 --> |process_ids.R|id6[Homogenize and re-format <br> ship IDs from different decks. <br> Links metadata from Pub 47 & logbooks <br> to form a plausible ship track.]
id6 --> |process_shipdata.R|id7[Process ship data: <br> correction of dates & times.]
id7 --> |new_get_pairs.R|id8[Second duplicate identification. <br>
Pairs the ship reports as duplicate if <br>
the contents match within tolerance <br>
and the ID match is appropriate. <br>
Reports from DCK that are of lower quality,<br>
or that are less complete, or that fail<br>
the track check are flagged as the worst.]
id8 --> |new_get_dups.R|id9[Counts the number of duplicates and flags the best.]
id9 --> |new_merge_ids_year.R|id10[Links ID's into classes.]
id10 --> |clean_data.R|id11[Cleaning of ship data.]
id11 --> |clean2track.R|id12[Forms ship tracks for linked IDs.]
id12 --> id13[(Output data)]
classDef pre-processing fill:#fcc679,stroke:#333,stroke-width:1px,font-size:16px,font-weight:100,text-align:center
classDef scripts fill:#8C929D,stroke:#333,stroke-width:1px,font-size:16px,font-weight:100,text-align:center
classDef rest fill:#e8eaf6,stroke:#333,stroke-width:1px,font-size:16px,font-weight:100,text-align:center
class id2,id3,id4 pre-processing;
class A1,id1,id5,id13 scripts;
class id6,id7,id8,id9,id10,id11,id12 rest
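
The first duplicate identification step in the diagram treats records as duplicates when date, time and position match. A minimal Python sketch of that idea follows. It is an illustration only, not the code in simple_dup.R, and the lowercase field names (`yr`, `mo`, `dy`, `hr`, `lat`, `lon`) and the function name are assumptions.

```python
from collections import defaultdict


def exact_match_groups(reports):
    """Return lists of report indices that share the same date, time and position."""
    groups = defaultdict(list)
    for index, r in enumerate(reports):
        # Assumed key: exact equality on date, hour and position.
        key = (r["yr"], r["mo"], r["dy"], r["hr"], r["lat"], r["lon"])
        groups[key].append(index)
    # Only keys seen more than once indicate candidate duplicates.
    return [indices for indices in groups.values() if len(indices) > 1]


# Hypothetical reports; the first and third share date, time and position.
reports = [
    {"yr": 1950, "mo": 1, "dy": 2, "hr": 6.0, "lat": 45.0, "lon": -30.0},
    {"yr": 1950, "mo": 1, "dy": 2, "hr": 12.0, "lat": 45.5, "lon": -29.0},
    {"yr": 1950, "mo": 1, "dy": 2, "hr": 6.0, "lat": 45.0, "lon": -30.0},
]
print(exact_match_groups(reports))
```

Exact equality is enough for this first pass. The second duplicate identification step in the diagram instead compares contents within tolerance, which needs a pairwise comparison rather than a simple grouping key.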