Data input
- ICOADS v3.0, Freeman et al. (2017)
- Metadata from WMO Publication 47, Kent et al. (2007)
- CLIWOC logbook IDs. (couldn't find the link)
- Inventory of ship names in the US Maury Collection
- generate_id (by Dave... the source is not clear)
- Precision criteria: an estimate of the precision of each key variable (e.g. sst, lat, lon) per deck (DCK), year and/or source ID (SID). These precision criteria are required to set the tolerances used when deciding whether two reports match in the duplicate identification procedure.
- json files.
- seq IDS.
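
To illustrate how precision criteria can translate into matching tolerances, here is a minimal Python sketch. It is not the R code used in this repository: the function name `within_tolerance`, the report layout (plain dictionaries) and the tolerance values are all assumptions made for illustration. In the real workflow the tolerances would come from the precision estimated per DCK, year and/or SID.

```python
# Hypothetical per-variable tolerances (assumed values, for illustration only).
# In practice these would be looked up from the precision criteria for the
# DCK, year and/or SID of the reports being compared.
TOLERANCES = {"sst": 0.5, "lat": 0.1, "lon": 0.1}


def within_tolerance(report_a, report_b, tolerances):
    """Return True if every variable present in both reports agrees within its tolerance."""
    for name, tol in tolerances.items():
        a, b = report_a.get(name), report_b.get(name)
        if a is None or b is None:
            continue  # in this sketch a missing value does not rule out a match
        if abs(a - b) > tol:
            return False
    return True


report_1 = {"sst": 15.2, "lat": 50.0, "lon": -10.0}
report_2 = {"sst": 15.5, "lat": 50.05, "lon": -10.0}
report_3 = {"sst": 17.0, "lat": 50.0, "lon": -10.0}

print(within_tolerance(report_1, report_2, TOLERANCES))
print(within_tolerance(report_1, report_3, TOLERANCES))
```

The sketch compares longitudes directly and does not handle wrap-around at the date line. A coarser precision for a given deck would simply mean a larger tolerance in the lookup.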
Processing stages
The following diagram summarizes the data processing workflow followed by
the shell scripts defined in scr. Each block represents a main task done by
one script in rscripts. The corresponding .R file name is highlighted in grey
between each stage.
Green blocks represent pre-processing tasks applied to the ICOADS database, in order to:
- Select only data taken by commercial ships, excluding specialist ship data sources such as research vessels (for more information, see the selection criteria).
- Pre-process IDs to improve duplicate identification and the linking of IDs between each pair of duplicate reports.
- Perform quality control on the data to identify the best duplicate.
graph TB
A1[rscripts]
id1[(ICOADS v3.0)] --> |split_by_type.R|id2[Separate records according <br> to the different platform types.]
id2 --> |simple_dup.R|id3[First duplicate identification between <br> ship data and the different platform types. <br> Considers the records as duplicates if they <br> show matching date, time & position.]
id3 --> |ship2plat.R|id4[Exclude non-ship data.]
id4 --> id5[(ICOADS SHIP data)]
id5 --> |process_ids.R|id6[Homogenize and re-format <br> ship IDs from different decks. <br> Links metadata from Pub 47 & logbooks <br> to form a plausible ship track.]
id6 --> |process_shipdata.R|id7[Process ship data: <br> correction of dates & times.]
id7 --> |new_get_pairs.R|id8[Second duplicate identification. <br> Pairs the reports as duplicate if <br> they have associated ship IDs. <br> Reports that fail the track check <br> are flagged as the worst.]
id8 --> |new_get_dups.R|id9[Count the number of duplicates and flag the best.]
id9 --> |new_merge_ids_year.R|id10[Links IDs into classes.]
id10 --> |clean_data.R|id11[Cleans ship data.]
id11 --> |clean2track.R|id12[Forms ship tracks for linked IDs.]
id12 --> id13[(Output data)]
classDef pre-processing fill:#fcc679,stroke:#333,stroke-width:1px,font-size:18px,font-weight:500
classDef scripts fill:#8C929D,stroke:#333,stroke-width:1px,font-size:18px,font-weight:500
classDef rest fill:#e8eaf6,stroke:#333,stroke-width:1px,font-size:18px,font-weight:500
class id2,id3,id4 pre-processing;
class A1,id1,id5,id13 scripts;
class id6,id7,id8,id9,id10,id11,id12 rest
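
Two of the stages in the diagram can be illustrated with a short Python sketch: the first duplicate identification (records with matching date, time and position) and the linking of IDs into classes. This is not the code in simple_dup.R or new_merge_ids_year.R. The function names, the record fields (`date`, `hour`, `lat`, `lon`) and the use of a union-find structure for linking are assumptions chosen for illustration.

```python
from collections import defaultdict


def exact_match_groups(records):
    """Group record indices that share the same date, time and position."""
    groups = defaultdict(list)
    for index, rec in enumerate(records):
        # Hypothetical field names; exact equality is used for simplicity.
        key = (rec["date"], rec["hour"], rec["lat"], rec["lon"])
        groups[key].append(index)
    # Only keys shared by two or more records are duplicate candidates.
    return [indices for indices in groups.values() if len(indices) > 1]


def link_ids(pairs):
    """Merge IDs that appear together in duplicate pairs into classes (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    classes = defaultdict(set)
    for x in list(parent):
        classes[find(x)].add(x)
    return sorted(sorted(c) for c in classes.values())


records = [
    {"date": "1950-01-01", "hour": 12, "lat": 50.0, "lon": -10.0},
    {"date": "1950-01-01", "hour": 12, "lat": 50.0, "lon": -10.0},
    {"date": "1950-01-02", "hour": 6, "lat": 51.0, "lon": -11.0},
]
print(exact_match_groups(records))
print(link_ids([("A", "B"), ("B", "C"), ("D", "E")]))
```

Exact equality on position is the simplest reading of "matching date, time & position". A tolerance-based comparison built on the precision criteria would be the natural refinement for the second duplicate identification.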