Workflow


Data input

ICOADS v3.0 (Freeman et al., 2017).
Metadata from WMO Publication 47 (Kent et al., 2007).
CLIWOC logbook IDs (link not yet found).
Inventory of ship names in the US Maury Collection.
generate_id (by Dave; original source unclear).
Precision criteria: an estimate of the precision of each key variable (e.g. sst, lat, lon) per deck (DCK), year and/or source ID (SID). These criteria set the tolerances used to decide whether two reports match during duplicate identification.
JSON files.
seq IDs.
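The precision criteria act as per-variable tolerances when two reports are compared. The sketch below illustrates one way to apply them. It assumes a hypothetical precision table keyed by (deck, year) with invented values, and it assumes the tolerance for a pair is the coarser of the two reports' precisions. It is not the pipeline's actual implementation.

```python
from dataclasses import dataclass

# Hypothetical precision table keyed by (deck, year). The real criteria are
# estimated per DCK, year and/or SID; these values are invented for illustration.
PRECISION = {
    ("deck_a", 1890): {"sst": 0.5, "lat": 0.1, "lon": 0.1},
    ("deck_b", 1890): {"sst": 1.0, "lat": 1.0, "lon": 1.0},
}


@dataclass
class Report:
    deck: str
    year: int
    sst: float
    lat: float
    lon: float


def tolerance(a: Report, b: Report, var: str) -> float:
    """Use the coarser (larger) of the two precisions as the match tolerance."""
    return max(PRECISION[(a.deck, a.year)][var], PRECISION[(b.deck, b.year)][var])


def reports_match(a: Report, b: Report, variables=("sst", "lat", "lon")) -> bool:
    # Every variable must agree within its tolerance.
    # Longitude wrap-around at +/-180 degrees is ignored in this sketch.
    return all(abs(getattr(a, v) - getattr(b, v)) <= tolerance(a, b, v) for v in variables)


fine = Report("deck_a", 1890, sst=15.3, lat=10.0, lon=-20.0)
coarse = Report("deck_b", 1890, sst=15.0, lat=10.5, lon=-20.8)
print(reports_match(fine, coarse))  # True: every difference is within 1.0
```

Taking the larger precision means a report logged to whole degrees can still match a more finely reported copy of the same observation. A stricter rule, such as the smaller precision, would miss those pairs.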

Processing stages

The following diagram summarises the data-processing workflow followed by the shell scripts defined in scr. Each block represents a main task carried out by one script in rscripts. The corresponding .R file name is highlighted in grey between stages.

Orange blocks represent pre-processing tasks applied to the ICOADS database in order to:

Select data taken only by commercial ships, excluding specialist ship data sources such as research vessels (for more information, see the selection criteria).
Pre-process IDs to improve duplicate identification and the linking of IDs between each pair of duplicate reports.
Perform quality control on the data to identify the best duplicate.
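The orange stages in the diagram (split_by_type.R, simple_dup.R, ship2plat.R) split records by platform type, match ship records against other platforms on date, time and position, and then keep only ship data. The sketch below mirrors that sequence. The field names, the string platform label "ship" (ICOADS itself stores platform type as a code), and the choice to drop ship records that match another platform are all assumptions.

```python
from collections import defaultdict

SHIP = "ship"  # hypothetical label; ICOADS itself stores platform type as a code


def split_by_type(records):
    """Group records by platform type (assumed 'platform' field)."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec["platform"]].append(rec)
    return dict(groups)


def match_key(rec):
    # Records are duplicates when date, time and position all agree exactly.
    return (rec["date"], rec["hour"], rec["lat"], rec["lon"])


def simple_dup(groups):
    """Return the uids of ship records that duplicate a non-ship record."""
    other = {match_key(r) for ptype, recs in groups.items() if ptype != SHIP for r in recs}
    return {r["uid"] for r in groups.get(SHIP, []) if match_key(r) in other}


def ship_only(groups, dup_uids):
    """Keep ship records only, dropping those matched to another platform (assumption)."""
    return [r for r in groups.get(SHIP, []) if r["uid"] not in dup_uids]


records = [
    {"uid": "a1", "platform": "ship", "date": "1975-01-02", "hour": 12, "lat": 10.5, "lon": -30.0},
    {"uid": "a2", "platform": "ship", "date": "1975-01-02", "hour": 12, "lat": 11.0, "lon": -30.0},
    {"uid": "b1", "platform": "buoy", "date": "1975-01-02", "hour": 12, "lat": 10.5, "lon": -30.0},
]
groups = split_by_type(records)
dups = simple_dup(groups)
print(sorted(dups))                                 # ['a1']
print([r["uid"] for r in ship_only(groups, dups)])  # ['a2']
```

Building a set of keys from the non-ship records makes each ship lookup constant time, so the check scales linearly with the number of records rather than comparing every pair.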

graph TB
A1[rscripts]

 id1[(ICOADS v3.0)] --> |split_by_type.R|id2[Separate records according <br> to the different platform types.]
 id2 --> |simple_dup.R|id3[First duplicate identification between <br> ship data and the different platform types. <br> Considers the records as duplicates if they <br> show matching date, time & position.] 
 id3 --> |ship2plat.R|id4[Exclude non-ship data.] 
 id4 --> id5[(ICOADS SHIP data)]
 id5 --> |process_ids.R|id6[Homogenize and re-format <br> ship IDs from different decks. <br> Links metadata from Pub 47 & logbooks <br> to form a plausible ship track.]
 id6 -->  |process_shipdata.R|id7[Process ship data: <br> correction of dates & times.] 
 id7 --> |new_get_pairs.R|id8[Second duplicate identification. <br> Pairs reports as duplicates if <br> they have associated ship IDs. <br> Reports that fail the track check <br> are flagged as the worst.] 
 id8 --> |new_get_dups.R|id9[Count the number of duplicates and flag the best.]
 id9 --> |new_merge_ids_year.R|id10[Link IDs into classes.]
 id10 --> |clean_data.R|id11[Clean ship data.] 
 id11 --> |clean2track.R|id12[Forms ship tracks for linked IDs.] 
 id12 --> id13[(Output data)]

classDef pre-processing fill:#fcc679,stroke:#333,stroke-width:1px,font-size:16px,font-weight:100,text-align:center
classDef scripts fill:#8C929D,stroke:#333,stroke-width:1px,font-size:16px,font-weight:100,text-align:center
classDef rest fill:#e8eaf6,stroke:#333,stroke-width:1px,font-size:16px,font-weight:100,text-align:center
class id2,id3,id4 pre-processing;
class A1,id1,id5,id13 scripts;
class id6,id7,id8,id9,id10,id11,id12 rest;
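The later stages (new_get_pairs.R, new_get_dups.R, new_merge_ids_year.R) turn pairwise duplicate matches into groups, count duplicates and flag the best report. The sketch below shows one way to do this with a union-find (disjoint-set) structure, a technique chosen here for illustration. The source does not say how the R scripts implement it. The quality scores are hypothetical. In practice such a score might, for example, penalise reports that failed the track check. The same grouping idea could plausibly link IDs into classes.

```python
def find(parent, x):
    """Return the root of x, halving the path as we go."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def group_duplicates(uids, pairs):
    """Merge duplicate pairs into groups with a union-find structure."""
    parent = {u: u for u in uids}
    for a, b in pairs:
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:
            parent[rb] = ra
    groups = {}
    for u in uids:
        groups.setdefault(find(parent, u), []).append(u)
    return list(groups.values())


def flag_best(groups, score):
    """Count the other duplicates of each report and flag the highest-scoring one per group."""
    flags = {}
    for members in groups:
        best = max(members, key=lambda u: score[u])
        for u in members:
            flags[u] = {"dup_count": len(members) - 1, "best": u == best}
    return flags


uids = ["r1", "r2", "r3", "r4"]
pairs = [("r1", "r2"), ("r2", "r3")]  # r1-r3 is linked transitively through r2
score = {"r1": 0.2, "r2": 0.9, "r3": 0.5, "r4": 1.0}  # hypothetical quality scores
groups = group_duplicates(uids, pairs)
flags = flag_best(groups, score)
print(groups)       # [['r1', 'r2', 'r3'], ['r4']]
print(flags["r2"])  # {'dup_count': 2, 'best': True}
```

Union-find handles transitive matches: if r1 duplicates r2 and r2 duplicates r3, all three end up in one group even though r1 and r3 were never paired directly.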