The following data is required to run the scripts in this repository:
|
|
- [US Maury Collection](https://icoads.noaa.gov/software/transpec/maury/mauri_out)
|
|
|
- **generate_id** (**needs description**)
|
|
|
- **Precision criteria file**. An estimate of the precision of each key variable (e.g. `sst`, `lat`, `lon`) per `dck`,
`yr` and/or `sid`. These precision criteria are required in order to set tolerances
when comparing variables from ICOADS (see the [list of ICOADS variables](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/wikis/API-Reference#icoads-variables-used) used in this repository). Comparing variables within these tolerances allows
reports to be matched in the [duplicate identification](Workflow/duplicate-identification) procedure (a minimal tolerance check is sketched after this list).
|
|
|
- **JSON files** containing ITU callsign prefixes associated with each country.
|
|
|
- **seq IDS.** (**needs description**)
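
The tolerance comparison driven by the precision criteria can be illustrated with a minimal R sketch. The column layout (`dck`, `var`, `tolerance`) and the helper `within_tolerance()` below are assumptions made for this example only, not the repository's actual file format or code:

```r
# Minimal sketch: compare two reports variable by variable against
# per-deck tolerances taken from a (hypothetical) precision criteria table.
precision <- read.csv(text = "
dck,var,tolerance
701,sst,0.5
701,lat,0.05
701,lon,0.05
")

# TRUE when every requested variable of the two reports agrees within
# the tolerance defined for that deck.
within_tolerance <- function(report_a, report_b, dck, vars, precision) {
  all(vapply(vars, function(v) {
    tol <- precision$tolerance[precision$dck == dck & precision$var == v]
    length(tol) == 1 && abs(report_a[[v]] - report_b[[v]]) <= tol
  }, logical(1)))
}

# Two candidate duplicate reports from deck 701:
a <- list(sst = 18.2, lat = -10.50, lon = 120.00)
b <- list(sst = 18.4, lat = -10.52, lon = 120.01)
within_tolerance(a, b, dck = 701, vars = c("sst", "lat", "lon"), precision)
#> [1] TRUE
```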
|
|
|
**Processing stages**
--------------------
|
|
|
|
|
|
The diagram below is a summary of the data processing workflow followed by
the shell scripts defined in `scr`. Each block
represents a main task done by one script in `rscripts`.
The corresponding `.R` file name has been added in grey between each block. For more information on each `.R` script, please see the [API reference page](api-reference).
|
|
|
|
in order to:

1. Select the relevant ship data, excluding other platform types and
specialist ship data sources, such as research vessels
(for more information, see the [selection criteria](Workflow/data-selection)).
|
|
|
2. [Preprocess the IDs](Workflow/processing-of-ids) to improve [duplicate identification](Workflow/duplicate-identification) and the linking of `id`s between each pair of duplicate reports (an illustrative ID clean-up sketch follows this list).
|
|
|
3. Perform [quality control](Workflow/quality-control) on the data to identify the best duplicate.
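
As an illustration of the kind of ID clean-up involved in step 2, the sketch below normalises raw identifiers before they are linked. The helper `normalise_id()` and its rules are assumptions made for this example, not the logic implemented in `process_ids.R`:

```r
# Illustrative only: harmonise raw ship identifiers so that the same vessel
# reported with slightly different IDs can be linked across decks.
normalise_id <- function(id) {
  id <- toupper(trimws(id))                         # harmonise case and spacing
  id[id %in% c("", "SHIP", "XXXX", "9999")] <- NA   # treat generic IDs as missing
  gsub("[^A-Z0-9]", "", id)                         # drop stray punctuation
}

normalise_id(c(" wdc5678 ", "SHIP", "call-sign/1"))
#> [1] "WDC5678"   NA          "CALLSIGN1"
```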
|
|
|
|
|
|
- The rest of the blocks represent processing scripts that concentrate on duplicate identification and the [matching of report IDs](Workflow/matching-criteria), as sketched below.
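
To make the pairing step concrete, the sketch below groups reports that share the same date, hour and rounded position into candidate duplicate sets. The column names and the 0.1 degree rounding are illustrative assumptions, not the exact criteria applied by `new_get_pairs.R`:

```r
# Sketch: form candidate duplicate groups from reports sharing date, hour
# and (rounded) position; these candidates would then be checked against
# the precision criteria and the ID matching rules.
reports <- data.frame(
  uid = c("A", "B", "C"),
  yr = 1885, mo = 7, dy = 14, hr = c(12, 12, 18),
  lat = c(45.01, 45.04, 45.01), lon = c(-30.02, -30.04, -30.02)
)

key <- with(reports, paste(yr, mo, dy, hr,
                           round(lat, 1), round(lon, 1), sep = "|"))

split(reports$uid, key)
#> $`1885|7|14|12|45|-30`
#> [1] "A" "B"
#>
#> $`1885|7|14|18|45|-30`
#> [1] "C"
```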
|
|
|
|
|
|
More details on the data processing can be found in this [technical report]().
|
|
|
|
graph TB
|
|
A1[rscripts]
|
|
|
|
|
|
id1[(ICOADS v3.0)] --> |split_by_type.R|id2[Separates records according <br> to the different platform types.]
|
|
|
|
|
|
id2 --> |simple_dup.R|id3[Checks for cross-type duplicates between <br> ship data and the different platform types. <br> Considers the records as duplicates if they <br> show matching date, time & position, <br> with DCK- and ID-specific selection criteria.]
|
|
|
id3 --> |ship2plat.R|id4[Excludes non-ship data identified in the <br> cross-type duplicate analysis.]
|
|
|
id4 --> id5[(ICOADS SHIP data)]
|
|
|
|
|
|
id5 --> |process_ids.R|id6[Reformats selected ship IDs to homogenize <br> information between DCKs. Uses IDs <br> from Pub. 47 metadata in ID prioritisation.]
|
|
|
id6 --> |process_shipdata.R|id7[Processes ship data: <br> correction of dates & times.]
|
|
|
|
|
|
id7 --> |new_get_pairs.R|id8[Groups ship reports as potential duplicates <br> if the contents match within tolerance.]
|
|
|
id8 --> |new_get_dups.R|id9[Assesses the groups of potential duplicates, <br> accepting those where the ID match is appropriate. <br> Reports from DCKs that are of lower quality, <br> or that are less complete, or that fail the <br> track check are flagged as the worst.]
|
|
|
id9 --> |new_merge_ids_year.R|id10[Assesses IDs that have been associated <br> in previous processing to decide whether to replace <br> all IDs in the associated group with the preferred ID.]
|
|
|
id10 --> |clean_data.R|id11[Runs track checking on data to produce <br>clean tracks for all IDs.]
|
|
|
id11 --> |clean2track.R|id12[Selects data for the ship-tracking software of <br> Carella et al. 2017, <br> choosing only data with missing or generic IDs.]
|
|
|
id12 --> id13[(Output data)]
|
|
|
|
|
|
classDef pre-processing fill:#fcc679,stroke:#333,stroke-width:1px,font-size:16px,font-weight:100,text-align:center
|