
Workflow · Changes

added liz changes authored Jun 11, 2020 by bearecinos
Showing with 28 additions and 24 deletions
Workflow.md
View page @ 3bbd1e55
...@@ -3,36 +3,40 @@ ...@@ -3,36 +3,40 @@
-------------- --------------
The following data is required to run the scripts in this repository: The following data is required to run the scripts in this repository:
- [ICOADS v3.0](https://icoads.noaa.gov/r3.html). [Freeman. *et al.,* (2017)](https://doi.org/10.1002/joc.4775). - [ICOADS v3.0](https://icoads.noaa.gov/r3.html). [Freeman. *et al.,* (2017)](https://doi.org/10.1002/joc.4775).
- Metadata from WMO Publication 47. - Metadata from WMO Publication 47.
[Kent. *et al.,* (2007)](https://doi.org/10.1175/JTECH1949.1) [Kent. *et al.,* (2007)](https://doi.org/10.1175/JTECH1949.1)
- [CLIWOC logbook IDs](https://stvno.github.io/page/cliwoc/) - [CLIWOC logbook IDs](https://stvno.github.io/page/cliwoc/)
- Inventory of ship names in the - Inventory of ship names in the
[US Maury Collection](https://icoads.noaa.gov/software/transpec/maury/mauri_out) [US Maury Collection](https://icoads.noaa.gov/software/transpec/maury/mauri_out)
- generate_id (**needs description**) - generate_id (**needs description**)
- **Precision criteria file**. An estimate of the precision of each key variable (e.g. `sst, lat, lon`) per `dck`, - **Precision criteria file**. An estimate of the precision of each key variable (e.g. `sst, lat, lon`) per `dck`,
`yr` and or `sid`. This precision criteria is required in order to set tolerances when comparing variables from ICOADS (See the [list of ICOADS variables](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/wikis/API-Reference#icoads-variables-used) used in this repository). Comparison of variables allows for `yr` and or `sid`. This precision criteria is required in order to set tolerances
when comparing variables from ICOADS (See the [list of ICOADS variables](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/wikis/API-Reference#icoads-variables-used) used in this repository). Comparison of variables allows for
a match between reports in the [duplicate identification](Workflow/duplicate-identification) procedure. a match between reports in the [duplicate identification](Workflow/duplicate-identification) procedure.
- **Json files** containing ITU callsign prefixes associated with a country. - **Json files** containing ITU callsign prefixes associated with a country.
- **seq IDS.** (**needs description**) - **seq IDS.** (**needs description**)
**Processing stages** **Processing stages**
-------------------- --------------------
The diagram below is a summary of the data processing workflow followed by the shell scripts defined in ```scr```. Each block The diagram below is a summary of the data processing workflow followed by
represents a main task done by one script in ```rscripts```. the shell scripts defined in ```scr```. Each block
represents a main task done by one script in ```rscripts```.
The corresponding `.R` file name has been added in grey between each block. For more information on each `.R` script, please look into the [API reference page.](api-reference) The corresponding `.R` file name has been added in grey between each block. For more information on each `.R` script, please look into the [API reference page.](api-reference)
- Orange blocks represent pre-processing tasks done to the ICOADS data, - Orange blocks represent pre-processing tasks done to the ICOADS data,
in order to: in order to:
1. Select data taken only by commercial ships, excluding 1. Select data taken only by commercial ships, excluding
specialist ship data sources, such as research vessels specialist ship data sources, such as research vessels
(For more information see the [selection criteria](Workflow/data-selection)). (For more information see the [selection criteria](Workflow/data-selection)).
2. [Preprocessing of ID's](Workflow/processing-of-ids) to improve [duplicate identification](Workflow/duplicate-identification) and linking of `id`'s between each pair of duplicate reports. 2. [Preprocessing of ID's](Workflow/processing-of-ids) to improve [duplicate identification](Workflow/duplicate-identification) and linking of `id`'s between each pair of duplicate reports.
3. Perform [quality control](Workflow/quality-control) on the data to point out the best duplicate. 3. Perform [quality control](Workflow/quality-control) on the data to point
out the best duplicate.
- The rest of the blocks represent processing scripts that concentrate on the duplicates identification and [matching of reports ID's](Workflow/matching-criteria). - The rest of the blocks represent processing scripts that concentrate on the duplicates
identification and [matching of reports ID's](Workflow/matching-criteria).
More details on the data processing can be found in this [technical report](). More details on the data processing can be found in this [technical report]().
...@@ -41,16 +45,16 @@ graph TB ...@@ -41,16 +45,16 @@ graph TB
A1[rscripts] A1[rscripts]
id1[(ICOADS v3.0)] --> |split_by_type.R|id2[Separate records according <br> to the different platform types.] id1[(ICOADS v3.0)] --> |split_by_type.R|id2[Separate records according <br> to the different platform types.]
id2 --> |simple_dup.R|id3[First duplicate identification between <br> ship data and the different platform types. <br> Considers the records as duplicates if they <br> show matching date, time & position.] id2 --> |simple_dup.R|id3[Check for cross-type duplicates between <br> ship data and the different platform types. <br> Considers the records as duplicates if they <br> show matching date, time & position, <br> with DCK and ID specific selection criteria.]
id3 --> |ship2plat.R|id4[Exclude non-ship data.] id3 --> |ship2plat.R|id4[Exclude non-ship data identified in <br> cross-type duplicate analysis.]
id4 --> id5[(ICOADS SHIP data)] id4 --> id5[(ICOADS SHIP data)]
id5 --> |process_ids.R|id6[Homogenize and re-format <br> ship IDs from different decks. <br> Links metadata from Pub 47 & logbooks <br> to formed a plausible ship track.] id5 --> |process_ids.R|id6[Reformat selected ship IDs to homogenize <br>information between DCKs. Uses IDs <br>from Pub. 47 metadata in ID prioritisation.]
id6 --> |process_shipdata.R|id7[Process ship data: <br> correction of dates & times.] id6 --> |process_shipdata.R|id7[Process ship data: <br> correction of dates & times.]
id7 --> |new_get_pairs.R|id8[Second duplicate identification. <br> Pairs the ship reports as duplicate if <br> the contents match within tolerance and the ID match is appropriate. <br> Reports from DCK that are of lower quality, or that are less complete, or that fail the track check <br> are flagged as the worst.] id7 --> |new_get_pairs.R|id8[Groups ship reports as potential duplicates <br> if the contents match within tolerance]
id8 --> |new_get_dups.R|id9[Counts the number of duplicates and flags the best.] id8 --> |new_get_dups.R|id9[Assesses the groups of potential duplicates, <br> accepting those where the ID match is appropriate. <br> Reports from DCK that are of lower quality, <br>or that are less complete, or that fail the <br> track check are flagged as the worst.]
id9 --> |new_merge_ids_year.R|id10[Links ID's into classes.] id9 --> |new_merge_ids_year.R|id10[Assesses IDs that have been associated <br> in previous processing to decide whether to replace <br> all IDs in the associated group with the preferred ID.]
id10 --> |clean_data.R|id11[Clean of ship data.] id10 --> |clean_data.R|id11[Runs track checking on data to produce <br>clean tracks for all IDs.]
id11 --> |clean2track.R|id12[Forms ship tracks for linked IDs.] id11 --> |clean2track.R|id12[Selects data for ship-tracking software <br>Carella et al. 2017,<br> choosing only data with missing or generic IDs.]
id12 --> id13[(Output data)] id12 --> id13[(Output data)]
classDef pre-processing fill:#fcc679,stroke:#333,stroke-width:1px,font-size:16px,font-weight:100,text-align:center classDef pre-processing fill:#fcc679,stroke:#333,stroke-width:1px,font-size:16px,font-weight:100,text-align:center
......
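
The page above says that the precision criteria file sets the tolerances used when comparing ICOADS variables, and that `new_get_pairs.R` groups reports as potential duplicates when their contents match within tolerance. A minimal Python sketch of that idea follows. It is not the repository's R implementation. The precision values, the deck labels, the `uid` field and the rule of taking the coarser precision of the two decks are all assumptions made for illustration. Date, time and ID checks are left out.

```python
from itertools import combinations

# Hypothetical precision table keyed by deck (dck).
# Values are illustrative only and are NOT taken from the precision criteria file.
PRECISION = {
    "deck_a": {"sst": 0.1, "lat": 0.1, "lon": 0.1},
    "deck_b": {"sst": 0.5, "lat": 1.0, "lon": 1.0},
}


def tolerance(dck_a, dck_b, var):
    """Assumed rule: compare at the coarser precision of the two decks."""
    return max(PRECISION[dck_a][var], PRECISION[dck_b][var])


def is_candidate_pair(a, b, variables=("sst", "lat", "lon")):
    """True when every compared variable agrees within tolerance."""
    eps = 1e-9  # guard against floating-point round-off at the boundary
    return all(
        abs(a[v] - b[v]) <= tolerance(a["dck"], b["dck"], v) + eps
        for v in variables
    )


def candidate_pairs(reports):
    """Return the identifiers of all report pairs that match within tolerance."""
    return [
        (a["uid"], b["uid"])
        for a, b in combinations(reports, 2)
        if is_candidate_pair(a, b)
    ]


# Made-up reports: r1 and r2 agree within the coarser deck_b precision,
# while r3 has a clearly different sst.
reports = [
    {"uid": "r1", "dck": "deck_a", "sst": 15.2, "lat": 50.1, "lon": -5.3},
    {"uid": "r2", "dck": "deck_b", "sst": 15.0, "lat": 50.0, "lon": -5.0},
    {"uid": "r3", "dck": "deck_a", "sst": 18.9, "lat": 50.1, "lon": -5.3},
]

print(candidate_pairs(reports))
```

In the workflow described above, pairs found this way are only potential duplicates. The later `new_get_dups.R` step is described as accepting those where the ID match is appropriate.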