Processing of IDs · Changes

fixed typos authored May 26, 2020 by bearecinos's avatar bearecinos
Showing with 16 additions and 17 deletions
Workflow/Processing-of-IDs.md
View page @ e4bd445d
Pre-processing tasks described here are done in [`process_ids.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/process_ids.R).

**Corrections**
---------------------
The following corrections to ship names happen in [`add_shipnames.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rutils/add_shipnames.R).
- For the period 1878 to 1894, some minor changes are made to ship names from `dck` 704 to correct typos and similar problems.
- For `dck` 701 (1867-1899) and `dck` 711 (1889-1899), some ship names are corrected.
- For the period 1663 to 1860, CLIWOC logbook (**needs link**) `id`s from `dck` 730 are converted to ship names using information from the project: https://projects.knmi.nl/cliwoc/download/shiplogbookid21.htm (this link does not work).
- For the period 1663 to 1863, ship names from the **US Maury collection** (`dck` 701) are extended using information from [this link](http://icoads.noaa.gov/software/transpec/maury/mauri_out). Data from `dck` 701 with a missing `id` are also split into voyages by manual inspection.
- Ship names from the **German Maury collection** (`dck` 721) are extended where they overlap with names from the **US Maury collection** (`dck` 701). Where names are the same across `dck` 701 and `dck` 721 and it is not clear whether the ships are the same, the `dck` number is also appended (e.g. AUSTRALIA, JAMESTOWN, SWORDFISH, ANN MARIA, ASHBURTON).
- In `dck` 555 (1966-1973), North Pole and South Pole station `id`s are corrected by prepending an "N" or "S", depending on latitude.
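
The last two corrections are simple string rules and can be sketched as follows. This is an illustrative Python sketch, not the R code in `add_shipnames.R`: the function names, the `_` separator used when appending the `dck` number, and the choice to treat a latitude of exactly 0 as northern are all assumptions.

```python
# Names reported as shared by dck 701 and dck 721 (taken from the list above).
AMBIGUOUS_NAMES = frozenset(
    {"AUSTRALIA", "JAMESTOWN", "SWORDFISH", "ANN MARIA", "ASHBURTON"}
)


def prefix_polar_station_id(station_id, latitude):
    """Prepend "N" or "S" to a dck 555 polar station id, based on latitude."""
    # Assumption: latitude is in degrees, positive north.
    prefix = "N" if latitude >= 0 else "S"
    return prefix + station_id


def disambiguate_ship_name(name, dck, ambiguous_names=AMBIGUOUS_NAMES):
    """Append the dck number to a name that occurs in both Maury collections."""
    if name in ambiguous_names:
        # Assumption: the real separator used by the R code may differ.
        return "{}_{}".format(name, dck)
    return name
```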

**Reformatting**
---------------------
- Manual corrections are made to two `id`s from `dck` 187 (1946-1956) so that they conform to the expected format.
- For the period 1953 to 1961, `id`s from `dck` 184 are truncated to remove the first digit, which indicates the ocean region, and are then reformatted to match the expected format.
- Between 1962 and 1963, the `id` **"Eltanin"** is added to `dck` 897, which contains only data from that ship and has a missing `id`.
- Between 1957 and 1961, a small number of `id`s from `dck` 902 are reformatted to match the expected format. This is done by prepending the digit "3" to the truncated `id`s.
- Between 1930 and 1961, a small number of `id`s from `dck` 118 and `dck` 119 are reformatted to match the expected format. This is done by inserting a 2-digit year.
- For `dck` 720 and `sid` 135, 8-character `id`s represent a single report. These are truncated to the first 4 digits, and **"-SEQ"** is appended.
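
Two of the truncation rules above can be sketched as below. Again, this is a hedged Python illustration rather than the project's R implementation: the function names are hypothetical, the example `id` values are invented, and "-SEQ" is treated as a literal suffix, which is an assumption about what the text means.

```python
def strip_region_digit(ship_id):
    """dck 184: drop the leading digit that encodes the ocean region."""
    if ship_id[:1].isdigit():
        return ship_id[1:]
    # Leave ids without a leading digit untouched.
    return ship_id


def collapse_report_id(ship_id):
    """dck 720 / sid 135: keep the first 4 characters and append "-SEQ"."""
    if len(ship_id) == 8:
        return ship_id[:4] + "-SEQ"
    # Only 8-character ids are described as single-report ids.
    return ship_id


print(strip_region_digit("312345"))    # leading region digit removed
print(collapse_report_id("12345678"))  # truncated, with "-SEQ" appended
```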

**Homogenisation**
---------------------
The following corrections are made in [`new_homog_ids.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rutils/new_homog_ids.R).
- `id`s in `dck`s 194, 201, 202, 203 and 227 are all derived from the same 5-digit ship identifiers. Leading digits are removed where needed.
- Some ship `id`s are callsigns and need to be reformatted so that data from the same ship can be linked across `dck`s and to metadata in WMO Publication No. 47. For more information see [Kent *et al.* (2007)](https://doi.org/10.1175/JTECH1949.1) and [Freeman *et al.* (2017)](https://doi.org/10.1002/joc.4775).
- Where an `id` is identified as a callsign or as another identifier listed in WMO Publication No. 47, other `id`s containing the same character string are flagged and their leading digits are removed. The purpose is to homogenise callsigns across `dck`s.
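
The callsign rule can be read as: strip leading digits from an `id` until what remains is a known callsign, and flag the `id` if that succeeds. A minimal Python sketch under that reading follows. The function name, the example callsigns, and the exact matching rule are assumptions, and the logic in `new_homog_ids.R` may differ.

```python
def homogenise_id(ship_id, known_callsigns):
    """Return (homogenised_id, flagged) for one ship id.

    known_callsigns is a hypothetical set of callsigns or Pub. 47 identifiers.
    """
    if ship_id in known_callsigns:
        # Already in its homogenised form, nothing to flag.
        return ship_id, False
    candidate = ship_id
    # Remove one leading digit at a time so that callsigns which
    # themselves start with a digit can still be matched.
    while candidate[:1].isdigit():
        candidate = candidate[1:]
        if candidate in known_callsigns:
            return candidate, True
    return ship_id, False


callsigns = {"WXYZ", "3FAB"}  # invented examples
print(homogenise_id("12WXYZ", callsigns))
print(homogenise_id("13FAB", callsigns))
```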

**Linking**
-------------
This is not part of `process_ids.R`. Should we have a separate page for the linking done by [`new_merge_ids_year.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/new_merge_ids_year.R)? I still need some help from Liz with this.