Pre-processing tasks described here are done in [`process_ships.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/process_ships.R).

**Corrections**
---------------------

The following corrections to ship names happen in [`add_shipnames.R`](https://git.noc.ac.uk/brecinosrivas/icoads.utils/-/blob/master/R/add_shipnames.R).

- For the period 1878 to 1894, minor changes are made to ship names from `dck` 704 to correct typos and similar problems.
- For `dck` 701 (1867-1899) and `dck` 711 (1889-1899), some ship names are corrected.
- For the period 1663 to 1860, CLIWOC logbook `id`s from `dck` 730 are converted to ship names using table information from an MS Access database that is no longer available online. Some information is available from: https://www.historicalclimatology.com/cliwoc.html
- For the period 1663 to 1863, ship names from the **US Maury collection** (`dck` 701) are extended using information from [this link](http://icoads.noaa.gov/software/transpec/maury/mauri_out). Data from `dck` 701 with a missing `id` are also split into voyages by manual inspection.
- Ship names from the **German Maury collection** (`dck` 721) are extended where they overlap with names from **US Maury** (`dck` 701). Where a name appears in both `dck` 701 and `dck` 721 and it is not clear whether the ships are the same, the `dck` number is also appended (e.g. AUSTRALIA, JAMESTOWN, SWORDFISH, ANN MARIA, ASHBURTON).
- In `dck` 555 (1966-1973), North Pole and South Pole station `id`s are corrected by prepending an "N" or "S" depending on latitude.

**Reformatting**
---------------------

- Two `id`s from `dck` 187 (1946-1956) are corrected manually to conform to the expected format.
- For the period 1953 to 1961, `id`s from `dck` 184 are truncated to remove the first digit, which indicates the ocean region, and are then reformatted to match the expected format.
- Between 1962 and 1963, the `id` **"Eltanin"** is added to `dck` 897, which contains only data from that ship and has a missing `id`.
- Between 1957 and 1961, a small number of `id`s from `dck` 902 are reformatted to match the expected format by prepending the digit "3" to the truncated `id`s.
- Between 1930 and 1961, a small number of `id`s from `dck` 118 and 119 are reformatted to match the expected format by inserting a 2-digit year.
- For `dck` 720 and `sid` 135, 8-character `id`s represent a single report. These are truncated to the first 4 digits, and **"-SEQ"** is appended.

**Homogenisation**
---------------------

The following corrections are made in [`homog_ids.R`](https://git.noc.ac.uk/brecinosrivas/icoads.utils/-/blob/master/R/homog_ids.R).

- `id`s in `dck`s 194, 201, 202, 203 and 227 are all derived from the same 5-digit ship identifiers. Leading digits are removed where needed.
- Some ship `id`s are callsigns and can be used to link to metadata in WMO Publication No. 47. However, in some `dck`s the callsigns have been modified or corrupted, so the processing attempts to recover the original callsign in these cases. For more information see [Kent *et al.* (2007)](https://doi.org/10.1175/JTECH1949.1) and [Freeman *et al.* (2017)](https://doi.org/10.1002/joc.4775).
- Where an `id` is identified as a callsign or as an identifier listed in WMO Publication No. 47, other `id`s containing the same character string are flagged and their leading digits are removed. The aim is to homogenise the callsigns across `dck`s.
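
As an illustration, two of the reformatting rules described above (filling the missing `id` in `dck` 897 and shortening 8-character `id`s in `dck` 720) could be sketched in R roughly as follows. This is a minimal sketch and not the code in `add_shipnames.R` or `homog_ids.R`: the function name `reformat_ids`, the data frame layout, the year column name `yr`, and the example values are all assumptions made for illustration. It also assumes that "`dck` 720 and `sid` 135" means both conditions must hold for the same report.

```r
# Hypothetical helper, NOT taken from icoads.utils.
# Assumes a data frame with columns `yr`, `dck`, `sid` and `id`.
reformat_ids <- function(df) {
  # Work on character ids so that nchar() and substr() behave predictably.
  df$id <- as.character(df$id)

  # dck 897 (1962-1963): every report comes from one ship, so a missing
  # or blank id is replaced by "Eltanin". which() drops NA comparisons.
  eltanin <- which(
    df$dck == 897 & df$yr >= 1962 & df$yr <= 1963 &
      (is.na(df$id) | trimws(df$id) == "")
  )
  df$id[eltanin] <- "Eltanin"

  # dck 720 with sid 135 (assumed to be a joint condition): keep the first
  # 4 characters of 8-character ids and append "-SEQ".
  seq_rows <- which(
    df$dck == 720 & df$sid == 135 & !is.na(df$id) & nchar(df$id) == 8
  )
  df$id[seq_rows] <- paste0(substr(df$id[seq_rows], 1, 4), "-SEQ")

  df
}

# Made-up example reports (values are illustrative only).
reports <- data.frame(
  yr  = c(1962, 1963, 1855, 1855),
  dck = c(897, 897, 720, 720),
  sid = c(NA, NA, 135, 135),
  id  = c(NA, "", "12345678", "1234"),
  stringsAsFactors = FALSE
)

reformat_ids(reports)$id
# "Eltanin" "Eltanin" "1234-SEQ" "1234"
```

The last row is left unchanged because its `id` is not 8 characters long. Using `which()` for the row selection keeps rows with missing `dck` or `sid` values out of the assignment instead of raising an error.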