The pre-processing tasks described here are done in process_ids.R.
Corrections
The following corrections to ship names happen in add_shipnames.R.
- For the period 1878 to 1894, minor changes are made to ship names from dck 704 to correct typos and similar problems.
- For dck 701 (1867-1899) and dck 711 (1889-1899), some ship names are corrected.
- For the period 1663 to 1860, CLIWOC logbook (needs link) id's from dck 730 are converted to ship names using information from the project: https://projects.knmi.nl/cliwoc/download/shiplogbookid21.htm This link does not work.
- For the period 1663 to 1863, ship names from the US Maury collection (dck 701) are extended using information from this link. Data from dck 701 with a missing id is also split into voyages by manual inspection.
- Ship names from the German Maury collection (dck 721) are extended where they overlap with names from US Maury (dck 701). Where names are the same across dck 701 and dck 721, and it is not clear whether the ships are the same, the dck number is also appended (e.g. AUSTRALIA, JAMESTOWN, SWORDFISH, ANN MARIA, ASHBURTON).
- In dck 555 (1966-1973), North Pole and South Pole station id's are corrected by prepending an "N" or "S" depending on latitude.
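The polar-station correction for dck 555 can be illustrated with a short R sketch. This is not the code in add_shipnames.R: the function name `prefix_pole_ids`, the example id's, and the use of the sign of the latitude to choose the prefix are all assumptions made for illustration.

```r
# Hypothetical helper (not taken from add_shipnames.R): prepend "N" or "S"
# to a station id, choosing the letter from the sign of the latitude.
prefix_pole_ids <- function(id, lat) {
  hemi <- ifelse(lat >= 0, "N", "S")
  # Leave the id untouched when either the id or the latitude is missing.
  ifelse(is.na(id) | is.na(lat), id, paste0(hemi, id))
}

# Made-up id's and latitudes, for illustration only.
prefix_pole_ids(c("P12", "P12"), c(88.5, -89.9))
# [1] "NP12" "SP12"
```

Keeping records with a missing latitude unchanged avoids guessing a hemisphere; the real script may handle that case differently.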
Reformatting
- Manual corrections to two id's from dck 187 (1946-1956) are made to conform to the expected format.
- For the period 1953 to 1961, id's from dck 184 are truncated to remove the first digit, which indicates the ocean region, and the id's are reformatted to match the expected format.
- Between 1962 and 1963, the id "Eltanin" is added to dck 897, which contains only data from that ship and has a missing id.
- Between 1957 and 1961, a small number of id's from dck 902 are reformatted to match the expected format. This is done by prepending the digit "3" to the truncated id's.
- Between 1930 and 1961, a small number of id's for dck 118 and 119 are reformatted to match the expected format. This is done by inserting a 2-digit year.
- For dck 720 and sid 135, 8-character ids represent a single report. These are truncated to the first 4 digits, and a "-SEQ" is appended.
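Two of the reformatting steps above are simple string operations and can be sketched in R. The function names `drop_region_digit` and `to_seq_id` and the example id's are hypothetical, and the sketch covers only the truncation itself, not the further reformatting to the expected format, which is not specified here.

```r
# Hypothetical helpers, not taken from process_ids.R.

# dck 184 case: remove a single leading digit (assumed to be the
# ocean-region code) from each id.
drop_region_digit <- function(id) sub("^[0-9]", "", id)

# dck 720 / sid 135 case: for 8-character id's, keep the first 4
# characters and append "-SEQ"; other id's are returned unchanged.
to_seq_id <- function(id) {
  is8 <- !is.na(id) & nchar(id) == 8
  id[is8] <- paste0(substr(id[is8], 1, 4), "-SEQ")
  id
}

drop_region_digit("512345")
# [1] "12345"
to_seq_id(c("12345678", "1234"))
# [1] "1234-SEQ" "1234"
```

In practice each helper would be applied only to rows from the relevant dck and period; that filtering is omitted here.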
Homogenisation
The following corrections are made in new_homog_ids.R.
- id's in dck's 194, 201, 202, 203 and 227 are all derived from the same 5-digit ship identifiers. Leading digits are removed where needed.
- Some ship id's are callsigns and need to be reformatted so that data from the same ship can be linked across dck's and to the metadata in WMO Publication No. 47. For more information see Kent et al. (2007) and Freeman et al. (2017).
- Where an id is identified as a callsign or an identifier listed in WMO Publication No. 47, other id's containing the same character string are flagged and leading digits are removed. The purpose is to homogenise the callsigns across dck's.
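The flag-and-strip step for callsigns might look roughly like the following R sketch. It is not the logic of new_homog_ids.R: the function name `homogenise_callsigns`, the `known` vector standing in for callsigns listed in WMO Publication No. 47, and the example values are all assumptions. It also strips every leading digit, which is a simplification, since genuine callsigns can themselves begin with a digit.

```r
# Hypothetical sketch, not taken from new_homog_ids.R.
# `known` stands in for callsigns or identifiers already recognised,
# for example from WMO Publication No. 47 metadata.
homogenise_callsigns <- function(id, known) {
  stripped <- sub("^[0-9]+", "", id)
  # Flag id's that differ from a known callsign only by leading digits.
  flag <- !is.na(id) & stripped != id & stripped %in% known
  data.frame(
    id = ifelse(flag, stripped, id),
    flagged = flag,
    stringsAsFactors = FALSE
  )
}

# Made-up id's; "ABCD" is a placeholder callsign.
homogenise_callsigns(c("ABCD", "7ABCD", "12345", NA), known = "ABCD")
#      id flagged
# 1  ABCD   FALSE
# 2  ABCD    TRUE
# 3 12345   FALSE
# 4  <NA>   FALSE
```

Returning the flag alongside the cleaned id keeps a record of which id's were changed, so the homogenisation can be reviewed or reversed.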