The pre-processing tasks described here are done in process_ids.R.
Corrections
The following corrections to ship names are done in add_shipnames.R.
- For the period 1878 to 1894, minor changes are made to ship names from dck 704 to correct typos and similar problems.
- For dck 701 (1867-1899) and 711 (1889-1899), some ship names are corrected.
- For the period 1663 to 1860, CLIWOC logbook (needs link) ids from dck 730 are converted to ship names using information from the project (https://projects.knmi.nl/cliwoc/download/shiplogbookid21.htm). This link does not work.
- For the period 1663 to 1863, ship names from the US Maury collection (dck 701) are extended using information from this link. Data with a missing id from dck 701 are also split into voyages by manual inspection.
- Ship names from the German Maury collection (dck 721) are extended where they overlap with names from US Maury (dck 701). Where names are the same across dck 701 and 721 and it is not clear whether the ships are the same, the dck number is also appended (AUSTRALIA, JAMESTOWN, SWORDFISH, ANN MARIA, ASHBURTON).
- In dck 555 (1966-1973), North Pole and South Pole station ids are corrected by prepending an "N" or "S" depending on latitude.
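
The last two rules in the list above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the code in add_shipnames.R. The function names, the "_" separator used when appending the dck number, and the rule that a non-negative latitude means a North Pole station are all assumptions made for the example. The year ranges are not checked.

```python
# Ship names listed in the text as ambiguous between dck 701 and dck 721.
AMBIGUOUS_NAMES = {"AUSTRALIA", "JAMESTOWN", "SWORDFISH", "ANN MARIA", "ASHBURTON"}


def prefix_pole_station(station_id: str, dck: int, lat: float) -> str:
    """Prepend "N" or "S" to a dck 555 station id, depending on latitude.

    Hypothetical helper: ids from any other dck are returned unchanged.
    """
    if dck != 555:
        return station_id
    # Assumption: a non-negative latitude marks a North Pole station.
    hemisphere = "N" if lat >= 0 else "S"
    return hemisphere + station_id


def disambiguate_name(name: str, dck: int) -> str:
    """Append the dck number to a ship name shared by dck 701 and dck 721.

    Hypothetical helper: the "_" separator is illustrative only.
    """
    if dck in (701, 721) and name in AMBIGUOUS_NAMES:
        return f"{name}_{dck}"
    return name
```

For example, under these assumptions `prefix_pole_station("12", 555, 89.5)` gives `"N12"`, and `disambiguate_name("SWORDFISH", 721)` gives `"SWORDFISH_721"`.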
Reformatting
- Manual corrections are made to two ids from dck 187 (1946-1956) so that they conform to the expected format.
- For the period 1953 to 1961, ids from dck 184 are truncated to remove the first digit, which indicates the ocean region, and the ids are reformatted to match the expected format.
- Between 1962 and 1963, the id "Eltanin" is added to dck 897, which contains only data from that ship and has a missing id.
- Between 1957 and 1961, a small number of ids from dck 902 are reformatted to match the expected format. This is done by prepending the digit "3" to the truncated ids.
- Between 1930 and 1961, a small number of ids from dck 118 and 119 are reformatted to match the expected format. This is done by inserting a 2-digit year.
- For dck 720 and sid 135, 8-character ids represent a single report. These are truncated to the first 4 digits, and "-SEQ" is appended.
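
Two of the rules above are simple string operations and can be sketched as follows. This is an illustrative Python sketch, not the code in process_ids.R. The function name and signature are hypothetical. It assumes that the dck 720 rule applies only when sid is also 135, and it shows only the first step of the dck 184 rule (dropping the region digit), because the "expected format" is not specified here.

```python
from typing import Optional


def reformat_id(ship_id: str, dck: int, year: int, sid: Optional[int] = None) -> str:
    """Apply the dck 184 and dck 720 reformatting rules to a single id.

    Hypothetical helper: ids that match neither rule are returned unchanged.
    """
    if dck == 184 and 1953 <= year <= 1961:
        # Drop the first digit, which indicates the ocean region.
        return ship_id[1:]
    if dck == 720 and sid == 135 and len(ship_id) == 8:
        # An 8-character id is a single report: keep the first 4
        # characters and mark the id with "-SEQ".
        return ship_id[:4] + "-SEQ"
    return ship_id
```

Under these assumptions, `reformat_id("512345", 184, 1955)` gives `"12345"` and `reformat_id("12345678", 720, 1900, sid=135)` gives `"1234-SEQ"`.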
Homogenisation
The following corrections are done in new_homog_ids.R.
- ids in dcks 194, 201, 202, 203 and 227 are all derived from the same 5-digit ship identifiers. Leading digits are removed where needed.
- Some ship ids are callsigns and need to be reformatted so that data from the same ship can be linked across dcks and to the metadata in WMO Publication No. 47 (Pub. 47, Kent et al. 2007, Freeman et al. 2011).
- Where an id is identified as a callsign, or as a different identifier listed in Pub. 47, other ids containing the same character string are flagged and leading digits are removed. The purpose is to homogenise the callsigns across dcks.
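
The homogenisation steps above can be sketched as follows. This is an illustrative Python sketch, not the code in new_homog_ids.R. The function names are hypothetical. It assumes that "removing leading digits" for the five dcks means keeping the last 5 digits of an all-digit id. It also assumes that an id matches a callsign only when it consists of one or more leading digits followed by exactly that callsign, which is narrower than "containing the same character string".

```python
from typing import Iterable, Tuple

# dcks whose ids derive from the same 5-digit ship identifiers.
FIVE_DIGIT_DCKS = {194, 201, 202, 203, 227}


def to_five_digit_id(ship_id: str, dck: int) -> str:
    """Remove leading digits so that the id is the 5-digit identifier.

    Assumption: only all-digit ids longer than 5 characters are changed.
    """
    if dck in FIVE_DIGIT_DCKS and ship_id.isdigit() and len(ship_id) > 5:
        return ship_id[-5:]
    return ship_id


def homogenise_callsign(ship_id: str, known_callsigns: Iterable[str]) -> Tuple[str, bool]:
    """Return (id, flagged), stripping leading digits in front of a known callsign.

    Hypothetical helper: known_callsigns stands in for the callsigns
    identified in the data or listed in Pub. 47.
    """
    callsigns = set(known_callsigns)
    if ship_id in callsigns:
        # Already a known callsign: nothing to strip or flag.
        return ship_id, False
    # Try longer callsigns first so that the longest match wins.
    for callsign in sorted(callsigns, key=len, reverse=True):
        if not callsign or not ship_id.endswith(callsign):
            continue
        prefix = ship_id[: len(ship_id) - len(callsign)]
        if prefix.isdigit():
            return callsign, True
    return ship_id, False
```

Callsigns that themselves begin with a digit are the reason the sketch matches against a known list instead of stripping every leading digit.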
Linking
This is not part of process_ids.R. Should we have a special page for the processing done by new_merge_ids_year.R? I still need some explanation from Lizz to do this.