Pre-processing tasks described here are done in process_ids.R
Corrections
The following corrections to ship names happen in add_shipnames.R.
- For the period 1878 to 1894, some minor changes are made to ship names from dck 704 to correct typos and similar problems.
- For dck 701 (1867-1899) and dck 711 (1889-1899), some ship names are corrected.
- For the period 1663 to 1860, CLIWOC logbook (needs link) ids from dck 730 are converted to ship names using information from the project: https://projects.knmi.nl/cliwoc/download/shiplogbookid21.htm (this link does not work).
- For the period 1663 to 1863, ship names from the US Maury collection (dck 701) are extended using information from this link. Data from dck 701 with a missing id are also split into voyages by manual inspection.
- Ship names from the German Maury collection (dck 721) are extended where they overlap with names from US Maury (dck 701). Where names are the same across dck 701 and dck 721, and it is not clear whether the ships are the same, the dck number is also appended (e.g. AUSTRALIA, JAMESTOWN, SWORDFISH, ANN MARIA, ASHBURTON).
- In dck 555 (1966-1973), North Pole and South Pole station ids are corrected by prepending an "N" or "S" depending on latitude.
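
The last two corrections in this list are simple string operations, and a minimal sketch may make them easier to picture. It is not taken from add_shipnames.R: the function names, the "_" separator, and the rule "non-negative latitude means North Pole" are all assumptions made for illustration.

```r
# Hypothetical helper: prefix polar station ids by hemisphere.
# Assumes `lat` is in degrees, positive north; the real script may
# decide which pole a station belongs to in a different way.
prefix_pole_id <- function(id, lat) {
  paste0(ifelse(lat >= 0, "N", "S"), id)
}

# Hypothetical helper: append the dck number to ship names that occur
# in more than one dck. The "_" separator is an assumption.
append_dck <- function(name, dck, ambiguous) {
  ifelse(name %in% ambiguous, paste(name, dck, sep = "_"), name)
}

# Illustrative data only, not real reports.
stations <- prefix_pole_id(c("STN1", "STN1"), c(89.5, -89.9))
ships <- append_dck(
  name = c("JAMESTOWN", "JAMESTOWN", "OTHER SHIP"),
  dck = c(701, 721, 701),
  ambiguous = c("JAMESTOWN", "SWORDFISH")
)
print(stations)
print(ships)
```

Keeping the list of ambiguous names as an explicit argument means that names shared across dcks by chance are changed only when someone has decided the ships cannot be told apart.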
Reformatting
- Manual corrections to two ids from dck 187 (1946-1956) are made to conform to the expected format.
- For the period 1953 to 1961, ids from dck 184 are truncated to remove the first digit, which indicates the ocean region, and the ids are reformatted to match the expected format.
- Between 1962 and 1963, the id "Eltanin" is added to dck 897, which contains only data from that ship and has a missing id.
- Between 1957 and 1961, a small number of ids from dck 902 are reformatted to match the expected format. This is done by prepending the number "3" to the truncated ids.
- Between 1930 and 1961, a small number of ids for dck 118 and 119 are reformatted to match the expected format. This is done by inserting a 2-digit year.
- For dck 720 and sid 135, 8-character ids represent a single report. These are truncated to the first 4 digits, and "-SEQ" is appended.
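
Two of the rules above (dropping the region digit in dck 184, and the "-SEQ" ids for dck 720 with sid 135) could be sketched roughly as follows. This is an illustration under stated assumptions, not the code in process_ids.R: the function name `reformat_ids` is hypothetical, the year restrictions are left out, and ids, dck and sid are assumed to be plain vectors of equal length.

```r
# Hypothetical sketch of two reformatting rules; year filters are omitted.
reformat_ids <- function(id, dck, sid) {
  out <- id

  # dck 184: drop the first character (the ocean-region digit).
  i184 <- which(dck == 184 & !is.na(id))
  out[i184] <- substr(id[i184], 2, nchar(id[i184]))

  # dck 720 / sid 135: 8-character ids identify single reports, so keep
  # the first 4 characters and mark the id as a sequence.
  i720 <- which(dck == 720 & sid == 135 & !is.na(id) & nchar(id) == 8)
  out[i720] <- paste0(substr(id[i720], 1, 4), "-SEQ")

  out
}

# Illustrative values only.
new_ids <- reformat_ids(
  id = c("512345", "12345678", "ABC"),
  dck = c(184, 720, 999),
  sid = c(1, 135, 1)
)
print(new_ids)
```

Using `which()` for the row selection keeps records with a missing id, dck or sid untouched instead of raising an error on `NA` subscripts.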
Homogenisation
The following corrections are made in new_homog_ids.R.
- The ids in dcks 194, 201, 202, 203 and 227 are all derived from the same 5-digit ship identifiers. Leading digits are removed where needed.
- Some ship ids are callsigns and need to be reformatted to enable linking of data from the same ship across dcks, and linking to metadata in WMO Publication No. 47. For more information see Kent et al. (2007) and Freeman et al. (2017).
- Where an id is identified as a callsign or an identifier listed in WMO Publication No. 47, other ids containing the same character string are flagged and leading digits are removed. The purpose is to homogenise the callsigns across dcks.
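
As a rough illustration of the last rule, the sketch below strips leading digits from an id only when the remainder matches a known callsign. It simplifies the matching described above (exact match after stripping, rather than "contains the same character string") and is not the logic of new_homog_ids.R. The function name and the `callsigns` argument are hypothetical, and the callsign used is a made-up example.

```r
# Hypothetical sketch: homogenise ids that are a known callsign with
# extra leading digits. `callsigns` stands in for a list built from
# WMO Publication No. 47 or from ids already identified as callsigns.
strip_leading_digits <- function(id, callsigns) {
  stripped <- sub("^[0-9]+", "", id)
  # Flag only ids that changed and whose remainder is a known callsign.
  flagged <- stripped != id & stripped %in% callsigns
  ifelse(flagged, stripped, id)
}

# Illustrative values only.
homog <- strip_leading_digits(
  id = c("GBTT", "12GBTT", "12345", "7XYZ"),
  callsigns = c("GBTT")
)
print(homog)
```

Requiring a match against a known callsign keeps purely numeric ids, and ids whose remainder is not recognised, unchanged.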
Linking
This is not part of process_ids.R. Should we have a different page for the linking processing done by new_merge_ids_year.R? I still need some help from Liz regarding this.