Processing of IDs

The pre-processing tasks described here are done in process_ids.R.

Corrections

The following corrections to ship names are done in add_shipnames.R.

For the period 1878 to 1894 some minor changes to ship names from dck 704 are made to correct for typos and other similar problems.
For dck 701 (1867-1899) and dck 711 (1889-1899), some ship names are corrected.
For the period 1663 to 1860, CLIWOC logbook (needs link) IDs from dck 730 are converted to ship names using information from the project (https://projects.knmi.nl/cliwoc/download/shiplogbookid21.htm). This link does not work.
For the period 1663 to 1863 ship names from the US Maury collection dck 701 are extended using information from this link. Also data with a missing id from dck 701 is split into voyages by manual inspection.
Ship names from the German Maury collection (dck 721) are extended where they overlap with names from the US Maury collection (dck 701). Where a name is the same in dck 701 and 721 and it is not clear whether the ships are the same, the dck number is appended to the name (AUSTRALIA, JAMESTOWN, SWORDFISH, ANN MARIA, ASHBURTON).
In dck 555 (1966-1973), North Pole and South Pole station IDs are corrected by prepending "N" or "S" depending on latitude.
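
The last two rules above can be sketched as follows. This is an illustration in Python, not the code in add_shipnames.R. The function names, the example station id "POLE", the separator used when the dck number is appended, and the treatment of latitude 0 as northern are all assumptions made for the example.

```python
# Names shared by dck 701 and 721 where it is unclear whether the ships are the same
# (list taken from the text above).
AMBIGUOUS_NAMES = {"AUSTRALIA", "JAMESTOWN", "SWORDFISH", "ANN MARIA", "ASHBURTON"}


def prefix_pole_id(station_id: str, lat: float) -> str:
    """Prepend "N" or "S" to a dck 555 pole station id depending on latitude."""
    # Assumption: latitude 0 counts as northern; the page does not say.
    hemisphere = "N" if lat >= 0 else "S"
    return hemisphere + station_id


def disambiguate_name(name: str, dck: int) -> str:
    """Append the dck number to names that occur in both dck 701 and dck 721."""
    if dck in (701, 721) and name in AMBIGUOUS_NAMES:
        # Assumption: a single space separates the name and the dck number.
        return f"{name} {dck}"
    return name
```

With these definitions, prefix_pole_id("POLE", -89.9) gives "SPOLE" and disambiguate_name("SWORDFISH", 721) gives "SWORDFISH 721", while a name that is not in the list is returned unchanged.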

Reformatting

Manual corrections are made to two IDs from dck 187 (1946-1956) so that they conform to the expected format.
For the period 1953 to 1961, IDs from dck 184 are truncated to remove the first digit, which indicates the ocean region, and are then reformatted to match the expected format.
Between 1962 and 1963 the id "Eltanin" is added to dck 897, which contains only data from that ship and has a missing id.
Between 1957 and 1961, a small number of IDs from dck 902 are reformatted to match the expected format. This is done by prepending the digit "3" to the truncated IDs.
Between 1930 and 1961, a small number of IDs from dck 118 and 119 are reformatted to match the expected format. This is done by inserting a 2-digit year.
For dck 720 and sid 135, 8-character IDs represent a single report. These are truncated to the first 4 digits, and "-SEQ" is appended.
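
Three of the rules in this section are simple enough to sketch in a few lines. The Python below is an illustration only, not the code in process_ids.R. The function name and signature are hypothetical, the year ranges are not checked, "dck 720 and sid 135" is read as both conditions holding together, and only the truncation step for dck 184 is shown (the further reformatting is not described on this page).

```python
from typing import Optional


def reformat_id(raw_id: Optional[str], dck: int, sid: Optional[int] = None) -> Optional[str]:
    """Apply three of the reformatting rules to a single id (illustrative sketch)."""
    missing = raw_id is None or raw_id.strip() == ""
    # dck 897 contains only data from one ship and has a missing id.
    if dck == 897 and missing:
        return "Eltanin"
    if missing:
        return raw_id
    # dck 184: the first digit indicates the ocean region and is removed.
    if dck == 184 and raw_id[0].isdigit():
        return raw_id[1:]
    # dck 720 with sid 135: 8-character ids represent a single report.
    # Assumption: "first 4 digits" means the first 4 characters of the id.
    if dck == 720 and sid == 135 and len(raw_id) == 8:
        return raw_id[:4] + "-SEQ"
    return raw_id
```

For example, reformat_id("12345678", 720, 135) gives "1234-SEQ", and reformat_id(None, 897) gives "Eltanin". The id values used here are made up for the example.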

Homogenisation

The following corrections are done in new_homog_ids.R.

IDs in dck 194, 201, 202, 203 and 227 are all derived from the same 5-digit ship identifiers. Leading digits are removed where needed.
Some ship IDs are callsigns and need to be reformatted so that data from the same ship can be linked across decks and to the metadata in WMO Publication No. 47 (Pub. 47; Kent et al. 2007, Freeman et al. 2011).
Where an ID is identified as a callsign, or as a different identifier listed in Pub. 47, other IDs containing the same character string are flagged and their leading digits are removed. The purpose is to homogenise the callsigns across decks.
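
One possible reading of the flag-and-strip step is sketched below in Python. It is not the code in new_homog_ids.R. The function name, the set of known identifiers and the example strings are hypothetical, and the sketch assumes that "containing the same character string" means the known identifier appears at the end of the id, preceded only by digits.

```python
from typing import Set, Tuple


def homogenise_id(raw: str, known: Set[str]) -> Tuple[str, bool]:
    """Return (id, flagged), stripping leading digits if that yields a known identifier."""
    if raw in known:
        # Already in the homogenised form, nothing to flag.
        return raw, False
    # Strip as few leading digits as possible, so that identifiers which
    # themselves start with a digit are still matched in full.
    for i in range(1, len(raw)):
        if not raw[:i].isdigit():
            break
        if raw[i:] in known:
            return raw[i:], True
    return raw, False
```

With the hypothetical set known = {"9XYZ", "ABCD"}, homogenise_id("19XYZ", known) gives ("9XYZ", True) and homogenise_id("77ABCD", known) gives ("ABCD", True), while an id with no match is returned unchanged and unflagged. Trying the shortest digit prefix first is a design choice for this sketch: it avoids cutting into identifiers that legitimately begin with a digit.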

Linking

This is not part of process_ids.R. Should we have a separate page for the processing done by new_merge_ids_year.R? I still need some explanation from Lizz to do this.