|
A flag indicating whether an `id` match is allowed is added to each report by [INSERT-THE-LINK-TO-SCRIPT](). Generic `id`'s (e.g. blank, "SHIP", "MASKSTID") are allowed to match within a `dck`. The table below contains the information used to decide whether the `id`'s in a pair are an allowed match. *Italics* in the table denote the “`id` type”.
|
|
A flag indicating whether an `id` match is allowed is added to each report by [`new_add_match_id.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rutils/new_add_match_id.R). Generic `id`'s (e.g. blank, "SHIP", "MASKSTID") are allowed to match within a `dck`. Table 8 of the [technical document]() contains the information used to decide whether `id`'s in a pair are allowed to match.
|
|
|
|
|
|
________________
|
|
These criteria have been developed by inspection of the paired `id`'s and are therefore likely to be approximate.
|
|
DCK | ID
|
|
:----- |:------------
|
|
|
|
Within any `dck` | blank, SHIP, MASKSTID
|
|
|
|
116, 117, 218 | any `id` to blank
|
|
Damerau–Levenshtein (DL) distance is the minimum number of insertions, deletions, substitutions and transpositions of adjacent characters needed to convert one string into another ([Van der Loo M, 2014](https://journal.r-project.org/archive/2014-1/loo.pdf)). A substring match is where one `id` is contained within the other. Italics denote the “`id` type”.
|
|
150, 151, 152, 155, 156, 192, 193, 215, 720, 901 | any `id` to blank
|
|
|
|
128, 254, 720 | any `id` to blank
|
|
|
|
187, 196, 197, 229, 230, 720, 732 | any `id` to blank
|
|
References
|
|
227, 246, 732 | any `id` to blank
|
|
-----
|
|
761, 898 | any `id` to blank
|
|
[Van der Loo M (2014)](https://journal.r-project.org/archive/2014-1/loo.pdf). The stringdist package for approximate string matching. The R Journal, 6, 111-122. https://CRAN.R-project.org/package=stringdist. |
|
204, 245 | any `id` to blank
|
|
|
230, 254 | any `id` to blank
|
|
|
|
128, 230 | any `id` to blank
|
|
|
|
195, 281 | any `id` to blank
|
|
|
|
192, 193, 194, 201, 202, 706, 732 | any `id` to blank
|
|
|
|
194, 201, 202, 203, 207, 221, 223, 227, 233, 239, 254, 926 | substring; <br><br> or 1 digit `dck` 194 `id`
|
|
|
|
254, 926 | 2-5 characters of 254 match 3-6 characters of 926
|
|
|
|
194, 927 | "7- " in 194 with "00" in 227
|
|
|
|
194 with 207 or 227 | 3-6 characters of 194 match 2-5 characters of 207 or 227
|
|
|
|
194 with 194, 201, 202, 203, 207, 227 | DL = 1 or substring of length at least 3 and number of occurrences of one of the `id`'s = 1
|
|
|
|
194 with 201, 203, 207, 227 | substring
|
|
|
|
194, 201 | DL <= 2 and one of the `id`'s classed as invalid
|
|
|
|
184, 209 | characters 5-8 of 184 with 2-5 of 209
|
|
|
|
555, 733 | add "N" at start of 555 <br><br> match of characters 2-4 <br><br> 555 `id` is SHIP
|
|
|
|
733 with 849, 888 | 849, 888 `id` is SHIP
|
|
|
|
733 with 888, 892 | 888, 892 `id` has DL <= 1 with ROBB <br><br> 888, 892 `id` is EMIO, UYAJ, UFRE
|
|
|
|
186, 733 | 186 has 4 digit `id`, 733 is *north_pole_station*
|
|
|
|
750, 888 | 888 `id` is SHIP
|
|
|
|
781 with 128, 735, 849, 888, 926, 927| 781 is AAAA with callsign
|
|
|
|
927 with 230, 720 | 927 `id` is SHIP
|
|
|
|
213, 902 | 213 is characters 4-8 of 902
|
|
|
|
926 with 888, 892 | 888, 892 is characters 4-8 of 926
|
|
|
|
892, 926 | 892 is characters 1-4 of 926
|
|
|
|
117 | any match to *invalid* `id`
|
|
|
|
116, 117 | *id_over_X* to *id_minus*, match of characters 2-4 <br><br> characters 3-4 of 116 with 117 and 116 is *osv_onstation* <br><br> match of characters 1-3 and 116 is *osv_onstation* <br><br> match characters 2-3 of 116 with 1-2 of 117 and 116 is *osv_noship* <br><br> match of characters 1-4 and 116 is *other* <br><br> match of - at start of 116 with 5 at start of 117 <br><br> prepend 5 to 117 <br><br> prepend - to 116 <br><br> 1 digit `id` in each <br><br> match of start of 116 with 2 character 117 <br><br> within or between 116 and 117 when DL <= 2 and one `id` has 3 or fewer occurrences <br><br> 116 missing to extant 117 <br><br> 116 is *osv_onstation* and characters 3-4 are 00 with 117 `id` of length 4 <br><br> substring, one `id` has <= 4 occurrences, the other >= 10 occurrences
|
|
|
|
117 | prepend "-" to one of the `id`'s <br><br> DL = 1 if 3 or fewer occurrences of one `id`
|
|
|
|
116, 116| 22014, 22004
|
|
|
|
116, 226| *osv_noship* to *ows_logbook*
|
|
|
|
117, 218| prepend 0 to 117 and 218 is *us_ows_folio* <br><br> characters 1-3 of 117 with 2-4 of 218 is *us_ows_folio*
|
|
|
|
117, 128 | both 4 characters in length and match of characters 1-2 in 117 with 3-4 in 128
|
|
|
|
192, 215, 720 | match blank `id` <br><br> match characters 1-4 with 4 character `id` <br><br> allow letter as 5th character in 192 in 8 character `id` <br><br> one `id` *invalid* and not *id_5digit_pership* and not containing 0000 and DL<=2 or substring <br><br> DL<=2 and one `id` has <= 3 occurrences and other has >= 8
|
|
|
|
192, 215, 254, 720 | one is 5 character `id`, the other is not
|
|
|
|
246 | "PQP PTMNI" to "PORQUOIP"
|
|
|
|
762 | 2617A to 26174
|
|
|
|
128, 233, 254, 255, 555, 700, 708, 709, 732, 735, 749, 781, 792, 849, 874, 875, 888, 889, 892, 926, 927, 992, 993, 995, 999 (hereafter "call.dcks") | subset<br><br> one is invalid and DL <= 2 <br><br> DL <= 2 and one has a single occurrence and the other at least 3 <br><br> one has a single occurrence and the other at least 20
|
|
|
|
call.dcks, 850 | SHIP, MASKSTID or AAAA to anything
|
|
|
|
call.dcks, 896 | SHIP to OWS
|
|
|
|
128, 555 | 128 platform type = 3 and 555 `id` starts 4Y
|
|
|
|
128, 230 with 555| ship number to call sign if at least 3 occurrences
|
|
|
|
128, 230 with 555, 720| matches with blank `id`
|
|
|
|
Any| match when 0 replaced with O; 0/O <br><br> I/J <br><br> UU/VV <br><br> U/V with DL=1 <br><br> WZC/WCZ
|
|
|
|
892, 896| replace C7O/C7 <br><br> MQR/C7R
|
|
|
|
992| replace XP42/MP42
|
|
|
|
700 with 792, 992| BBXX removed from start of `id`
|
|
|
|
711, 201| > 3 occurrences
|
|
|
|
720, 734| > 3 occurrences
|
|
|
|
246, 720| TERRANOVA to `id` starting 610426
|
|
|
|
193 with 705, 706, 707 | 705, 706, 707 starting NL or DN
|
|
|
|
118, 762 with 705, 706, 707 | 705, 706, 707 starting JP
|
|
|
|
203 with 705, 706, 707 | 705, 706, 707 starting UK
|
|
|
|
705, 706, 707 | with matching characters 1-2 of original `id`
|
|
|
|
703, 927| 927 `id` starts 05 and has >=5 occurrences |
|
|
|
|
|
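
The three tests that recur in the table (a DL distance threshold, a substring match, and a comparison of character ranges) can be illustrated with a minimal Python sketch. This is not the code in `new_add_match_id.R`: the function names `osa_distance`, `is_substring` and `chars` are hypothetical, and the distance shown is the restricted (optimal string alignment) variant of Damerau–Levenshtein, which may differ from the variant the R script requests from `stringdist`. Character positions are treated as 1-based and inclusive, which is an assumption about how the table counts characters.

```python
def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein (optimal string alignment) distance.

    Counts insertions, deletions, substitutions and transpositions of
    adjacent characters. Assumption: illustrative only, not the R script's code.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete every character of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert every character of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            # transposition of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


def is_substring(id_a: str, id_b: str) -> bool:
    """True when the shorter id is contained within the longer one."""
    shorter, longer = sorted((id_a, id_b), key=len)
    return shorter in longer


def chars(s: str, first: int, last: int) -> str:
    """Characters first..last of s, 1-based and inclusive (hypothetical helper)."""
    return s[first - 1:last]


# Examples taken from rows of the table above
print(osa_distance("WZC", "WCZ"))      # one transposition -> 1
print(osa_distance("2617A", "26174"))  # one substitution -> 1
# Invented ids: characters 5-8 of one id against characters 2-5 of the other
print(chars("ABCDEFGH", 5, 8) == chars("XEFGHY", 2, 5))  # True
```

Note that `is_substring` returns `True` when either `id` is an empty string, so blank `id`'s would need to be handled separately before a substring rule is applied.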