Matching criteria · Changes

added rest of docs, authored May 26, 2020 by bearecinos
Workflow/Matching-criteria.md
View page @ 97fa7715
A flag indicating whether an `id` match is allowed is added to the data frames by a script
(see [link to script to be inserted]()).
Generic `id`'s (e.g. blank, "SHIP",
"MASKSTID") are allowed to match within a `dck`.
**Match criteria**
----------------
The following table contains the information used to decide whether the `id`'s in a pair are an allowed match.
These criteria have been developed by inspection of the paired `id`'s and are therefore likely to be approximate.
[Damerau–Levenshtein (DL) distance](https://en.wikipedia.org/wiki/Damerau%E2%80%93Levenshtein_distance) is the
number of insertions, deletions, substitutions and swaps of adjacent characters necessary to convert one string to another.
A substring is where one `id` is contained within the other.
*Italics* in the table below represent the "`id` type".
Matches of reports where the `id`'s meet the criteria listed below are allowed.
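
The two string measures defined above can be sketched in Python as follows. This is only an illustration, not the project's own code: the function names `osa_distance` and `is_substring_pair` are hypothetical, and the distance shown is the optimal string alignment variant of DL distance (each character takes part in at most one adjacent swap), which agrees with the full DL distance for the small thresholds (1 or 2) used in the table in most practical cases. Treating a blank `id` as never being a substring is also an assumption, since blank `id`'s have their own rows in the table.

```python
def osa_distance(a, b):
    """Optimal string alignment distance (a restricted form of DL distance).

    Counts insertions, deletions, substitutions and swaps of adjacent
    characters needed to turn string a into string b.
    """
    # d[i][j] holds the distance between the prefixes a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all i characters
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            # swap of two adjacent characters
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]


def is_substring_pair(id_a, id_b):
    """True when one non-blank id is contained within the other.

    Assumption: blank ids are excluded here because the table handles
    them with separate rules.
    """
    if not id_a or not id_b:
        return False
    return id_a in id_b or id_b in id_a
```

For example, `osa_distance("WZC", "WCZ")` and `osa_distance("2617A", "26174")` both evaluate to 1, the first through a swap and the second through a substitution.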
________________
DCK | ID
:----- |:------------
Within any `dck` | blank, SHIP, MASKSTID
116, 117, 218 | any `id` to blank
150, 151, 152, 155, 156, 192, 193, 215, 720, 901 | any `id` to blank
128, 254, 720 | any `id` to blank
187, 196, 197, 229, 230, 720, 732 | any `id` to blank
227, 246, 732 | any `id` to blank
761, 898 | any `id` to blank
204, 245 | any `id` to blank
230, 254 | any `id` to blank
128, 230 | any `id` to blank
195, 281 | any `id` to blank
192, 193, 194, 201, 202, 706, 732 | any `id` to blank
194, 201, 202, 203, 207, 221, 223, 227, 233, 239, 254, 926 | substring; <br><br> or 1 digit `dck` 194 `id`
254, 926 | 2-5 characters of 254 match 3-6 characters of 926
194, 927 | "7- " in 194 with "00" in 227
194 with 207 or 227 | 3-6 characters of 194 match 2-5 characters of 207 or 227
194 with 194, 201, 202, 203, 207, 227 | DL = 1 or substring of length at least 3 and number of occurrences of one of the `id`'s = 1
194 with 201, 203, 207, 227 | substring
194, 201 | DL &lt;= 2 and one of the `id`'s classed as invalid
184, 209 | characters 5-8 of 184 with 2-5 of 209
555, 733 | add "N" at start of 555 <br><br> match of characters 2-4 <br><br> 555 `id` is SHIP
733 with 849, 888 | 849, 888 `id` is SHIP
733 with 888, 892 | 888, 892 `id` has DL &lt;= 1 with ROBB <br><br> 888, 892 `id` is EMIO, UYAJ, UFRE
186, 733 | 186 has 4 digit `id`, 733 is *north_pole_station*
750, 888 | 888 `id` is SHIP
781 with 128, 735, 849, 888, 926, 927| 781 is AAAA with callsign
927 with 230, 720 | 927 `id` is SHIP
213, 902 | 213 is characters 4-8 of 902
926 with 888, 892 | 888, 892 is characters 4-8 of 926
892, 926 | 892 is characters 1-4 of 926
117 | any match to *invalid* `id`
116, 117 | *id_over_X* to *id_minus*, match of characters 2-4 <br><br> characters 3-4 of 116 with 117 and 116 is *osv_onstation* <br><br> match of characters 1-3 and 116 is *osv_onstation* <br><br> match characters 2-3 of 116 with 1-2 of 117 and 116 is *osv_noship* <br><br> match of characters 1-4 and 116 is *other* <br><br> match of - at start of 116 with 5 at start of 117 <br><br> prepend 5 to 117 <br><br> prepend - to 116 <br><br> 1 digit `id` in each <br><br> match of start of 116 with 2 character 117 <br><br> within or between 116 and 117 when DL &lt;=2 when one `id` has 3 or fewer occurrences <br><br> 116 missing to extant 117 <br><br> 116 is *osv_onstation* and characters 3-4 are 00 with 117 `id` of length 4 <br><br> substring, one `id` has &lt;= 4 occurrences, the other &gt;= 10 occurrences
117 | prepend "-" to one of the `id`'s <br><br> DL = 1 if 3 or fewer occurrences of one `id`
116, 116| 22014, 22004
116, 226| *osv_noship* to *ows_logbook*
117, 218| prepend 0 to 117 and 218 is *us_ows_folio* <br><br> characters 1-3 of 117 with 2-4 of 218 is *us_ows_folio*
117, 128 | both 4 characters in length and match of characters 1-2 in 117 with 3-4 in 128
192, 215, 720 | match blank `id` <br><br> match characters 1-4 with 4 character `id` <br><br> allow letter as 5th character in 192 in 8 character `id` <br><br> one `id` *invalid* and not *id_5digit_pership* and not containing 0000 and DL&lt;=2 or substring <br><br> DL&lt;=2 and one `id` has &lt;= 3 occurrences and other has &gt;= 8
192, 215, 254, 720 | one is 5 character `id`, the other is not
246 | "PQP PTMNI" to "PORQUOIP"
762 | 2617A to 26174
128, 233, 254, 255, 555, 700, 708, 709, 732, 735, 749, 781, 792, 849, 874, 875, 888, 889, 892, 926, 927, 992, 993, 995, 999 (hereafter "call.dcks") | subset <br><br> one is invalid and DL &lt;= 2 <br><br> DL &lt;= 2 and one has a single occurrence and the other at least 3 <br><br> one has a single occurrence and the other at least 20
call.dcks, 850 | SHIP, MASKSTID or AAAA to anything
call.dcks, 896 | SHIP to OWS
128, 555 | 128 platform type = 3 and 555 `id` starts 4Y
128, 230 with 555| ship number to call sign if at least 3 occurrences
128, 230 with 555, 720| matches with blank `id`
Any| match when 0 replaced with O; 0/O <br><br> I/J <br><br> UU/VV <br><br> U/V with DL=1 <br><br> WZC/WCZ
892, 896| replace C7O/C7 <br><br> MQR/C7R
992| replace XP42/MP42
700 with 792, 992| BBXX removed from start of `id`
711, 201| &gt; 3 occurrences
720, 734| &gt; 3 occurrences
246, 720| TERRANOVA to `id` starting 610426
193 with 705, 706, 707 | 705, 706, 707 starting NL or DN
118, 762 with 705, 706, 707 | 705, 706, 707 starting JP
203 with 705, 706, 707 | 705, 706, 707 starting UK
705, 706, 707 | with matching characters 1-2 of original `id`
703, 927| 927 `id` starts 05 and has &gt;=5 occurrences
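
A few of the simpler row types in the table above can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the project's implementation: every name (`GENERIC_IDS`, `is_generic_id`, `chars`, `span_rule`, `canonical`, `equivalent_ids`) is hypothetical, character positions are read as 1-based and inclusive, both `id`'s are assumed to need to be generic for the first row to apply, and the "Any" row is read as meaning that `0`/`O`, `I`/`J` and `UU`/`VV` are interchangeable. The example `id`'s in the usage note are made up.

```python
# Hypothetical set of generic ids from the "Within any dck" row.
GENERIC_IDS = {"", "SHIP", "MASKSTID"}


def is_generic_id(ship_id):
    """True for a blank or generic id (None and whitespace count as blank)."""
    if ship_id is None:
        return True
    return ship_id.strip() in GENERIC_IDS


def chars(ship_id, first, last):
    """Characters first..last of an id, 1-based and inclusive."""
    return ship_id[first - 1:last]


def span_rule(id_a, span_a, id_b, span_b):
    """True when the two character spans are fully present and equal.

    Example row: "2-5 characters of 254 match 3-6 characters of 926"
    becomes span_rule(id_254, (2, 5), id_926, (3, 6)).
    """
    part_a = chars(id_a, *span_a)
    part_b = chars(id_b, *span_b)
    expected_length = span_a[1] - span_a[0] + 1
    # Reject ids that are too short to contain the whole span
    if len(part_a) != expected_length or len(part_b) != expected_length:
        return False
    return part_a == part_b


def canonical(ship_id):
    """Fold the interchangeable characters onto one representative each."""
    return ship_id.replace("0", "O").replace("J", "I").replace("VV", "UU")


def equivalent_ids(id_a, id_b):
    """True when two ids agree after folding 0/O, I/J and UU/VV."""
    return canonical(id_a) == canonical(id_b)
```

With made-up `id`'s, `span_rule("A1234", (2, 5), "AB1234", (3, 6))` is true because both spans are `1234`, and `equivalent_ids("W0RK", "WORK")` is true because `0` is folded onto `O`. Rows that depend on DL distance or on occurrence counts need additional inputs and are not covered by this sketch.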