Last edited by Beatriz Recinos Sep 15, 2020
This is an old version of this page. You can view the most recent version or browse the history.

Matching criteria

A flag indicating whether an id match is allowed is added to the data frames by (see link to script to be inserted).

Generic id's (e.g. blank, "SHIP", "MASKSTID") are allowed to match within a dck.
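
As an illustration of the generic-id rule, here is a minimal Python sketch. It assumes that "within a dck" means both reports carry the same dck number and that both id's are generic; it also assumes id's are compared after trimming whitespace and upper-casing. The names `GENERIC_IDS`, `is_generic` and `generic_match_allowed` are hypothetical and are not taken from the project code.

```python
# Hypothetical helper names; the set of generic ids comes from the text above.
GENERIC_IDS = {"", "SHIP", "MASKSTID"}


def is_generic(ship_id):
    """Return True for a blank, SHIP or MASKSTID id (assumed case-insensitive)."""
    return (ship_id or "").strip().upper() in GENERIC_IDS


def generic_match_allowed(id_a, dck_a, id_b, dck_b):
    """Assumed reading: both ids are generic and both reports share one dck."""
    return dck_a == dck_b and is_generic(id_a) and is_generic(id_b)


print(generic_match_allowed("SHIP", 128, "", 128))
print(generic_match_allowed("SHIP", 128, "", 254))
```

Under these assumptions the first call is allowed (same dck, both generic) and the second is not (different dck).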

Match criteria

The following table contains the information used to decide whether the id's in a pair are an allowed match.

These criteria have been developed by inspection of the paired id's and are therefore likely to be approximate.

Damerau–Levenshtein (DL) distance is the minimum number of insertions, deletions, substitutions and swaps of adjacent characters necessary to convert one string to another.
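
A minimal Python sketch of such a distance follows. It implements the optimal string alignment (restricted) variant of DL, which counts insertions, deletions, single-character substitutions and swaps of adjacent characters. Whether the project scripts use this variant or the unrestricted one is not stated on this page, and the function name `dl_distance` is hypothetical.

```python
def dl_distance(a, b):
    """Optimal string alignment variant of Damerau-Levenshtein distance."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between the first i chars of a and first j of b.
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i characters
    for j in range(cols):
        d[0][j] = j  # insert all j characters
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or exact match)
            )
            # Swap of two adjacent characters.
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[rows - 1][cols - 1]


print(dl_distance("WZC", "WCZ"))      # 1: one swap of adjacent characters
print(dl_distance("2617A", "26174"))  # 1: one substitution
```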

A substring is where one id is contained within the other.
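
The substring test can be sketched in Python as below. This is an illustration only: the name `is_substring_match` and the `min_length` parameter are hypothetical, and the sketch assumes that a blank id never counts as a substring, since blank id's have their own rows in the table.

```python
def is_substring_match(id_a, id_b, min_length=1):
    """Return True if the shorter id is contained within the longer one.

    min_length is the minimum length of the shorter id; the default of 1
    excludes blank ids (an assumption made for this sketch).
    """
    a, b = id_a.strip(), id_b.strip()
    shorter, longer = sorted((a, b), key=len)
    return len(shorter) >= min_length and shorter in longer


print(is_substring_match("WZC", "WZC12"))              # True
print(is_substring_match("AB", "ABCD", min_length=3))  # False: shorter id too short
```

The `min_length` argument covers criteria such as "substring of length at least 3".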

Italics in the table below represent the "id type".

Matches of reports where the id's meet the criteria listed below are allowed.


| DCK | ID |
|---|---|
| Within any dck | blank, SHIP, MASKSTID |
| 116, 117, 218 | any id to blank |
| 150, 151, 152, 155, 156, 192, 193, 215, 720, 901 | any id to blank |
| 128, 254, 720 | any id to blank |
| 187, 196, 197, 229, 230, 720, 732 | any id to blank |
| 227, 246, 732 | any id to blank |
| 761, 898 | any id to blank |
| 204, 245 | any id to blank |
| 230, 254 | any id to blank |
| 128, 230 | any id to blank |
| 195, 281 | any id to blank |
| 192, 193, 194, 201, 202, 706, 732 | any id to blank |
| 194, 201, 202, 203, 207, 221, 223, 227, 233, 239, 254, 926 | substring; or 1 digit dck 194 id |
| 254, 926 | 2-5 characters of 254 match 3-6 characters of 926 |
| 194, 927 | "7- " in 194 with "00" in 227 |
| 194 with 207 or 227 | 3-6 characters of 194 match 2-5 characters of 207 or 227 |
| 194 with 194, 201, 202, 203, 207, 227 | DL = 1 or substring of length at least 3 and number of occurrences of one of the id's = 1 |
| 194 with 201, 203, 207, 227 | substring |
| 194, 201 | DL <= 2 and one of the id's classed as invalid |
| 184, 209 | characters 5-8 of 184 with 2-5 of 209 |
| 555, 733 | add "N" at start of 555 |
| | match of characters 2-4 |
| | 555 id is SHIP |
| 733 with 849, 888 | 849, 888 id is SHIP |
| 733 with 888, 892 | 888, 892 id has DL <= 1 with ROBB |
| | 888, 892 id is EMIO, UYAJ, UFRE |
| 186, 733 | 186 has 4 digit id, 733 is north_pole_station |
| 750, 888 | 888 id is SHIP |
| 781 with 128, 735, 849, 888, 926, 927 | 781 is AAAA with callsign |
| 927 with 230, 720 | 927 id is SHIP |
| 213, 902 | 213 is characters 4-8 of 902 |
| 926 with 888, 892 | 888, 892 is characters 4-8 of 926 |
| 892, 926 | 892 is characters 1-4 of 926 |
| 117 | any match to invalid id |
| 116, 117 | id_over_X to id_minus, match of characters 2-4 |
| | characters 3-4 of 116 with 117 and 116 is osv_onstation |
| | match of characters 1-3 and 116 is osv_onstation |
| | match characters 2-3 of 116 with 1-2 of 117 and 116 is osv_noship |
| | match of characters 1-4 and 116 is other |
| | match of - at start of 116 with 5 at start of 117 |
| | prepend 5 to 117 |
| | prepend - to 116 |
| | 1 digit id in each |
| | match of start of 116 with 2 character 117 |
| | within or between 116 and 117 when DL <= 2 when one id has 3 or fewer occurrences |
| | 116 missing to extant 117 |
| | 116 is osv_onstation and characters 3-4 are 00 with 117 id of length 4 |
| | substring, one id has <= 4 occurrences, the other >= 10 occurrences |
| 117 | prepend "-" to one of the id's |
| | DL = 1 if 3 or fewer occurrences of one id |
| 116, 116 | 22014, 22004 |
| 116, 226 | osv_noship to ows_logbook |
| 117, 218 | prepend 0 to 117 and 218 is us_ows_folio |
| | characters 1-3 of 117 with 2-4 of 218 is us_ows_folio |
| 117, 128 | both 4 characters in length and match of characters 1-2 in 117 with 3-4 in 128 |
| 192, 215, 720 | match blank id |
| | match characters 1-4 with 4 character id |
| | allow letter as 5th character in 192 in 8 character id |
| | one id invalid and not id_5digit_pership and not containing 0000 and DL <= 2 or substring |
| | DL <= 2 and one id has <= 3 occurrences and other has >= 8 |
| 192, 215, 254, 720 | one is 5 character id, the other is not |
| 246 | "PQP PTMNI" to "PORQUOIP" |
| 762 | 2617A to 26174 |
| 128, 233, 254, 255, 555, 700, 708, 709, 732, 735, 749, 781, 792, 849, 874, 875, 888, 889, 892, 926, 927, 992, 993, 995, 999 (hereafter "call.dcks") | subset |
| | one is invalid and DL <= 2 |
| | DL <= 2 and one has a single occurrence and the other at least 3 |
| | one has a single occurrence and the other at least 20 |
| call.dcks, 850 | SHIP, MASKSTID or AAAA to anything |
| call.dcks, 896 | SHIP to OWS |
| 128, 555 | 128 platform type = 3 and 555 id starts 4Y |
| 128, 230 with 555 | ship number to call sign if at least 3 occurrences |
| 128, 230 with 555, 720 | matches with blank id |
| Any | match when 0 replaced with O; 0/O |
| | I/J |
| | UU/VV |
| | U/V with DL = 1 |
| | WZC/WCZ |
| 892, 896 | replace C7O/C7 |
| | MQR/C7R |
| 992 | replace XP42/MP42 |
| 700 with 792, 992 | BBXX removed from start of id |
| 711, 201 | > 3 occurrences |
| 720, 734 | > 3 occurrences |
| 246, 720 | TERRANOVA to id starting 610426 |
| 193 with 705, 706, 707 | 705, 706, 707 starting NL or DN |
| 118, 762 with 705, 706, 707 | 705, 706, 707 starting JP |
| 203 with 705, 706, 707 | 705, 706, 707 starting UK |
| 705, 706, 707 | with matching characters 1-2 of original id |
| 703, 927 | 927 id starts 05 and has >= 5 occurrences |
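
Several rows of the table compare a range of characters in one id with a range in the other (for example "213 is characters 4-8 of 902"). A minimal Python sketch of that kind of test follows. It assumes that character positions are 1-based and inclusive; the helper names `chars` and `range_match` are hypothetical, and the id's in the usage lines are invented for illustration rather than taken from ICOADS.

```python
def chars(ship_id, start, end):
    """Return characters start..end of ship_id (1-based, inclusive)."""
    return ship_id[start - 1:end]


def range_match(id_a, range_a, id_b, range_b):
    """Compare the characters in range_a of id_a with range_b of id_b.

    Returns False when id_a is too short to cover the whole of range_a.
    """
    part_a = chars(id_a, *range_a)
    part_b = chars(id_b, *range_b)
    expected_length = range_a[1] - range_a[0] + 1
    return len(part_a) == expected_length and part_a == part_b


# Invented ids: characters 5-8 of the first against characters 2-5 of the second.
print(range_match("ABCD1234", (5, 8), "X1234", (2, 5)))  # True
print(range_match("ABCD12", (5, 8), "X12", (2, 5)))      # False: ids too short
```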