A flag indicating whether an ID match is allowed is added to each report by add_match_id.R. Generic IDs (e.g. blank, "SHIP", "MASKSTID") are allowed to match within a dck. Table 8 of the technical report contains the information used to decide whether the IDs in a pair are allowed to match.
These criteria were developed by inspecting the paired IDs, so they are likely to be approximate.
The Damerau–Levenshtein (DL) distance is the minimum number of insertions, deletions, substitutions and transpositions of adjacent characters needed to convert one string into another (Van der Loo M, 2014). A substring match is one where one ID is contained within the other. Italics denote the "id type".
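As a rough illustration of the two string comparisons described above, the sketch below computes the full (unrestricted) Damerau–Levenshtein distance using the Lowrance–Wagner dynamic-programming recurrence, and performs a simple substring check. It is written in Python for illustration only. The function names `dl_distance` and `is_substring` are hypothetical and are not part of add_match_id.R or the stringdist package. The distance is intended to agree with `stringdist(a, b, method = "dl")` with default unit weights, but that equivalence is an assumption rather than something tested here. Treating blank IDs as non-matching in the substring test is also an assumption.

```python
def dl_distance(a: str, b: str) -> int:
    """Full (unrestricted) Damerau-Levenshtein distance between a and b.

    Counts insertions, deletions, substitutions and transpositions of
    adjacent characters, using the Lowrance-Wagner recurrence.
    """
    max_dist = len(a) + len(b)
    # Last row (1-based) in which each character of `a` was seen.
    last_row = {}
    # The table is offset by one so that row/column 0 act as a sentinel.
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = max_dist
    for i in range(len(a) + 1):
        d[i + 1][0] = max_dist
        d[i + 1][1] = i          # converting a[:i] to "" takes i deletions
    for j in range(len(b) + 1):
        d[0][j + 1] = max_dist
        d[1][j + 1] = j          # converting "" to b[:j] takes j insertions

    for i in range(1, len(a) + 1):
        last_match_col = 0       # last column in this row where a[i-1] == b[j-1]
        for j in range(1, len(b) + 1):
            k_row = last_row.get(b[j - 1], 0)
            l_col = last_match_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_match_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,          # substitution (or match)
                d[i + 1][j] + 1,         # insertion
                d[i][j + 1] + 1,         # deletion
                # transposition, with any characters in between edited
                d[k_row][l_col] + (i - k_row - 1) + 1 + (j - l_col - 1),
            )
        last_row[a[i - 1]] = i
    return d[len(a) + 1][len(b) + 1]


def is_substring(a: str, b: str) -> bool:
    """True if one ID is contained within the other.

    Assumption: blank IDs never match in this test, because the empty
    string is trivially contained in every string.
    """
    if not a or not b:
        return False
    return a in b or b in a


if __name__ == "__main__":
    print(dl_distance("ca", "abc"))       # 2 (ca -> ac -> abc)
    print(is_substring("ABCD", "ABCD1"))  # True
```

For example, "ca" and "abc" are 2 apart under the full DL distance (ca → ac → abc), whereas the restricted optimal string alignment variant gives 3, because it does not allow a transposed pair to be edited further.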
References
Van der Loo M (2014). The stringdist package for approximate string matching. The R Journal, 6, 111–122. https://CRAN.R-project.org/package=stringdist.