Last edited by Beatriz Recinos 4 years ago

Matching criteria

add_match_id.R adds a flag to each report indicating whether an ID match is allowed. Generic IDs (e.g. blank, "SHIP", "MASKSTID") are allowed to match within a deck (dck). Table 8 of the technical report contains the information used to decide whether the IDs in a pair are allowed to match.

These criteria were developed by inspecting the paired IDs and are therefore likely to be approximate.

Damerau–Levenshtein (DL) distance is the minimum number of insertions, deletions, substitutions and swaps (transpositions of two adjacent characters) needed to convert one string into another (Van der Loo M, 2014). A substring match is where one ID is contained within the other. Italics denote the “id type”.
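The two comparisons described above can be illustrated with a minimal, standalone Python sketch. It is not the project's R code: the source cites the stringdist R package, and which of its methods the workflow calls is not stated here. The function names, the treatment of blank IDs in `is_substring`, and the example IDs are all assumptions made for illustration.

```python
def damerau_levenshtein(a: str, b: str) -> int:
    """Unrestricted Damerau-Levenshtein distance between two strings.

    Counts insertions, deletions, substitutions and transpositions of
    adjacent characters.
    """
    max_dist = len(a) + len(b)
    last_row = {}  # last row (1-based) of `a` in which each character occurred

    # Distance table with an extra border row and column holding max_dist,
    # which acts as a sentinel for "no earlier occurrence".
    d = [[0] * (len(b) + 2) for _ in range(len(a) + 2)]
    d[0][0] = max_dist
    for i in range(len(a) + 1):
        d[i + 1][0] = max_dist
        d[i + 1][1] = i
    for j in range(len(b) + 1):
        d[0][j + 1] = max_dist
        d[1][j + 1] = j

    for i in range(1, len(a) + 1):
        last_col = 0  # last column in this row where the characters matched
        for j in range(1, len(b) + 1):
            k = last_row.get(b[j - 1], 0)
            l = last_col
            if a[i - 1] == b[j - 1]:
                cost = 0
                last_col = j
            else:
                cost = 1
            d[i + 1][j + 1] = min(
                d[i][j] + cost,                          # substitution or match
                d[i + 1][j] + 1,                         # insertion
                d[i][j + 1] + 1,                         # deletion
                d[k][l] + (i - k - 1) + 1 + (j - l - 1)  # transposition
            )
        last_row[a[i - 1]] = i

    return d[len(a) + 1][len(b) + 1]


def is_substring(id_a: str, id_b: str) -> bool:
    """True if one ID is contained within the other.

    Assumption: blank IDs never count as substrings here, since generic
    IDs are handled by a separate rule.
    """
    if not id_a or not id_b:
        return False
    return id_a in id_b or id_b in id_a


# Made-up IDs for illustration only (not taken from ICOADS).
print(damerau_levenshtein("WDC3289", "WDC3298"))  # 1: one adjacent swap
print(is_substring("3289", "WDC3289"))            # True
```

A single swap of adjacent characters counts as one edit under DL distance, whereas plain Levenshtein distance would count it as two, which makes DL better suited to IDs with transposed digits.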

References

Van der Loo M (2014). The stringdist package for approximate string matching. The R Journal, 6, 111-122.
https://CRAN.R-project.org/package=stringdist.
