Duplicate indentification · Changes

update to api and links to icoads.utils authored Sep 15, 2020 by Beatriz Recinos
Showing with 4 additions and 4 deletions
Workflow/Duplicate-indentification.md
View page @ 0da6c8b6
@@ -5,11 +5,11 @@ The identification and pairing of duplicates happens in the following stages of t
1. [`simple_dup.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/simple_dup.R)
-2. [`new_get_pairs.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/new_get_pairs.R)
+2. [`get_pairs.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/get_pairs.R)
-3. [`new_get_dups.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/new_get_dups.R)
+3. [`get_dups.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/get_dups.R)
-4. [`new_merge_ids_year.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/new_merge_ids_year.R)
+4. [`merge_ids_year.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/merge_ids_year.R)
First stage
-----------
@@ -19,7 +19,7 @@ Second stage
------------
The second stage identifies duplicate records within the ship data. Reports are paired as duplicates if they have associated ship `id`s. The candidate pairs are selected according to i) the number of matching elements (variables whose content agrees within a specific tolerance), ii) the `dck`s, and iii) a comparison of the `id`s.
-For more information regarding the selection criteria to consider records as a pair of duplicated information in [`new_get_pairs.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/new_get_pairs.R) see Table 7 and 8 of the [technical report](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/docs/C3S_D311a_Lot2.dup_doc_v3.pdf).
+For more information regarding the selection criteria to consider records as a pair of duplicated information in [`get_pairs.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/get_pairs.R) see Table 7 and 8 of the [technical report](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/docs/C3S_D311a_Lot2.dup_doc_v3.pdf).
Third stage
-----------
......
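
Note on the second-stage criteria quoted in the hunk above: the three checks (number of matching elements within a tolerance, the `dck`s, and the `id` comparison) can be read as a single predicate over two reports. The following is only a rough sketch of that reading, written in Python for illustration; the project's own scripts are in R, and this is not the logic of `get_pairs.R`. The `Report` class, the tolerance values, the `MIN_MATCHES` threshold, the `allowed_dck_pairs` argument, and the case-insensitive `id` comparison are all hypothetical. The actual criteria are in Tables 7 and 8 of the technical report.

```python
from dataclasses import dataclass

# Hypothetical tolerances per variable (not taken from the technical report).
TOLERANCES = {"lat": 0.1, "lon": 0.1, "sst": 0.5}
# Hypothetical minimum number of matching elements.
MIN_MATCHES = 3


@dataclass
class Report:
    """Hypothetical stand-in for one ship report."""
    id: str
    dck: int
    values: dict  # variable name -> float, or None when missing


def count_matches(a, b, tolerances=TOLERANCES):
    """Count variables present in both reports that agree within tolerance."""
    n = 0
    for name, tol in tolerances.items():
        va, vb = a.values.get(name), b.values.get(name)
        if va is not None and vb is not None and abs(va - vb) <= tol:
            n += 1
    return n


def ids_associated(a, b):
    """Assumed id comparison: equal after trimming whitespace, ignoring case."""
    return a.id.strip().upper() == b.id.strip().upper()


def is_candidate_pair(a, b, allowed_dck_pairs, min_matches=MIN_MATCHES):
    """Combine the three checks: matching elements, dck combination, id comparison."""
    dck_ok = (a.dck, b.dck) in allowed_dck_pairs or (b.dck, a.dck) in allowed_dck_pairs
    return count_matches(a, b) >= min_matches and dck_ok and ids_associated(a, b)


# Usage with made-up reports and a made-up allowed dck combination.
r1 = Report("ship1 ", 1, {"lat": 50.0, "lon": -10.0, "sst": 12.3})
r2 = Report("SHIP1", 2, {"lat": 50.05, "lon": -10.0, "sst": 12.5})
print(is_candidate_pair(r1, r2, allowed_dck_pairs={(1, 2)}))
```

Keeping the three checks as separate functions makes it easy to swap in the real tolerances and `id` rules without touching the combination step.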