Duplicate indentification · Changes

Page history
added track check part · authored Jun 09, 2020 by bearecinos
Showing with 4 additions and 2 deletions
Workflow/Duplicate-indentification.md
View page @ f5710e90
@@ -23,9 +23,11 @@ Second stage
Third stage
-----------
-At this stage we are able to count the number of duplicated records and flag the best according to a [quality control criteria](Workflow/quality-control). The duplicate pairs are also combine into groups. Each group of possible duplicates is then assessed for quality control. This process it is important to account for known differences between `dck`'s that are not captured in the precision information of previous processing stages.
+At this stage we are able to count the number of duplicated records and flag the best according to the [quality control criteria](Workflow/quality-control). The duplicate pairs are also combined into groups. Each group of possible duplicates is then assessed for quality control. This process is important to account for known differences between `dck`'s that are not captured in the precision information of previous processing stages.
Fourth stage
-----------
Once the date/time/location parameter value duplicates have been identified and flagged, the next stage of the processing considers together the data that have associated `id`'s. Sometimes the link between `id`'s can be used to homogenise the `id`'s beyond the individual pairs; sometimes the link is specific to a particular pair of reports, particularly if one of the matched `id`'s is generic. `id` matches are therefore only considered within-group. At the end of the processing the suffix “_gN” is appended to the `id`'s, where N is the group number. More information on the group assignments by `dck` and `id` can be found in the technical report (C3S_D311a_Lot2.dup_doc_v3.docx, tables 15 and 16).
The linked `id`'s are then checked using the [MOQC track check](Workflow/Quality-control#met-office-track-check) and for time duplicates. Reports that fail the track check are flagged as worst duplicates. Where positions are similar, the best duplicate is selected by `dck` priority and by the number of elements with similar variable content.
\ No newline at end of file
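
To make the third-stage logic above concrete, here is a minimal Python sketch, not the project's actual code: the function names, the union-find grouping, and the `score` callable are illustrative assumptions about how duplicate pairs can be merged into groups and one record per group flagged as best.

```python
from collections import defaultdict

def group_duplicate_pairs(pairs):
    """Merge duplicate pairs into groups via union-find (connected components)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for a, b in pairs:
        union(a, b)

    groups = defaultdict(list)
    for record in parent:
        groups[find(record)].append(record)
    return list(groups.values())

def flag_group(group, score):
    """Flag the highest-scoring record as 'best' and the rest as 'duplicate'."""
    best = max(group, key=score)
    return {record: ("best" if record == best else "duplicate") for record in group}

# Pairs (A, B) and (B, C) collapse into one group {A, B, C}; (D, E) stays separate.
pairs = [("A", "B"), ("B", "C"), ("D", "E")]
for group in group_duplicate_pairs(pairs):
    print(flag_group(group, score=lambda record: 1))  # stand-in score: all equal
```

In the real processing the score would come from the quality control criteria (for example `dck` priority and the number of reported elements), not from a toy callable.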
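Similarly, a hypothetical sketch of the fourth-stage `id` handling: appending the “_gN” group suffix and selecting the best duplicate by `dck` priority and number of matching elements. The data layout, the `dck_rank` mapping, and all ids and deck numbers below are assumptions made up for illustration.

```python
def tag_group_ids(groups):
    """Append the "_gN" suffix to every id in group N (1-based group numbering)."""
    return {
        report_id: f"{report_id}_g{n}"
        for n, group in enumerate(groups, start=1)
        for report_id in group
    }

def pick_best(reports, dck_rank):
    """Select the best duplicate: lowest dck rank first, then most matching elements."""
    return min(reports, key=lambda r: (dck_rank.get(r["dck"], 99), -r["n_matching"]))

# Two groups of linked ids; ids, deck numbers and ranks are illustrative only.
groups = [["SHIPA", "MASKED01"], ["GENERIC7"]]
print(tag_group_ids(groups))
# {'SHIPA': 'SHIPA_g1', 'MASKED01': 'MASKED01_g1', 'GENERIC7': 'GENERIC7_g2'}

reports = [
    {"id": "SHIPA_g1", "dck": 926, "n_matching": 5},
    {"id": "MASKED01_g1", "dck": 992, "n_matching": 7},
]
print(pick_best(reports, dck_rank={926: 1, 992: 2})["id"])  # -> 'SHIPA_g1'
```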