... | ... | @@ -23,9 +23,11 @@ Second stage |
|
|
|
|
|
Third stage
|
|
|
-----------
|
|
|
|
|
|
At this stage we are able to count the number of duplicated records and flag the best according to the [quality control criteria](Workflow/quality-control). The duplicate pairs are also combined into groups. Each group of possible duplicates is then assessed for quality control. This process is important to account for known differences between `dck`'s that are not captured in the precision information of previous processing stages.
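Combining duplicate pairs into groups can be pictured as finding connected components: if report A matches B and B matches C, all three end up in one group. The following is only a minimal sketch of that idea using a union-find structure; the function name `group_pairs` and the record identifiers are hypothetical and are not taken from the project code.

```python
def group_pairs(pairs):
    """Merge duplicate pairs into groups (connected components)."""
    parent = {}

    def find(x):
        # Return the representative of x, registering x on first sight.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for record in list(parent):
        groups.setdefault(find(record), set()).add(record)
    # Sort for a stable, readable result.
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical duplicate pairs from the earlier stages.
pairs = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
groups = group_pairs(pairs)
```

Here `r1`, `r2` and `r3` fall into one group because they are linked through `r2`, while `r4` and `r5` form a second group.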
|
|
|
|
|
|
Fourth stage
|
|
|
-----------
|
|
|
Once the date/time/location parameter value duplicates have been identified and flagged, the next stage in the processing considers together the data that have associated `id`'s. Sometimes the link between `id`'s can be used to homogenise the `id`'s beyond the individual pairs; sometimes the link is
|
|
|
specific to a particular pair of reports, particularly if one of the matched `id`'s is generic. `id` matches are therefore only considered within-group. At the end of the processing the suffix “_gN” is appended to the `id`'s, where N is the group number. More information on the group assignments by `dck` and `id` can be found in the technical report (C3S_D311a_Lot2.dup_doc_v3.docx, tables 15 and 16).
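As an illustration of the “_gN” suffix, the sketch below appends the group number to each `id`. It assumes reports are held as dictionaries with hypothetical `id` and `group` keys; the actual data structures used in the processing may differ.

```python
def tag_ids(reports):
    """Return copies of the reports with '_g<group>' appended to each id."""
    return [{**r, "id": f"{r['id']}_g{r['group']}"} for r in reports]


# Hypothetical reports: two ids linked within group 3, one id in group 7.
reports = [
    {"id": "SHIP", "group": 3},
    {"id": "ABCD1", "group": 3},
    {"id": "ABCD1", "group": 7},
]
tagged = tag_ids(reports)
tagged_ids = [r["id"] for r in tagged]
```

Because matches are only considered within a group, the same original `id` appearing in two groups ends up with two distinct tagged values.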
|
|
|
|
|
|
The linked `id`'s are then checked using the [MOQC track check](Workflow/Quality-control#met-office-track-check) and for time duplicates. Reports that fail the track check are flagged as the worst duplicate. Where positions are similar, the best duplicate is selected by `dck` priority and by the number of elements with similar content of variables. |
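The selection rule above can be sketched roughly as follows. This is an assumption-laden illustration, not the project's implementation: the priority table `DCK_PRIORITY`, the keys `uid`, `dck`, `track_ok` and `n_elements`, and the flag labels are all hypothetical, and the real `dck` priorities are defined in the technical report.

```python
# Hypothetical priority table: a lower value means a preferred dck.
DCK_PRIORITY = {"deck_a": 0, "deck_b": 1}


def flag_group(reports):
    """Flag each report in one group as 'best', 'worst' or 'duplicate'."""
    passed = [r for r in reports if r["track_ok"]]
    best = None
    if passed:
        # Prefer the highest-priority dck, then the most elements.
        best = min(
            passed,
            key=lambda r: (
                DCK_PRIORITY.get(r["dck"], len(DCK_PRIORITY)),
                -r["n_elements"],
            ),
        )
    flags = {}
    for r in reports:
        if not r["track_ok"]:
            flags[r["uid"]] = "worst"  # failed the track check
        elif r is best:
            flags[r["uid"]] = "best"
        else:
            flags[r["uid"]] = "duplicate"
    return flags


# Hypothetical group of three linked reports.
group = [
    {"uid": "u1", "dck": "deck_b", "track_ok": True, "n_elements": 9},
    {"uid": "u2", "dck": "deck_a", "track_ok": True, "n_elements": 5},
    {"uid": "u3", "dck": "deck_a", "track_ok": False, "n_elements": 12},
]
flags = flag_group(group)
```

In this sketch `u3` is excluded despite having the most elements because it fails the track check, and `u2` is preferred over `u1` on `dck` priority alone; the element count only breaks ties between reports from equally ranked decks.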
|
|
\ No newline at end of file |