
First stage
-----------

The first stage identifies duplicate records between the **ship data** and data taken by **different platform types** (e.g., DRIFT, PLAT). This is done in [`simple_dup.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/simple_dup.R). The code treats two records as duplicates if they match exactly in date, time and position.
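The exact-match rule can be illustrated with a small sketch. This is not a port of `simple_dup.R`: it is written in Python for illustration, and the field names (`yr`, `mo`, `dy`, `hr`, `lat`, `lon`) and the `find_cross_platform_dups` helper are hypothetical. It indexes the non-ship records by a (date, time, position) key and then looks up each ship record, so only exact matches on all six fields are paired.

```python
from collections import defaultdict

# Hypothetical field names; the columns used by simple_dup.R may differ.
KEY_FIELDS = ("yr", "mo", "dy", "hr", "lat", "lon")


def match_key(rec):
    """Build the exact-match key from date, time and position."""
    return tuple(rec[f] for f in KEY_FIELDS)


def find_cross_platform_dups(ship_records, other_records):
    """Return (ship_index, other_index) pairs whose date, time and
    position are identical."""
    index = defaultdict(list)
    for j, rec in enumerate(other_records):
        index[match_key(rec)].append(j)

    pairs = []
    for i, rec in enumerate(ship_records):
        for j in index.get(match_key(rec), []):
            pairs.append((i, j))
    return pairs


ships = [
    {"yr": 1990, "mo": 1, "dy": 5, "hr": 12.0, "lat": 10.5, "lon": -30.0},
    {"yr": 1990, "mo": 1, "dy": 5, "hr": 18.0, "lat": 11.0, "lon": -30.5},
]
others = [
    {"yr": 1990, "mo": 1, "dy": 5, "hr": 12.0, "lat": 10.5, "lon": -30.0, "pt": "DRIFT"},
]

print(find_cross_platform_dups(ships, others))  # [(0, 0)]
```

Hashing on the key avoids comparing every ship record against every other-platform record; the trade-off is that any difference in a key field, however small, prevents a match, which is what a "full match" criterion implies.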

Second stage
------------

The second stage identifies duplicate records within the ship data, pairing reports as duplicates if they have associated ship `id`s. Candidate pairs are selected according to i) the number of matching elements (variables whose values agree within a specific tolerance), ii) the `dck`s, and iii) a comparison of the `id`s.
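A minimal sketch of how the three criteria might be combined for one candidate pair is shown below. Everything concrete in it is an assumption: the variable names, the tolerances in `TOLERANCES`, the threshold `MIN_MATCHES`, the rule that a pair must come from different decks, and the case- and whitespace-insensitive `id` comparison are all hypothetical and are not taken from the repository.

```python
# Hypothetical per-variable tolerances (absolute differences).
TOLERANCES = {"lat": 0.0, "lon": 0.0, "sst": 0.1, "at": 0.1, "slp": 0.1, "w": 0.5}
MIN_MATCHES = 4   # hypothetical minimum number of matching elements
EPS = 1e-9        # absorbs floating-point error, e.g. 15.3 - 15.2 > 0.1


def count_matches(a, b, tolerances=TOLERANCES):
    """Count variables present in both reports whose values agree
    within the given tolerance."""
    n = 0
    for var, tol in tolerances.items():
        va, vb = a.get(var), b.get(var)
        if va is not None and vb is not None and abs(va - vb) <= tol + EPS:
            n += 1
    return n


def ids_associated(id_a, id_b):
    """Hypothetical id comparison: ids are equal after normalising
    case and surrounding whitespace; missing ids never match."""
    if id_a is None or id_b is None:
        return False
    return id_a.strip().upper() == id_b.strip().upper()


def is_candidate_pair(a, b, min_matches=MIN_MATCHES):
    """Combine i) matching elements, ii) dck and iii) id comparison."""
    return (
        count_matches(a, b) >= min_matches
        and a["dck"] != b["dck"]  # assumption: duplicates come from different decks
        and ids_associated(a["id"], b["id"])
    )


r1 = {"dck": 927, "id": "ABCD1", "lat": 10.5, "lon": -30.0,
      "sst": 15.3, "at": 14.0, "slp": 1012.0, "w": 5.0}
r2 = {"dck": 928, "id": "abcd1 ", "lat": 10.5, "lon": -30.0,
      "sst": 15.2, "at": 14.0, "slp": 1012.1, "w": 6.0}

print(count_matches(r1, r2))      # 5 (wind speed differs by more than 0.5)
print(is_candidate_pair(r1, r2))  # True
```

Keeping the deck and `id` checks as separate functions makes it easy to swap in whatever rules the actual second stage applies without touching the tolerance-based element count.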