... | ... | @@ -5,11 +5,11 @@ The identification and pairing of duplicates happens in the following stages of t |
|
|
|
|
|
1. [`simple_dup.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/simple_dup.R)
|
|
|
|
|
|
2. [`new_get_pairs.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/new_get_pairs.R)
|
|
|
2. [`get_pairs.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/get_pairs.R)
|
|
|
|
|
|
3. [`new_get_dups.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/new_get_dups.R)
|
|
|
3. [`get_dups.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/get_dups.R)
|
|
|
|
|
|
4. [`new_merge_ids_year.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/new_merge_ids_year.R)
|
|
|
4. [`merge_ids_year.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/merge_ids_year.R)
|
|
|
|
|
|
First stage
-----------
|
... | ... | @@ -19,7 +19,7 @@ Second stage |
|
|
------------
|
|
|
The second stage identifies duplicate records within the ship data. Two reports are paired as duplicates if they have associated ship `id`s. Candidate pairs are selected according to i) the number of matching elements (variables whose content agrees within a specific tolerance), ii) the `dck`s of the two reports, and iii) a comparison of their `id`s.
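
The three selection criteria above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the code in the repository's R scripts. Every name in it (`Report`, `count_matches`, `ids_associated`, `is_candidate_pair`), the tolerances, the match threshold, and the deck numbers are hypothetical placeholders; the actual criteria are those given in Tables 7 and 8 of the technical report.

```python
from dataclasses import dataclass

# Hypothetical per-variable tolerances and match threshold (illustration only;
# the real values are defined in the technical report, not here).
TOLERANCES = {"sst": 0.5, "at": 0.5, "slp": 1.0}
MIN_MATCHES = 2


@dataclass
class Report:
    id: str        # ship identifier
    dck: int       # deck number
    values: dict   # variable name -> observed value, or None if missing


def count_matches(a, b, tolerances=TOLERANCES):
    """Criterion i): count variables present in both reports that agree within tolerance."""
    n = 0
    for var, tol in tolerances.items():
        x, y = a.values.get(var), b.values.get(var)
        if x is not None and y is not None and abs(x - y) <= tol:
            n += 1
    return n


def ids_associated(a, b):
    """Criterion iii), simplified: ids are equal after trimming whitespace and ignoring case."""
    return a.id.strip().upper() == b.id.strip().upper()


def is_candidate_pair(a, b, allowed_deck_pairs=None):
    """Apply the three criteria in turn; allowed_deck_pairs is an optional set of frozensets."""
    if count_matches(a, b) < MIN_MATCHES:
        return False
    # Criterion ii): when a set of permitted deck combinations is given, enforce it.
    if allowed_deck_pairs is not None and frozenset((a.dck, b.dck)) not in allowed_deck_pairs:
        return False
    return ids_associated(a, b)


# Two reports with arbitrary deck numbers: sst and at agree within tolerance, slp does not.
a = Report("ship1", 100, {"sst": 15.0, "at": 14.0, "slp": 1012.0})
b = Report("SHIP1 ", 200, {"sst": 15.25, "at": 14.25, "slp": 1020.0})
print(count_matches(a, b), is_candidate_pair(a, b))
```

In this sketch the cheap element comparison runs first, so the deck and `id` checks are only reached for reports that already look alike. Whether the R scripts order the checks this way is not stated here.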
|
|
|
|
|
|
For more information on the criteria that [`new_get_pairs.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/new_get_pairs.R) uses to treat two records as a duplicate pair, see Tables 7 and 8 of the [technical report](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/docs/C3S_D311a_Lot2.dup_doc_v3.pdf).
|
|
|
For more information on the criteria that [`get_pairs.R`](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/rscripts/get_pairs.R) uses to treat two records as a duplicate pair, see Tables 7 and 8 of the [technical report](https://git.noc.ac.uk/brecinosrivas/icoads-r-hostace/-/blob/master/docs/C3S_D311a_Lot2.dup_doc_v3.pdf).
|
|
|
|
|
|
Third stage
-----------
|
... | ... | |