…of several nations. Truncated formats often did not contain sufficient information to identify the observations made by a particular ship or platform, and these compact formats sometimes converted or encoded data in different ways. For example, many observations have no identifier linking them to the ship (**ID**) or platform (**PT**), and where such identifiers do exist they may differ between data sources. The main types of duplicates are:

* Observations historically shared among national archives, … likely to have different formats, precision, conversions and metadata.
* Planned redundancy, for example the ingestion of several near-real-time data streams.

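
Because identifiers can be missing or can differ between sources, candidate duplicates have to be found from the content of the reports themselves. The sketch below illustrates that idea only. It is not the ICOADS procedure or this repository's method: the `Report` record, its field names and the tolerances (`max_minutes`, `max_deg`, `max_sst_diff`) are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import combinations
from typing import List, Optional, Tuple


@dataclass
class Report:
    """Hypothetical minimal marine report (field names are not IMMA1 names)."""
    source: str            # archive or data stream the report came from
    id: Optional[str]      # ship/platform identifier; may be missing
    time: datetime
    lat: float             # degrees north
    lon: float             # degrees east
    sst: Optional[float]   # sea surface temperature (degC); may be missing


def is_candidate_duplicate(a: Report, b: Report, max_minutes: float = 60.0,
                           max_deg: float = 0.1, max_sst_diff: float = 0.2) -> bool:
    """Return True if two reports agree in time, position and SST within the
    (illustrative) tolerances. The identifiers are deliberately ignored."""
    if abs((a.time - b.time).total_seconds()) / 60.0 > max_minutes:
        return False
    if abs(a.lat - b.lat) > max_deg:
        return False
    # Wrap the longitude difference into [-180, 180) so reports on either
    # side of the dateline are compared correctly.
    dlon = (a.lon - b.lon + 180.0) % 360.0 - 180.0
    if abs(dlon) > max_deg:
        return False
    # Only compare SST when both reports carry a value.
    if a.sst is not None and b.sst is not None and abs(a.sst - b.sst) > max_sst_diff:
        return False
    return True


def candidate_pairs(reports: List[Report], **tolerances: float) -> List[Tuple[int, int]]:
    """Indices (i, j), i < j, of every pair of reports flagged as candidate duplicates."""
    return [(i, j)
            for (i, a), (j, b) in combinations(enumerate(reports), 2)
            if is_candidate_duplicate(a, b, **tolerances)]


reports = [
    Report("archive_A", "SHIP1", datetime(1950, 1, 1, 12), 50.0, -20.0, 12.3),
    Report("archive_B", None, datetime(1950, 1, 1, 12), 50.0, -20.0, 12.3),  # same report, ID lost
    Report("archive_B", "SHIP2", datetime(1950, 1, 1, 12), 10.0, 40.0, 25.1),
]
print(candidate_pairs(reports))  # [(0, 1)]
```

Comparing every pair is quadratic in the number of reports; a practical tool would first group reports by date and position box, but the pairwise test itself is the part this sketch is meant to show.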
The processing software used by ICOADS (https://icoads.noaa.gov/software/) is written in FORTRAN and includes code to translate data into the IMMA1 format [Smith *et al.* (2016)], to apply quality control (QC) checks and flags, and to identify (and, in earlier releases, remove) reports likely to be duplicates [Freeman *et al.* (2017)](https://doi.org/10.1002/joc.4775).

The code in this repository offers additional quality control of the data, identifies duplicates, and links the IDs of each pair of duplicate reports. It also identifies the best duplicate by assessing the track (the path in latitude/longitude) of the observations.
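
The ID linking and track-based selection described above could look roughly like the following sketch. It is not this repository's implementation: the `Obs` record and the functions `link_ids` and `best_duplicate` are hypothetical, the track neighbours of each pair are assumed to be known already, and the scoring rule (the larger of the two speeds implied by moving from the previous track point to the candidate and on to the next one) is just one simple way of "assessing the track".

```python
import math
from datetime import datetime
from typing import Dict, Iterable, NamedTuple, Set, Tuple


class Obs(NamedTuple):
    """Hypothetical minimal report: identifier, time and position only."""
    id: str
    time: datetime
    lat: float  # degrees north
    lon: float  # degrees east


EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(a: Obs, b: Obs) -> float:
    """Great-circle distance between two reports in kilometres."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlam = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))  # clamp rounding error


def implied_speed_kmh(a: Obs, b: Obs) -> float:
    """Speed needed to move between two reports; infinite if their times coincide."""
    hours = abs((b.time - a.time).total_seconds()) / 3600.0
    return math.inf if hours == 0 else haversine_km(a, b) / hours


def track_score(candidate: Obs, prev: Obs, nxt: Obs) -> float:
    """Larger of the two speeds implied by going prev -> candidate -> nxt."""
    return max(implied_speed_kmh(prev, candidate), implied_speed_kmh(candidate, nxt))


def best_duplicate(pair: Tuple[Obs, Obs], prev: Obs, nxt: Obs) -> Obs:
    """Member of a duplicate pair most consistent with the track (lowest score)."""
    return min(pair, key=lambda c: track_score(c, prev, nxt))


def link_ids(pairs: Iterable[Tuple[Obs, Obs]]) -> Dict[str, Set[str]]:
    """Map each identifier to the identifiers it was paired with as a duplicate."""
    links: Dict[str, Set[str]] = {}
    for a, b in pairs:
        links.setdefault(a.id, set()).add(b.id)
        links.setdefault(b.id, set()).add(a.id)
    return links


# Neighbouring reports from the same ship's track, 12 h either side of the pair.
prev = Obs("SHIP1", datetime(1950, 1, 1, 0), 50.0, -20.0)
nxt = Obs("SHIP1", datetime(1950, 1, 2, 0), 50.0, -18.0)
good = Obs("SHIP1", datetime(1950, 1, 1, 12), 50.0, -19.0)
bad = Obs("1234", datetime(1950, 1, 1, 12), 50.0, 19.0)  # longitude sign flipped in conversion

print(best_duplicate((good, bad), prev, nxt).id)  # SHIP1
print(link_ids([(good, bad)]))  # {'SHIP1': {'1234'}, '1234': {'SHIP1'}}
```

A copy whose position was corrupted in conversion, such as the flipped longitude above, would need an implausibly high speed to stay on the track, so the score favours the other member of the pair, while `link_ids` records that `SHIP1` and `1234` refer to the same platform.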