This repository is a collection of R scripts to homogenise platform
identifier information and to identify duplicate observations in the
**International Comprehensive Ocean-Atmosphere Data Set** (ICOADS) marine data source. Text in this `format` denotes an ICOADS variable name (see [API-reference](api-reference) for information on the variables).

ICOADS is the world's most extensive surface marine meteorological data collection.
It contains ocean surface and atmospheric observations from the late 1600s
to the present and is updated every month with observations from near-real-time data streams.
The database is made up of observation reports from many different sources;
there are several hundred combinations of the `dck` (deck) and `sid` (source)
flags that indicate the origin of the data.
Typically, `dck` indicates the type of data
(e.g. US Navy ships; Japanese Whaling Fleet) and `sid` provides more information
about the data system or format
(e.g. data stream extracted from the WMO Global Telecommunication System, GTS).

Sometimes a single `dck` is associated with a single `sid`, and
sometimes a single `dck` will contain several `sid` and vice versa.
Not all of the `dck` and `sid` are independent, so there can be duplicated reports which need to be identified and flagged.

Historically, archives of marine data have been maintained by individual nations,
and often these were shared so that the same observations appear in the archives
of several nations. Truncated formats often did not contain sufficient information
to identify the observations made by a particular ship or platform,
and these compact formats sometimes converted or encoded data in different ways.
For example, many observations do not have an identifier linking to the ship
(`id`) or platform (`pt`), and where such identifiers exist
they may differ between data sources. The main types of duplicates are:

* Observations historically shared among national archives,
[...] likely to have different formats, precision, conversions and metadata.

The processing software used by ICOADS (https://icoads.noaa.gov/software/) is written in FORTRAN and includes code to translate data into the IMMA1 format [Smith *et al.* (2016)](https://icoads.noaa.gov/e-doc/imma/R3.0-imma1_short.pdf), to apply QC and flags, and to identify (and in earlier releases remove) reports likely to be duplicates [Freeman *et al.* (2017)](https://doi.org/10.1002/joc.4775).

The code in this repository offers additional quality control on the data, homogenisation of ID information between different `dck` and `sid`, and duplicate identification (DI), preserving information on reports associated by the DI through the use of ICOADS unique identifiers (`uid`).

References
----------