This repository consists of a collection of R scripts to homogenise platform
identifier information and to identify duplicate observations in the
**International Comprehensive Ocean-Atmosphere Data Set** (ICOADS) marine data source.

ICOADS is the world's most extensive surface marine meteorological data collection.
It contains ocean surface and atmospheric observations from the 1600s to the present
and is still receiving more data every year.

The database is made up of observation reports from many different sources;
there are several hundred combinations of the **DCK** (deck) and **SID** (source ID)
flags that indicate the origin of the data.
Typically, **DCK** indicates the **type of data**
(e.g. US Navy ships; Japanese Whaling Fleet) and **SID** provides more information
about the data system or format
(e.g. a data stream extracted from the WMO Global Telecommunication System, GTS).

Sometimes a single DCK is associated with a single SID,
and sometimes a single DCK contains several SIDs and vice versa,
leading to a number of duplicated entries of meteorological observations.
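
As a rough illustration of how these flags can be inspected, the sketch below
tabulates the DCK/SID combinations found in a set of reports; the flag values
and column names are invented for this example and are not taken from the
repository or the ICOADS documentation.

```r
# Minimal sketch (invented example values): count DCK/SID combinations in a
# set of reports and list the decks fed by more than one source ID, since
# these combinations are the most likely to hold duplicated observations.
reports <- data.frame(
  dck = c(110, 110, 110, 255, 255, 732),
  sid = c(25,  25,  103, 30,  33,  103)
)

# Frequency of each DCK/SID combination actually present in the reports.
combo_counts <- as.data.frame(table(dck = reports$dck, sid = reports$sid))
combo_counts <- combo_counts[combo_counts$Freq > 0, ]
print(combo_counts)

# Decks associated with more than one source ID.
sids_per_dck <- tapply(reports$sid, reports$dck, function(x) length(unique(x)))
print(sids_per_dck[sids_per_dck > 1])
```
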
Historically, archives of marine data have been maintained by individual nations,
and these were often shared so that the same observations appear in the archives
of several nations. Truncated formats often did not contain sufficient information
to identify the observations made by a particular ship or platform,
and these compact formats sometimes converted or encoded data in different ways.
For example, many observations do not have an identifier linking them to the ship
(**ID**) or platform (**pt**), and for those that do have such identifiers,
the identifiers may differ between data sources. The main types of duplicates are:

* Observations historically shared among national archives, likely to have different formats, precision, conversions and metadata.
* Re-ingestion of the same data more than once.
* Planned redundancy, for example the ingestion of several near-real-time data streams.

There is already a protocol, and other tools written in Python, to read and
perform some quality control on the data and to identify duplicate observations,
as described in [Freeman *et al.* (2017)](https://doi.org/10.1002/joc.4775).
However, the processing methods in this repository offer additional quality control
on the data, duplicate identification and linking of IDs between each pair of duplicate reports.
They also identify the best report in each pair of duplicates by assessing the track
(path in lat/lon) of the observations.
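
Purely as an illustration of what such a track-based check can look like
(this is not the repository's implementation; the function names, column names
and speed threshold below are assumptions made for the example), the sketch
computes the implied speed between consecutive lat/lon positions reported by
one platform and flags physically implausible jumps.

```r
# Generic sketch of a track check: for one platform's reports ordered in time,
# compute the great-circle distance and implied speed between consecutive
# positions; a report whose position forces an implausible speed is a poor
# candidate for the "best" duplicate.
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

track_speeds <- function(track, max_speed_kmh = 40) {
  # 'track' is assumed to have columns time (POSIXct), lat and lon;
  # max_speed_kmh is an illustrative ceiling for a ship's speed.
  track <- track[order(track$time), ]
  n <- nrow(track)
  dist_km <- haversine_km(track$lat[-n], track$lon[-n],
                          track$lat[-1], track$lon[-1])
  hours <- as.numeric(difftime(track$time[-1], track$time[-n], units = "hours"))
  speed_kmh <- dist_km / hours
  data.frame(dist_km = dist_km, hours = hours, speed_kmh = speed_kmh,
             plausible = speed_kmh <= max_speed_kmh)
}

# Invented example: the last position implies an unrealistic jump (~200 km/h).
example_track <- data.frame(
  time = as.POSIXct(c("1900-01-01 00:00", "1900-01-01 06:00",
                      "1900-01-01 12:00"), tz = "UTC"),
  lat = c(50.0, 50.5, 60.0),
  lon = c(-20.0, -19.5, -10.0)
)
print(track_speeds(example_track))
```
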
References
----------