"First clone the [gitlab repository](https://git.noc.ac.uk/brecinosrivas/mdf_reader) and modify the path in `sys.path.append()` with the directory path where you store the `mdf_reader` repository."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"2020-12-10 09:49:38,429 - root - INFO - init basic configure of logging success\n"
]
}
],
"source": [
"import os\n",
"import sys\n",
"sys.path.append('/home/bea/')\n",
"import mdf_reader\n",
"import json"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `mdf_reader` is a Python 3 tool designed to read data files that comply with a user-specified data model.\n",
"It was initially developed to read the [IMMA](https://icoads.noaa.gov/e-doc/imma/R3.0-imma1.pdf) data format and was later extended to handle other meteorological data formats.\n",
"\n",
"Let's look at an example of a typical file from [ICOADSv3.0](https://icoads.noaa.gov/r3.html). We pick a specific monthly output for a Source/Deck, in this case data from the Marine Meteorological Journals data set, SID/DCK: **125-704 for Oct 1878.**\n",
"\n",
"The `.imma` file looks like this:"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/home/bea/.virtualenvs/c3s/lib/python3.6/site-packages/ipykernel_launcher.py:4: FutureWarning: read_table is deprecated, use read_csv instead, passing sep='\\t'.\n",
"This is why we need the `mdf_reader` tool: it helps us put those `.imma` files into a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) format. For that we need a **schema**.\n",
"\n",
"A **schema** file gathers a collection of descriptors that enable the `mdf_reader` tool to access the content\n",
"of a `data model/schema` and extract the sections of the raw data file that contain meaningful information. These **schema files** are the `bones` of the data model: `.json` files outlining the structure of the incoming raw data.\n",
"\n",
"The `mdf_reader` takes this information and translates the characteristics of the data into a pandas DataFrame.\n",
"\n",
"The tool has several **schema** templates built in."
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"template_names = mdf_reader.schemas.templates()"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['fixed_width_complex_exc',\n",
" 'delimited_sections',\n",
" 'delimited_basic',\n",
" 'fixed_width_complex_opt',\n",
" 'fixed_width_sections',\n",
" 'fixed_width_basic']"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"template_names"
]
},
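{
"cell_type": "markdown",
"metadata": {},
"source": [
"To make the idea of a schema more concrete, here is a minimal, purely illustrative sketch of how a schema could drive the parsing of a fixed-width record. The layout used below (`elements`, `start`, `length`, `type`) and the helper `parse_record` are assumptions made up for this example; they are not the actual `mdf_reader` schema format or API, and the sample record is invented."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hypothetical schema, shown as the dict that json.load() would return.\n",
"# This layout is an assumption for illustration, not the mdf_reader format.\n",
"toy_schema = {\n",
"    'elements': {\n",
"        'YR': {'start': 0, 'length': 4, 'type': 'int'},\n",
"        'MO': {'start': 4, 'length': 2, 'type': 'int'},\n",
"        'ID': {'start': 6, 'length': 5, 'type': 'str'},\n",
"    }\n",
"}\n",
"\n",
"def parse_record(line, schema):\n",
"    # Slice one fixed-width record into named, typed fields.\n",
"    casts = {'int': int, 'str': str}\n",
"    record = {}\n",
"    for name, spec in schema['elements'].items():\n",
"        raw = line[spec['start']:spec['start'] + spec['length']].strip()\n",
"        # Blank fields become None instead of raising on int('').\n",
"        record[name] = casts[spec['type']](raw) if raw else None\n",
"    return record\n",
"\n",
"# Invented sample record: year 1878, month 10, identifier SHIP1.\n",
"print(parse_record('187810SHIP1', toy_schema))"
]
},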
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The tool also has built-in templates for `code_tables`, which are `.json` files containing keys that describe complex meteorological variables such as weather forecast or sea state conditions.\n",
"\n",
"A new **schema** can be built for a particular deck and source, as shown in this notebook. The `imma1_d704` schema was built upon the `imma1` schema/data model, but extra sections have been added to the `.json` files to include supplemental data from the ICOADS documentation. This is a snapshot of the data inside `imma1_d704.json`.\n",
"\n",
"To learn how to construct a schema or data model for a particular deck/source, see this other tutorial: [create_data_model.ipynb](https://git.noc.ac.uk/brecinosrivas/mdf_reader/-/blob/master/docs/notebooks/create_data_model.ipynb)"