## Test and overview of the `mdf_reader` tool 

First, clone the [GitLab repository](https://git.noc.ac.uk/brecinosrivas/mdf_reader), then replace the path in `sys.path.append()` below with the directory where you stored the repository (that is, the parent directory of your `mdf_reader` clone).

```python
import os
import sys
# Replace with the directory that contains your mdf_reader clone
sys.path.append('/home/bea/')
import mdf_reader
import json
```

```
2020-12-10 09:49:38,429 - root - INFO - init basic configure of logging success
```
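If you would rather not hard-code a home directory, the small helper below (a hypothetical convenience function, not part of `mdf_reader`) expands `~`, checks that the folder really contains an `mdf_reader` package directory, and only then extends `sys.path`:

```python
import os
import sys


def add_repo_parent_to_path(parent_dir, package="mdf_reader"):
    """Append parent_dir to sys.path if it contains the package directory.

    Returns the absolute path that is on sys.path afterwards.
    """
    parent = os.path.abspath(os.path.expanduser(parent_dir))
    if not os.path.isdir(os.path.join(parent, package)):
        raise FileNotFoundError(f"no '{package}' directory found in {parent}")
    # Avoid adding the same entry twice when the cell is re-run
    if parent not in sys.path:
        sys.path.append(parent)
    return parent
```

For example, `add_repo_parent_to_path('~')` followed by `import mdf_reader` does the same as the cell above for a clone stored directly in your home directory, but fails early with a clear message if the clone is elsewhere.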


The `mdf_reader` is a Python 3 tool designed to read data files that comply with a user-specified [data model](https://cds.climate.copernicus.eu/toolbox/doc/how-to/15_how_to_understand_the_common_data_model/15_how_to_understand_the_common_data_model.html).

It was initially developed to read the [IMMA](https://icoads.noaa.gov/e-doc/imma/R3.0-imma1.pdf) data format, and was later extended to handle other meteorological data formats.

Let's look at a typical file from [ICOADS v3.0](https://icoads.noaa.gov/r3.html). We pick a specific monthly output for a single source/deck: in this case, data from the Marine Meteorological Journals data set, SID/DCK **125-704 for October 1878**.

The `.imma` file looks like this:

```python
import pandas as pd

data_path = os.path.join('~/c3s_work', 'mdf_reader/tests/data/125-704_1878-10_subset.imma')
# Naive read: without a schema, pandas has no idea where the fields are
data_ori = pd.read_table(data_path)
```



```python
data_ori.head()
```

| | 18781020 600 4228 29159 130623 10Panay 12325123 9961 4 165 17128704125 5 0 1 1FF111F11AAA1AAAA1AAA 9815020N163002199 0 100200180003Panay 78011118737S.P.Bray,Jr 013231190214 Bulkhead of cabin 1- .1022200200180014Boston Rio de Janeiro 300200180014001518781020 4220N 6630W 10 E 400200180014001518781020102 85 EXS WSW 0629601 58 BOC CU05R |
|---|---|
| 0 | 18781020 800 4231 29197 130623 10Panay 1... |
| 1 | 187810201000 4233 29236 130623 10Panay 1... |
| 2 | 187810201200 4235 29271 130623 10Panay 1... |
| 3 | 187810201400 4237 29310 130623 10Panay 1... |
| 4 | 187810201600 4233 29350 130423 10Panay 1... |


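Reading the raw file directly into Python is very messy: each fixed-width record ends up as a single text string, and pandas even uses the first record as the column header!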

This is why we need the `mdf_reader` tool: it helps us load these IMMA files into a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html). To do that, it needs a **schema**.

A **schema** file gathers a collection of descriptors that enable the `mdf_reader` tool to access the content of a data model (schema) and to extract the sections of the raw data file that contain meaningful information. These **schema files** are the bones of the data model: `.json` files that outline the structure of the incoming raw data.

The `mdf_reader` takes this information and translates the characteristics of the data into a pandas DataFrame.
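As a rough illustration of what a schema provides (and not of how `mdf_reader` works internally), knowing just the widths of the first six IMMA1 core fields is enough for `pandas.read_fwf` to split the start of the first two records shown above into named, scaled columns. The names and widths (`YR` 4, `MO` 2, `DY` 2, `HR` 4, `LAT` 5, `LON` 6, with hour, latitude and longitude stored in hundredths) are our reading of the IMMA1 core layout and should be checked against the IMMA documentation:

```python
import io

import pandas as pd

# First 23 characters of the first two records of the subset file
raw = io.StringIO(
    "18781020 600 4228 29159\n"
    "18781020 800 4231 29197\n"
)

# Assumed IMMA1 core layout: year, month, day, hour, latitude, longitude
names = ["YR", "MO", "DY", "HR", "LAT", "LON"]
widths = [4, 2, 2, 4, 5, 6]

# dtype=str keeps codes such as months as text; read_fwf strips padding spaces
core = pd.read_fwf(raw, widths=widths, names=names, header=None, dtype=str)

# HR, LAT and LON carry two implied decimals (hundredths)
for col in ["HR", "LAT", "LON"]:
    core[col] = core[col].astype(int) / 100

print(core)
```

This yields one row per record, with `HR` equal to 6.0 and 8.0 and `LAT` equal to 42.28 and 42.31. A full schema generalizes the same idea to every element of every section.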

The tool comes with several **schema** templates built in.

```python
template_names = mdf_reader.schemas.templates()
```

```python
template_names
```

```
['fixed_width_complex_exc',
 'delimited_sections',
 'delimited_basic',
 'fixed_width_complex_opt',
 'fixed_width_sections',
 'fixed_width_basic']
```

The tool also ships templates for `code_tables`: `.json` files containing keys that describe complex meteorological variables, such as weather forecasts or sea state conditions.

```python
template_tables = mdf_reader.code_tables.templates()
template_tables
```

```
['nested', 'range_keyed_nested', 'simple', 'range_keyed_simple']
```
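Conceptually, a simple code table maps each coded value in the raw data to its meaning. The sketch below uses a hypothetical table holding the first four Beaufort wind force codes (the dictionary layout is our assumption, not one of the bundled templates) and decodes a column with `pandas.Series.map`:

```python
import json

import pandas as pd

# Hypothetical "simple" code table: raw codes (as strings) -> meanings,
# stored as JSON like the code_tables templates
beaufort_json = '{"0": "Calm", "1": "Light air", "2": "Light breeze", "3": "Gentle breeze"}'
beaufort = json.loads(beaufort_json)

wind_force = pd.Series(["2", "0", "3", "7"], name="wind_force")

# Codes that are not in the table (here "7") are mapped to NaN
decoded = wind_force.map(beaufort)
print(decoded)
```

Because missing codes come back as `NaN`, gaps in a code table are easy to spot after decoding.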

**Schemas** can also be designed to be deck-specific, as in the example below:

```python
schema = 'imma1_d704'

data_file_path = '/home/bea/c3s_work/mdf_reader/tests/data/125-704_1878-10_subset.imma'

data = mdf_reader.read(data_file_path, data_model=schema)
```

```
2020-12-10 09:54:39,637 - root - INFO - READING DATA MODEL SCHEMA FILE...
2020-12-10 09:54:39,650 - root - INFO - EXTRACTING DATA FROM MODEL: imma1_d704
2020-12-10 09:54:39,652 - root - INFO - Getting data string from source...
2020-12-10 09:54:39,672 - root - INFO - Extracting and reading sections
2020-12-10 09:54:39,678 - root - INFO - Processing section partitioning threads
2020-12-10 09:54:39,679 - root - INFO - 1000 ...
2020-12-10 09:54:39,713 - root - INFO - done
2020-12-10 09:54:39,718 - root - INFO - 211000 ...
2020-12-10 09:54:39,761 - root - INFO - done
2020-12-10 09:54:39,762 - root - INFO - 29211000 ...
2020-12-10 09:54:39,793 - root - INFO - done
2020-12-10 09:54:39,794 - root - INFO - 3029211000 ...
2020-12-10 09:54:39,802 - root - INFO - done
2020-12-10 09:54:39,803 - root - INFO - 303029211000 ...
2020-12-10 09:54:39,809 - root - INFO - done
2020-12-10 09:54:39,813 - root - INFO - 30303029211000 ...
2020-12-10 09:54:39,821 - root - INFO - done
2020-12-10 09:54:39,822 - 
```

```
Reading section core
Reading section c1
Reading section c5
Reading section c6
Reading section c7
Reading section c8
Reading section c9
Reading section c95
Reading section c96
Reading section c97
Reading section c98
Reading section c99_sentinal
Reading section c99_journal
Reading section c99_voyage
Reading section c99_daily
Reading section c99_data4
Reading section c99_data5
```


```
2020-12-10 09:54:44,849 - root - ERROR - Code table not defined for element ('c99_data4', 'present_weather')
2020-12-10 09:54:44,853 - root - ERROR - Code table not defined for element ('c99_data4', 'clouds')
2020-12-10 09:54:44,875 - root - ERROR - Code table not defined for element ('c99_data4', 'sea_state')
2020-12-10 09:54:44,883 - root - ERROR - Code table not defined for element ('c99_data5', 'time_ind')
2020-12-10 09:54:44,889 - root - ERROR - Code table not defined for element ('c99_data5', 'compass_ind')
2020-12-10 09:54:44,893 - root - ERROR - Code table not defined for element ('c99_data5', 'ship_course_compass')
2020-12-10 09:54:44,902 - root - ERROR - Code table not defined for element ('c99_data5', 'ship_course_true')
2020-12-10 09:54:44,912 - root - ERROR - Code table not defined for element ('c99_data5', 'wind_dir_mag')
2020-12-10 09:54:44,918 - root - ERROR - Code table not defined for element ('c99_data5', 'wind_force')
2020-12-10 09:54:44,922 - root - ERROR - Code ta
```

A new **schema** can be built for a particular deck and source. The `imma1_d704` schema was built upon the `imma1` schema/data model, but extra sections were added to its `.json` files to include supplemental data described in the ICOADS documentation. This is a snapshot of the data inside `imma1_d704.json`:
```json
"c99_journal": {
    "header": {"sentinal": "1", "field_layout": "fixed_width", "length": 117},
    "elements": {
        "sentinal": {
            "description": "Journal header record identifier",
            "field_length": 1,
            "column_type": "str"
        },
        "reel_no": {
            "description": "Microfilm reel number. See if we want the zero padding or not...",
            "field_length": 3,
            "column_type": "str",
            "LMR6": true
        }

```
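To see how such descriptors map onto a raw record, here is a sketch that loads a trimmed-down section shaped like the snippet above and slices the start of the journal header record from the first line of the file (`100200180003Panay`, visible in the raw read earlier). The `sentinal` and `reel_no` lengths come from the snippet; the widths of `journal_no` and `frame_no` (4 characters each) are our assumption, inferred from the values in the `c99_journal` output below. The slicing function is a hypothetical illustration, not `mdf_reader`'s own code.

```python
import json

# Trimmed-down descriptor shaped like the imma1_d704.json snippet
# ("sentinal" is spelled as in the schema). The journal_no and frame_no
# widths are assumptions inferred from the displayed values.
section_json = """
{
  "c99_journal": {
    "header": {"sentinal": "1", "field_layout": "fixed_width"},
    "elements": {
      "sentinal":   {"field_length": 1, "column_type": "str"},
      "reel_no":    {"field_length": 3, "column_type": "str"},
      "journal_no": {"field_length": 4, "column_type": "str"},
      "frame_no":   {"field_length": 4, "column_type": "str"}
    }
  }
}
"""
toy_schema = json.loads(section_json)


def slice_section(record, section):
    """Cut a fixed-width record into the elements declared for one section."""
    sentinal = section["header"].get("sentinal")
    if sentinal is not None and not record.startswith(sentinal):
        raise ValueError("record does not start with the section sentinal")
    fields, pos = {}, 0
    # json.loads keeps key order, so elements are read in declared order
    for name, spec in section["elements"].items():
        width = spec["field_length"]
        fields[name] = record[pos:pos + width]
        pos += width
    return fields


# Start of the journal header record found in the first line of the file
journal_fields = slice_section("100200180003Panay", toy_schema["c99_journal"])
print(journal_fields)
# {'sentinal': '1', 'reel_no': '002', 'journal_no': '0018', 'frame_no': '0003'}
```

Keeping the values as strings (`"column_type": "str"`) preserves the zero padding of `reel_no`, which is exactly the question raised in that element's description.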



The journal metadata from these extra sections can now be accessed as part of the pandas DataFrame:

```python
data.data.c99_journal
```

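| | sentinal | reel_no | journal_no | frame_no | ship_name | journal_ed | rig | ship_material | vessel_type | vessel_length | ... | hold_depth | tonnage | baro_type | baro_height | baro_cdate | baro_loc | baro_units | baro_cor | thermo_mount | SST_I |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | ... | 23 | 1190 | 2 | 14 |  | Bulkhead of cabin | 1 | - .102 | 2 |  |
| 1 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | ... | 23 | 1190 | 2 | 14 |  | Bulkhead of cabin | 1 | - .102 | 2 |  |
| 2 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | ... | 23 | 1190 | 2 | 14 |  | Bulkhead of cabin | 1 | - .102 | 2 |  |
| 3 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | ... | 23 | 1190 | 2 | 14 |  | Bulkhead of cabin | 1 | - .102 | 2 |  |
| 4 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | ... | 23 | 1190 | 2 | 14 |  | Bulkhead of cabin | 1 | - .102 | 2 |  |
| 5 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | ... | 23 | 1190 | 2 | 14 |  | Bulkhead of cabin | 1 | - .102 | 2 |  |
| 6 | 1 | 007 | 0129 | 0410 | Emma | 78 | 02 | 1 | 1 | 136 | ... | 19 | 468 | 1 | 10 |  | In the after cabin | 1 | - .561 | 1 |  |
| 7 | 1 | 002 | 0018 | 0003 | Panay | 78 | 01 | 1 | 1 | 187 | ... | 23 | 1190 | 2 | 14 |  | Bulkhead of cabin | 1 | - .102 | 2 |  |
| 8 | 1 | 007 | 0129 | 0410 | Emma | 78 | 02 | 1 | 1 | 136 | ... | 19 | 468 | 1 | 10 |  | In the after cabin | 1 | - .561 | 1 |  |
| 9 | 1 | 002 | 0033 | 0416 | Emma C.Litchfield | 78 | 02 | 1 | 1 | 128 | ... | 17 | 483 | 1 | 17 | 01101878 | After Cabin | 1 | + .001 | 1 |  |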
9,1,002,0033,0416,Emma C.Litchfield,78,02,1,1,128,...,17,483,1,17,01101878,After Cabin,1,+ .001,1,


To learn how to construct a schema or data model for a particular deck/source, see this other tutorial: [create_data_model.ipynb](https://git.noc.ac.uk/brecinosrivas/mdf_reader/-/blob/master/docs/notebooks/create_data_model.ipynb)