Data Models¶
Schema¶
The schema file gathers a collection of descriptors that enable the mdf_reader to access and extract meaningful units of information for each element.
Valid schema files are json files that the tool accesses and stores internally as dictionaries. The basename of the schema file must be the same as the name of the data model directory, and its extension must be .json.
![_images/schema.png](_images/schema.png)
Data model directory¶
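The naming rule above can be sketched in a few lines of python. The helper name expected_schema_path is hypothetical and not part of mdf_reader; it only illustrates how the schema file name follows from the data model directory name.

```python
from pathlib import Path


def expected_schema_path(model_dir):
    """Return the schema path implied by the naming rule: <dir>/<dir name>.json.

    Hypothetical helper for illustration; it is not part of mdf_reader.
    """
    model_dir = Path(model_dir)
    return model_dir / (model_dir.name + ".json")


schema_path = expected_schema_path("data_models/lib/imma1_d701")
print(schema_path.name)  # imma1_d701.json
```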
There are two levels of information in the schema:
General information on the data format layout, that helps the tool decide which approach to follow in order to access the data content. This information is included in the header block at the top of the schema (see figure below).
Specific information on the data elements and, optionally, on the sections, in case the data model has its report elements organised in one or more sections (as shown in the figure below). This information is included in the elements block of the schema.
![_images/new_schema.png](_images/new_schema.png)
Content inside a schema.json file.¶
The mdf_reader supports reading and validation of both internal and external schemas:
An internal data model has its schema registered within the tool. To read and validate data from these models, we only need to pass its reference name to the reader and validation modules, using the argument data_model. A list of the reference names for internally supported data models can be accessed via the tool's function:
import mdf_reader
mdf_reader.properties.supported_data_models()
An external data model is a data format that is unknown to the tool. If the data model meets the specifications for which the tool was built, then a model can be built externally and fed into it for both data reading and model validation, using the argument data_model_path:
model_path = '~/mdf_reader/data_models/lib/imma1_d701'
data_file_path = '~/mdf_reader/tests/data/069-701_1845-04_subset.imma'
data = mdf_reader.read(data_file_path, data_model_path=model_path)
Code tables¶
![_images/elements.png](_images/elements.png)
Element content inside a schema.json file.¶
Elements defined in the data model schema.json with an element attribute "column_type": "key" are linked to a code table in the data model through a codetable descriptor in the schema (e.g. "codetable": "ICOADS.C99.FORM"). Code tables contain the key:value pairs and are stored as individual .json files in the data_models/schema/code_tables subdirectory.
The content of a code table that translates a ship-log report type into its meaning (ICOADS.C99.FORM.json) is shown below:
{
" 1": "daily",
" 2": "reports more than once a day"
}
This code table is part of the imma1_d701 data model included in this tool.
The following code table structures are currently supported:
Simple code tables: code tables with a list of key:value pairs.
Nested code tables: code tables with multiple (2 or more) keys mapping to a value -> key(1):…:key(n):value.
Range-keyed code tables: code tables (simple or multi-keyed) where one or more keys are an integer range of values.
Code tables can be imported as python dictionaries directly using the json package. To be fully read by the tool, however, keys in range-keyed code tables need to be expanded and access to all code tables is managed in the application through a code table manager module.
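As a minimal sketch of the direct import, the content of ICOADS.C99.FORM.json shown above is inlined here as a string so that the example runs on its own; with a file on disk, json.load on the open file would be used instead. The variable names are illustrative only.

```python
import json

# Content of ICOADS.C99.FORM.json as shown above, inlined so the sketch is
# self-contained. With a real file: json.load(open(path_to_table)).
raw = '{" 1": "daily", " 2": "reports more than once a day"}'
form_table = json.loads(raw)

# Keys are strings and must match the stored form exactly, padding included.
print(form_table[" 1"])     # daily
print(form_table.get("1"))  # None: the unpadded key is not in the table
```

Note that this plain import does not expand range keys; that step is left to the code table manager.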
The following commands, typed in a python console, show how to access the code table templates for creating new code tables:
template_names = mdf_reader.code_tables.templates()
To copy a template to edit:
mdf_reader.code_tables.copy_template(template_name, out_path=file_path)
or:
mdf_reader.code_tables.copy_template(template_name, out_dir=dir_path)
Common features¶
As code tables are stored as .json files, the json syntax rules must be met when they are generated. See the following link to a basic introduction to json syntax.
When creating code tables, note that:
String values must be written with double quotes.
Keys must be strings.
Values can be strings, numbers, objects (JSON objects), arrays, booleans (true|false) or null.
Due to the way range-keyed tables are parsed, keys cannot have the string range_key as initial substring (unless they are range keys).
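The rules above can be checked before a new table is handed to the tool. The sketch below is not part of mdf_reader: check_code_table is a hypothetical helper, and the regular expression for a well-formed range key is an assumption based on the range key examples in this document.

```python
import json
import re

# Assumed shape of a range key: range_key(init, end[, step]), where end may
# be the keyword yyyy. Based on the examples in this document only.
RANGE_KEY = re.compile(r"^range_key\(\s*\d+\s*,\s*(\d+|yyyy)\s*(,\s*\d+\s*)?\)$")


def check_code_table(text):
    """Return a list of problems found in a code table given as json text."""
    try:
        table = json.loads(text)
    except json.JSONDecodeError as err:
        # Covers single-quoted strings, non-string keys, trailing commas, etc.
        return [f"invalid JSON: {err.msg}"]
    if not isinstance(table, dict):
        return ["top level must be a JSON object"]
    problems = []
    stack = [table]
    while stack:
        current = stack.pop()
        for key, value in current.items():
            if key.startswith("range_key") and not RANGE_KEY.match(key):
                problems.append(f"reserved prefix in key: {key!r}")
            if isinstance(value, dict):
                stack.append(value)  # also check nested levels
    return problems


print(check_code_table('{"0": "Measured", "1": "Fog present"}'))  # []
print(check_code_table('{"range_key_old": "x"}'))
```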
Simple code tables¶
Simple code tables are built using a single json object (enclosed in curly braces) with the key:value pairs separated by commas. The following example is for a weather visibility indicator; the file name is visibility_ind.json:
{
" ": "Not measured",
"0": "Measured",
"1": "Fog present"
}
Nested code tables¶
Nested code tables deal with situations in which the encoding of a coded element varies according to an indicator (contained in a different element in the data) and/or changes over time (different code table versions). Instead of storing these tables in separate files, the tool allows nested code tables to be created.
The following .json file example shows a code table with 2 levels of indexing. It is built as a single json object in which the values of the key:value pairs of the outer indexing level are simple code tables, instead of individual values.
Nested table (named: visibility.json) example:
{
"0":
{"90":"<0.05 km",
"91":"0.05 km",
"92":"0.2 km",
"93":"0.5 km",
"94":"1 km",
"95":"2 km",
"96":"4 km",
"97":"10 km",
"98":"20 km",
"99":"50 km or more"},
"1":
{"90":"<0.05 km",
"91":"0.05 km",
"92":"0.2 km",
"93":"Fog present, no visibility reported",
"94":"1 km",
"95":"2 km",
"96":"4 km",
"97":"10 km",
"98":"20 km",
"99":"50 km or more"}
}
This type of nested code table requires an additional .keys file (named: visibility.keys) with the following format:
{
"('core1','VIS')" : ["('core1','VIS I')","('core1','VIS')"]
}
This code table can be called from the schema.json by setting the element descriptor column_type to key in the following way:
"VIS": {
"description": "Visibility",
"field_length": 2,
"column_type": "key",
"codetable": "visibility"
}
Note that only the nested code table visibility is referenced, not the .keys file, and that the .json extension is not required.
The data file schema provides the element:codetable correspondence. However, to map the element to its value in the code table, it is necessary to know the elements in the data file from which the outer keys are derived. Each nested table table_name.json has a companion json-formatted file table_name.keys with a set of key:value pairs. The key is the actual element the table decodes and the value is a list with the complete set of key elements, from outer to inner.
As a single table can potentially be used to decode different data file elements, a key must be provided for every element to be decoded with a nested table (even if there is only one).
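The way the .keys file drives the lookup can be sketched as follows. This is an illustration under stated assumptions, not the tool's code table manager: the decode function and the record layout (a dictionary indexed by (section, element) tuples holding the coded strings) are hypothetical, and the tables are reduced versions of the visibility.json and visibility.keys examples above, written as the dictionaries that json.load would return.

```python
import ast

# Reduced visibility.json: outer key is the indicator, inner key is the code.
table = {
    "0": {"93": "0.5 km"},
    "1": {"93": "Fog present, no visibility reported"},
}
# visibility.keys: decoded element -> key elements, from outer to inner.
keys = {"('core1','VIS')": ["('core1','VIS I')", "('core1','VIS')"]}

# The element names are stored as strings; turn them into python tuples.
key_map = {
    ast.literal_eval(element): [ast.literal_eval(k) for k in key_elements]
    for element, key_elements in keys.items()
}


def decode(element, record):
    """Walk the nested table using the key elements listed for `element`."""
    node = table
    for key_element in key_map[element]:
        node = node[record[key_element]]
    return node


# Hypothetical record: the indicator is "1" and the visibility code is "93".
record = {("core1", "VIS I"): "1", ("core1", "VIS"): "93"}
print(decode(("core1", "VIS"), record))  # Fog present, no visibility reported
```

The same code, "93", decodes to a different value when the indicator is "0", which is the situation nested tables are meant to handle.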
Range-keyed code tables¶
Range-keyed code tables can be either simple or nested code tables. The term applies if any of the keys is a range, like a period of years (1910-1945) or simply an integer interval (1-10).
Instead of building the table repeating each of the key:value
pairs for every value in the range, the corresponding range key pairs are defined as range (init, end [, step]):value in the json file. The code table manager will identify this special type of key and will expand the keys in the dictionary as is read internally.
Range keys rules and use:
Only integer ranges are currently supported
Parameter step is optional. Defaults to 1.
In ranges that apply to a range of years, the keyword yyyy can be used in the place of the end parameter. It will expand the period to the current year.
An example of a range-keyed nested table, named ICOADS.CO.VS.json, is shown below:
{
"range_key(1750,1967)":
{
"0":"0 knots;[0.0,0.0,0.0] ms-1",
"1":"1-3 knots;[0.51444,1.02888,1.54332] ms-1",
"2":"4-6 knots;[2.05776,2.5722,3.08664] ms-1",
"3":"7-9 knots;[3.60108,4.11552,4.62996] ms-1",
"4":"10-12 knots;[5.1444,5.65884,6.17328] ms-1",
"5":"13-15 knots;[6.68772,7.20216,7.7166] ms-1",
"6":"16-18 knots;[8.23104,8.74548,9.25992] ms-1",
"7":"19-21 knots;[9.77436,10.2888,10.8032] ms-1",
"8":"22-24 knots;[11.3177,11.8321,12.3466] ms-1",
"9":"over 24 knots;[12.3466,12.861,null] ms-1"
},
"range_key(1968,yyyy)":
{
"0":"0 knots;[0.0,0.0,0.0] ms-1",
"1":"1-5 knots;[0.51444,1.54332,2.5722] ms-1",
"2":"6-10 knots;[3.08664,4.11552,5.1444] ms-1",
"3":"11-15 knots;[5.65884,6.68772,7.7166] ms-1",
"4":"16-20 knots;[8.23104,9.25992,10.2888] ms-1",
"5":"21-25 knots;[10.8032,11.8321,12.861] ms-1",
"6":"26-30 knots;[13.3754,14.4043,15.4332] ms-1",
"7":"31-35 knots;[15.9476,16.9765,18.0054] ms-1",
"8":"36-40 knots;[18.5198,19.5487,20.5776] ms-1",
"9":"over 40 knots;[21.092,22.1209,null] ms-1"
}
}
As the table is nested, the corresponding ICOADS.CO.VS.keys file looks as follows:
{
"('core','VS')" : ["('core','YR')","('core','VS')"]
}
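To make the expansion step concrete, the following is a minimal sketch of how range keys could be expanded into individual keys. It is not the tool's code table manager: expand_range_keys is a hypothetical function, and it assumes that the end of the range is inclusive and that expanded keys are stored as strings. Neither assumption is stated in this document; the inclusive end is only suggested by the 1967/1968 boundary in the example above.

```python
import datetime
import re

RANGE_KEY = re.compile(r"^range_key\((\d+),\s*(\d+|yyyy)(?:,\s*(\d+))?\)$")


def expand_range_keys(table, current_year=None):
    """Return a copy of `table` with each range key replaced by one key per value."""
    if current_year is None:
        current_year = datetime.date.today().year
    expanded = {}
    for key, value in table.items():
        if isinstance(value, dict):
            # Inner levels of a nested table may contain range keys as well.
            value = expand_range_keys(value, current_year)
        match = RANGE_KEY.match(key)
        if match is None:
            expanded[key] = value
            continue
        start = int(match.group(1))
        end = current_year if match.group(2) == "yyyy" else int(match.group(2))
        step = int(match.group(3) or 1)  # step is optional, defaults to 1
        # Assumption: the end of the range is inclusive.
        for number in range(start, end + 1, step):
            expanded[str(number)] = value
    return expanded


# Reduced version of the ICOADS.CO.VS.json example above.
table = {
    "range_key(1750,1967)": {"1": "1-3 knots"},
    "range_key(1968,yyyy)": {"1": "1-5 knots"},
}
expanded = expand_range_keys(table, current_year=2020)
print(expanded["1967"]["1"])  # 1-3 knots
print(expanded["1968"]["1"])  # 1-5 knots
```

Passing current_year explicitly keeps the example reproducible; when it is omitted, yyyy expands to the year in which the code runs.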