CDM tables mapping files and descriptors¶

The following section details the mapping sequence that the cdm-mapper tool follows to map meteorological data to a CDM element.

We will use part of the header.json python dictionary from the icoads_r3000 IMMA1 model to explain how we map an element. In the table below we explain all elements attributes and/or descriptors, that are needed in each python dictionary or .json file, for a successful mapping of the input meteorological data.

Below we see content from a header.json file:

{
    "report_id": {
        "sections": "c98",
        "elements": "UID",
        "transform": "string_add",
        "kwargs":{"prepend":"ICOADS-30","separator":"-"}
    },
    "application_area": {
        "default": [1,7,10,11]
    },
    "observing_programme": {
        "sections": "c1",
        "elements": "PT",
        "transform": "observing_programme"
    },
    "report_type": {
        "default": 0
    },
    "platform_type": {
        "sections": "c1",
        "elements": "PT",
        "code_table": "platform_type"
    },
    "platform_sub_type": {
        "sections": "c1",
        "elements": "PT",
        "code_table": "platform_sub_type"
    },
    "location_accuracy": {
        "sections": "core",
        "elements": ["LI","LAT"],
        "transform": "location_accuracy",
        "decimal_places": 0
    },
    "station_speed": {
        "sections": "core",
        "elements": ["YR","VS"],
        "code_table": "ship_speed_ms",
        "decimal_places":1
    },
    "source_id": {
        "sections":["c1","c1","core","core"],
        "elements": ["SID","DCK","YR","MO"],
        "transform": "string_join_add",
        "kwargs":{"prepend":"ICOADS-3-0-0T","separator":"-","zfill_col":[0,3],"zfill":[3,2]}
    },
    "source_record_id": {
        "sections": "c98",
        "elements": "UID"
    }
}

Descriptors¶

Descriptor variable name	Function
`elements`	String or list of strings with the element name (s) to map in the CDM table. e.g. `report_id` information is store in the `imma1` schema as `UID`, this will be the variable name assigned to the `element` attribute of `report_id`.
`sections`	String or list of strings with the section name(s) from which the element(s) to be map will come from. - Use a single string to define a unique section if all the elements are located in the same section, e.g. `location_accuracy`: the variables `["LI","LAT"]` come from a single section `core` in the IMMA1 model. - Use a list of strings to declare variables that come from multiple sections and elements. e.g. `source_id` - Always respect the order of the sections in the original schema.
`default`	Assigns a default value to the CDM element.
`fill_value`	Value to assign for missing data (NA/NaN). Datetime objects not supported.
`transform`	Name of the function to be used to perform the mapping of a specific element. This function must be defined in the `mapping_functions` class of the `imodel.py` module in order to be access by the mapper tool.
`kwars`	Keyword arguments of a transform function if any. Type dictionary with the format: {`keyword`:`value`,…,}
`code_table`	Code table name in the imodel mapping library needed to perform the mapping a particular element. Type: string.
`decimal_places`	Number of decimal places to keep when printing an element. Type: integer value, a function name used to estimate this figure. Such function should be defined in the same way as the transform function but these cannot take keyword arguments. `decimal_places = 0` for integer elements defined as numeric in CDM or the element will be printed with default number of decimal places.

Mapping sequence¶

The mapper parses the mapping file element by element and takes the following steps:

Clean imodel data
Remove any missing elements from the imodel. This preliminary step makes the definition of mapping functions easier, as no NaN handling needs to be added to the functions and integer fields casted to float by NA/NaN presence is reverted.
Map CDM element in the following order:
1. If transform: eval function and apply with elements and|or kwargs as appropriate
2. Else if code_table: map imodel elements using the defined code_table
3. Else if elements: assign imodel elements to CDM element
4. Else if value: assign value to CDM element
Fill CDM element NA/NaN values using default if defined
Define the number of decimal places in the CDM element attributes, so this gets pass to the table writer if ``decimal_places`` is provided

Defining mapping functions¶

In the file imodel.py the user can define any function to transform any element in the data model. The python file needs to be accompanied with __init__.py file so all the functions written in imodel.py can be imported by the cdm-mapper toolbox.

Note

Remember that any new python dependency that you import the top of your imodel.py must be installed also in your python environment.

The cdm-mapper follows a set of rules that need to be taken into account when it comes to adding functions to the imodel.py script.

The cdm-mapper only parses elements to the transforming function (e.g. Year, day or hour) or code_table mapping (e.g. platform_subtype), where none of the elements to be map (e.g. Year, day, hour or platform_subtype) have missing values.
The output of all functions in imodel.py must respect the element type defined in the imodel mapper.

Code tables¶

Elements defined in the imodel.json files (e.g. elements inside header.json) with the attribute code_table have an specific “key” that links the element variable to its corresponding numerical code defined in the C3S CDM. Code tables contain the key:value pairs and are stored as individual .json files in the lib/mappings/imodel/code_tables subdirectory.

The content of a code table translating platform_sub_type information into the appropriate CDM syntax’s (platform_sub_type.json) can be seen in text below:

{
    "7": 69
}

This code table is part of the icoads_r3000 data model included in this tool.

The following range of code table structures are currently supported:

Simple code tables: code tables with a list of key:value pairs.
Nested code tables: code tables with multiple (2 or more) keys mapping to a value -> key(1):…:key(n):value.
Range-keyed code tables: code tables (simple or multi-keyed) where one or more keys is a (integer) range of values.

For more information on code tables and their structure check out the mdf_reader tool - code tables information.

The code table above, is use by the icoads_r3000 imodel to map platform_sub_type information to the C3s CDM format, this is done in the following section of the header.json file:

"platform_sub_type": {
        "sections": "c1",
        "elements": "PT",
        "code_table": "platform_sub_type"
}

The “key” in this case, will be the value read from the ICOADS section c1 and element PT, for key values equal to 7 a 69 code will be assigned.

Code tables can be also used for simple transformations of the elements, depending on the medata data to map. e.g. The case of deck 701, where we expand ship names to the ships original full name. We do this by reading meta data information from the c99 ICOADS supplemental data attachment. The imodel for deck 701 provides a code table to transform the names into the ships original name format recorded in the original ship logbook (to see the ship_names.json code_table click in the following file):

"station_name": {
       "sections": "core",
       "elements": "ID",
       "code_table": "ship_names"
}