Data reader toolbox documentation
The mdf_reader is a Python 3 tool designed to read data files that comply with a user-specified data model.
It was originally developed to read data from the International Comprehensive Ocean-Atmosphere Data Set (ICOADS) stored in the International Maritime Meteorological Archive (IMMA) data format.
It has since been extended to support any marine meteorological data format, provided that the data meets the following specifications:
Data is stored in human-readable form (ASCII).
Data is organized in single-line reports (e.g. one observation per row, as in a .csv file).
Reports have a coherent internal structure that can be modeled.
Reports are either fixed-width or field-delimited.
Reports can be organized in sections, in which case each section can be of a different type (fixed-width or delimited).
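To illustrate the last two specifications, here is a minimal sketch that splits one ASCII report made of a fixed-width section followed by a comma-delimited section. The layout (`CORE_WIDTHS`, `SUPP_FIELDS`) and the helper `split_report` are hypothetical and not part of mdf_reader, which takes this information from the data model instead.

```python
# Hypothetical layout, for illustration only: a 10-character fixed-width
# "core" section (year: 4, month: 2, day: 2, hour: 2 characters)
# followed by a comma-delimited "supplemental" section.
CORE_WIDTHS = [("year", 4), ("month", 2), ("day", 2), ("hour", 2)]
SUPP_FIELDS = ["ship_id", "air_temp", "sst"]


def split_report(line: str) -> dict:
    """Split one single-line ASCII report into its two sections."""
    record = {}
    pos = 0
    # Fixed-width section: each element occupies a known number of characters.
    for name, width in CORE_WIDTHS:
        record[name] = line[pos:pos + width].strip()
        pos += width
    # Delimited section: the rest of the line, split on commas.
    for name, value in zip(SUPP_FIELDS, line[pos:].split(",")):
        record[name] = value.strip()
    return record


print(split_report("1899073112SHIP01,12.5,14.1"))
```

All values are still strings at this stage; assigning data types is the job of the data model.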
The mdf_reader uses the information provided in a data model to read meteorological data into a pandas.DataFrame, with column names and data types set according to each data element’s description in the data model (schema). In addition to reading the data, the mdf_reader validates each data element against the schema.
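The idea of schema-driven reading and validation can be sketched with plain pandas. This is not the mdf_reader API: the schema layout (`width`, `dtype`, `valid_min`, `valid_max`), the function `read_and_validate` and the range-only validation rule are assumptions for illustration; real data models describe elements in more detail.

```python
import io

import pandas as pd

# Hypothetical, simplified schema for illustration only; this is NOT the
# mdf_reader data model format. Each element has a fixed width, a target
# dtype and an assumed valid range.
SCHEMA = {
    "year": {"width": 4, "dtype": "Int16", "valid_min": 1600, "valid_max": 2024},
    "month": {"width": 2, "dtype": "Int8", "valid_min": 1, "valid_max": 12},
    "air_temp": {"width": 5, "dtype": "float64", "valid_min": -99.9, "valid_max": 99.9},
}


def read_and_validate(text: str, schema: dict) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Read fixed-width reports into typed columns and build a validation mask."""
    names = list(schema)
    widths = [schema[name]["width"] for name in names]
    # Read every field as a string first so nothing is coerced implicitly.
    raw = pd.read_fwf(io.StringIO(text), widths=widths, names=names,
                      header=None, dtype=str)
    data = pd.DataFrame(index=raw.index)
    mask = pd.DataFrame(index=raw.index)
    for name, spec in schema.items():
        # Unparsable or missing values become NaN.
        numeric = pd.to_numeric(raw[name], errors="coerce")
        # True where the value lies inside the schema's valid range (NaN -> False).
        mask[name] = numeric.between(spec["valid_min"], spec["valid_max"])
        # Cast to the dtype declared in the schema (nullable ints store NaN as <NA>).
        data[name] = numeric.astype(spec["dtype"])
    return data, mask


reports = "189907 12.5\n190113 15.0\n"  # the second report has month = 13
data, mask = read_and_validate(reports, SCHEMA)
print(data.dtypes)
print(mask)
```

Reading as strings and casting afterwards keeps type assignment under the schema's control, and a nullable integer dtype such as `Int16` leaves room for missing values without turning the column into floats.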
The tool returns a Python object with the following attributes:
A pandas.DataFrame (DF) with the data values.
A boolean pandas DF with the data validation mask.
A dictionary with a simplified version of the input data model.
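The three outputs are meant to be used together: the mask has the same shape as the data, so it can select the reports that passed validation. The sketch below models this with a hypothetical `ReaderOutput` dataclass; the attribute names `data`, `mask` and `attrs`, the `dtype` key and the `valid_rows` helper are assumptions for illustration, not a confirmed description of the object mdf_reader returns.

```python
from dataclasses import dataclass, field

import pandas as pd


@dataclass
class ReaderOutput:
    """Hypothetical container mirroring the three outputs listed above."""
    data: pd.DataFrame                          # data values
    mask: pd.DataFrame                          # boolean validation mask, same shape as data
    attrs: dict = field(default_factory=dict)   # simplified data model

    def valid_rows(self) -> pd.DataFrame:
        """Return only the reports in which every element passed validation."""
        return self.data[self.mask.all(axis=1)]


out = ReaderOutput(
    data=pd.DataFrame({"year": [1899, 1901], "month": [7, 13]}),
    mask=pd.DataFrame({"year": [True, True], "month": [True, False]}),
    attrs={"year": {"dtype": "Int16"}, "month": {"dtype": "Int8"}},
)
print(out.valid_rows())
```

Keeping the mask separate from the data, rather than dropping invalid values during reading, lets users decide per element how strict to be.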
The reader supports basic data transformations: numeric decoding (base36, signed_overpunch) and numeric conversion (scale and offset).
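These transformations follow well-known encodings, sketched here in plain Python independently of mdf_reader. Base36 decoding uses the built-in `int(s, 36)`. Signed-overpunch decoding follows the common COBOL/EBCDIC convention, in which the last character holds both the final digit and the sign (`{` and `A` to `I` for +0 to +9, `}` and `J` to `R` for -0 to -9). Applying scale and offset as `raw * scale + offset` is an assumption about the order of operations, and the function names are hypothetical.

```python
# Signed overpunch (common COBOL/EBCDIC convention): the last character encodes
# both the final digit and the sign of the number.
_POSITIVE = {"{": 0, **{chr(ord("A") + i): i + 1 for i in range(9)}}  # { and A..I -> +0..+9
_NEGATIVE = {"}": 0, **{chr(ord("J") + i): i + 1 for i in range(9)}}  # } and J..R -> -0..-9


def decode_signed_overpunch(s: str) -> int:
    """Decode a signed-overpunch string such as '12J' (-121)."""
    head, last = s[:-1], s[-1]
    if last.isdigit():  # no overpunch: plain unsigned number
        return int(s)
    if last in _POSITIVE:
        return int(head + str(_POSITIVE[last]))
    if last in _NEGATIVE:
        return -int(head + str(_NEGATIVE[last]))
    raise ValueError(f"invalid overpunch character: {last!r}")


def decode_base36(s: str) -> int:
    """Decode a base36 string (digits 0-9, then letters A-Z)."""
    return int(s, 36)


def convert(raw: float, scale: float = 1.0, offset: float = 0.0) -> float:
    """Assumed convention: physical value = raw * scale + offset."""
    return raw * scale + offset


print(decode_signed_overpunch("12J"))  # -121
print(decode_signed_overpunch("12{"))  # 120
print(decode_base36("1Z"))             # 71
print(convert(decode_signed_overpunch("12E"), scale=0.1))  # value stored in tenths
```

Decoding and conversion are chained in that order: the encoded string is first turned into an integer, then scaled to physical units.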
Several data models have been added to the tool, including the IMMA schema: ~/mdf_reader/data_models/lib/imma1.
Note
Data following data models other than those already available can also be read, provided that it meets the basic specifications listed above. A data model can be built externally and fed into the tool.