mdf_reader.read

Manages the complete sequence of reading a data file according to a data model:

  • Access to data model

  • Data file import

  • Data file reading

  • Data validation

  • Output

Contains the following functions:
  • ERV - does the actual extraction, reading and validation of the input data

  • main - the main function of the script

Can be run as a script with:

python -m mdf_reader data_file **kwargs

Module Contents

Functions

ERV(TextParser, read_sections_list, schema, code_tables_path)

Extracts, reads and validates the input data.

validate_arg(arg_name, arg_value, arg_type)

Validates that the input argument is of the expected type.

validate_path(arg_name, arg_value)

Validates that the input argument is an existing directory.

main(source, data_model=None, data_model_path=None, sections=None, chunksize=None, skiprows=None, out_path=None)

Reads a data file to a pandas DataFrame using a pre-defined data model.

Attributes

toolPath

schema_lib

toolPath
schema_lib
ERV(TextParser, read_sections_list, schema, code_tables_path)

Extracts, reads and validates the input data.

Parameters
  • TextParser (list or pandas.io.parsers.TextFileReader) – The data to extract and read

  • read_sections_list (list) – List with subset of data model sections to output

  • schema (dict) – Data model schema

  • code_tables_path (str) – Path to data model code tables

Returns

  • data (pandas.DataFrame or pandas.io.parsers.TextFileReader) – The input data, extracted and read

  • valid (pandas.DataFrame or pandas.io.parsers.TextFileReader) – A boolean mask with the data validation output

validate_arg(arg_name, arg_value, arg_type)

Validates that the input argument is of the expected type.

Parameters
  • arg_name (str) – Name of the argument

  • arg_value (arg_type) – Value of the argument

  • arg_type (python type) – Expected type of the argument

Returns

True if the argument is of the expected type, False otherwise

Return type

bool
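
The type check described above can be sketched as follows. This is an illustrative sketch, not the module's actual code: the function body, the decision to log an error rather than raise, and the acceptance of None (since the keyword arguments of main default to None) are all assumptions.

```python
import logging


def validate_arg(arg_name, arg_value, arg_type):
    """Return True if arg_value is None or an instance of arg_type.

    Assumption: None is accepted, because optional arguments default to None.
    """
    if arg_value is not None and not isinstance(arg_value, arg_type):
        # Report the mismatch and signal failure to the caller
        logging.error(
            "Argument %s must be %s, input type is %s",
            arg_name, arg_type.__name__, type(arg_value).__name__,
        )
        return False
    return True
```

With this sketch, validate_arg("chunksize", 100, int) passes, while validate_arg("chunksize", "100", int) logs an error and returns False.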

validate_path(arg_name, arg_value)

Validates that the input argument is an existing directory.

Parameters
  • arg_name (str) – Name of the argument

  • arg_value (str) – Path to validate

Returns

True if the path is an existing directory, False otherwise

Return type

bool
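
A directory check of this kind can be sketched with os.path.isdir. The body below is an assumption made for illustration, including the choice to log an error and to accept None for an argument that was not supplied. It is not taken from the module's source.

```python
import logging
import os


def validate_path(arg_name, arg_value):
    """Return True if arg_value is None or names an existing directory.

    Assumption: None is accepted, because optional paths default to None.
    """
    if arg_value is not None and not os.path.isdir(arg_value):
        # A regular file or a missing path both fail the check
        logging.error("%s: could not find directory %s", arg_name, arg_value)
        return False
    return True
```

Note that os.path.isdir returns False for a path that exists but is a regular file, so the sketch rejects file paths as well as missing ones.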

main(source, data_model=None, data_model_path=None, sections=None, chunksize=None, skiprows=None, out_path=None)

Reads a data file to a pandas DataFrame using a pre-defined data model. The data read is validated against its data model, producing a boolean mask as output.

The data model must be passed to the module either as a named model (included in the module) or as the path to a valid data model.

Parameters

source (str) – The file path to read

Keyword Arguments
  • data_model (str, optional) – Name of internally available data model

  • data_model_path (str, optional) – Path to external data model

  • sections (list, optional) – List with subset of data model sections to output (default is all)

  • chunksize (int, optional) – Number of reports per chunk (default is no chunking)

  • skiprows (int, optional) – Number of initial rows to skip from file (default is 0)

  • out_path (str, optional) – Path to output data, valid mask and attributes (default is no output)

Returns

output – Object whose attributes data, mask and atts contain the corresponding information from the data file.

Return type

object
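
The requirement that the data model be given either by name or by path can be sketched as a small selection step. The helper select_data_model is hypothetical and not part of mdf_reader. The source does not say what happens when both arguments are supplied, so the precedence of the named model here is an assumption, as is raising ValueError when neither is given. The value "my_model" is a placeholder.

```python
def select_data_model(data_model=None, data_model_path=None):
    """Decide which data model source to use (hypothetical helper).

    Returns a ("name", value) or ("path", value) tuple.
    Assumption: a named model takes precedence over an external path.
    """
    if data_model is not None:
        return ("name", data_model)
    if data_model_path is not None:
        return ("path", data_model_path)
    raise ValueError("Provide either data_model or data_model_path")


# "my_model" is a placeholder, not a model known to ship with the module
kind, value = select_data_model(data_model="my_model")
```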

Note

This module can also be run as a script, with the keyword arguments passed as name_arg=arg.
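
As an illustration of the name_arg=arg convention, the sketch below turns such command-line tokens into a dictionary of keyword arguments. The helper parse_script_kwargs is hypothetical, and how the script actually parses or converts its arguments is not stated here. Values are left as strings, and "my_model" is a placeholder model name.

```python
def parse_script_kwargs(argv):
    """Turn ['name=value', ...] into a dict of keyword arguments.

    Hypothetical helper: values are kept as strings, no type conversion.
    """
    kwargs = {}
    for item in argv:
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"Expected name_arg=arg, got {item!r}")
        kwargs[name] = value
    return kwargs


# Mirrors: python -m mdf_reader data_file data_model=my_model chunksize=1000
args = ["data_file", "data_model=my_model", "chunksize=1000"]
source, kwargs = args[0], parse_script_kwargs(args[1:])
```

Because every value arrives as a string, arguments such as chunksize or skiprows would still need converting to int before being passed on.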