`mdf_reader.read`

Manages the full sequence of reading a data file according to a data model:

Access to data model

Data file import

Data file reading

Data validation

Output

Contains the following functions:

ERV - does the actual extraction, reading and validation of the input data
main - the main function of the script

Can be run as a script with:

python -m mdf_reader data_file **kwargs

Module Contents

Functions

`ERV`(TextParser, read_sections_list, schema, code_tables_path)	Extracts, reads and validates the input data.
`validate_arg`(arg_name, arg_value, arg_type)	Validates that the input argument is of the expected type.
`validate_path`(arg_name, arg_value)	Validates that the input argument is an existing directory.
`main`(source, data_model=None, data_model_path=None, sections=None, chunksize=None, skiprows=None, out_path=None)	Reads a data file to a pandas DataFrame using a pre-defined data model.

Attributes

`toolPath`
`schema_lib`

toolPath

schema_lib

ERV(TextParser, read_sections_list, schema, code_tables_path)

Extracts, reads and validates the input data.

Parameters

TextParser (list or pandas.io.parsers.TextFileReader) – The data to extract and read
read_sections_list (list) – List with subset of data model sections to output
schema (dict) – Data model schema
code_tables_path (str) – Path to data model code tables

Returns

data (pandas.DataFrame, pandas.io.parsers.TextFileReader) – Contains the input data extracted and read
valid (pandas.DataFrame, pandas.io.parsers.TextFileReader) – Contains a boolean mask with the data validation output

validate_arg(arg_name, arg_value, arg_type)

Validates that the input argument is of the expected type.

Parameters

arg_name (str) – Name of the argument being validated
arg_value (arg_type) – Value of the argument
arg_type (python type) – Type the argument is expected to have

Returns

Return type

True,False
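
The type check described above can be illustrated with a minimal sketch. It is not the module's actual implementation: the name `validate_arg_sketch`, the use of `isinstance`, the error logging, and the choice to accept `None` (on the assumption that an optional argument left unset should pass) are all assumptions made for illustration.

```python
import logging


def validate_arg_sketch(arg_name, arg_value, arg_type):
    """Return True if arg_value is acceptable for arg_type, else False."""
    # Assumption: None means "argument not supplied" and passes validation.
    if arg_value is None:
        return True
    if isinstance(arg_value, arg_type):
        return True
    # Report the mismatch and signal failure to the caller.
    logging.error(
        "Argument %s must be of type %s, got %s",
        arg_name,
        arg_type.__name__,
        type(arg_value).__name__,
    )
    return False


print(validate_arg_sketch("chunksize", 100, int))    # True
print(validate_arg_sketch("chunksize", "100", int))  # False
```

Returning a boolean rather than raising lets the caller decide whether a bad argument should abort the read.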

validate_path(arg_name, arg_value)

Validates that the input argument is an existing directory.

Parameters

arg_name (str) – Name of the argument being validated
arg_value (str) – Path to check

Returns

Return type

True,False
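
A directory check of this kind might look like the sketch below. The name `validate_path_sketch` is hypothetical, and treating any non-string value (including `None`) as invalid is an assumption; the module's own handling may differ.

```python
import logging
import os
import tempfile


def validate_path_sketch(arg_name, arg_value):
    """Return True only if arg_value is a string naming an existing directory."""
    if isinstance(arg_value, str) and os.path.isdir(arg_value):
        return True
    # Assumption: anything else, including None, is reported as invalid.
    logging.error("Argument %s: %r is not an existing directory", arg_name, arg_value)
    return False


# Usage: an existing temporary directory passes, a missing subdirectory does not.
with tempfile.TemporaryDirectory() as tmp_dir:
    print(validate_path_sketch("out_path", tmp_dir))
    print(validate_path_sketch("out_path", os.path.join(tmp_dir, "does_not_exist")))
```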

main(source, data_model=None, data_model_path=None, sections=None, chunksize=None, skiprows=None, out_path=None)

Reads a data file to a pandas DataFrame using a pre-defined data model. The data read is validated against its data model, producing a boolean mask on output.

The data model must be passed to the module either as a named model (included in the module) or as the path to a valid data model.

Parameters

source (str) – The file path to read

Keyword Arguments

data_model (str, optional) – Name of internally available data model
data_model_path (str, optional) – Path to external data model
sections (list, optional) – List with subset of data model sections to output (default is all)
chunksize (int, optional) – Number of reports per chunk (default is no chunking)
skiprows (int, optional) – Number of initial rows to skip from file (default is 0)
out_path (str, optional) – Path to output data, valid mask and attributes (default is no output)

Returns

output – Attributes data, mask and atts contain the corresponding information from the data file.

Return type

object

Note

This module can also be run as a script, with the keyword arguments passed as name_arg=arg.
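
How the script turns `name_arg=arg` strings into keyword arguments is not documented here. The sketch below shows one plausible approach, under stated assumptions: the helper `parse_script_kwargs` is hypothetical, and converting values with `ast.literal_eval` (falling back to the raw string) is an illustrative choice, not the module's confirmed behaviour.

```python
import ast


def parse_script_kwargs(argv):
    """Turn ['name=value', ...] into a dict of keyword arguments."""
    kwargs = {}
    for item in argv:
        name, sep, raw = item.partition("=")
        if not sep or not name:
            raise ValueError(f"Expected name_arg=arg, got {item!r}")
        try:
            # Assumption: literals such as 100 or ['core'] become Python objects.
            value = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Anything that is not a Python literal stays a plain string.
            value = raw
        kwargs[name] = value
    return kwargs


print(parse_script_kwargs(["data_model=imma1", "chunksize=100"]))
```

A dictionary built this way could then be unpacked into a call such as `main(source, **kwargs)`; here `imma1` is only a placeholder model name.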
