mdf_reader.read
Manages the full sequence of steps for reading a data file according to a data model:
Access to data model
Data file import
Data file reading
Data validation
Output
- Contains the following functions:
ERV - performs the actual extraction, reading and validation of the input data
main - the main function of the script
- Can be run as a script with:
python -m mdf_reader data_file **kwargs
Module Contents
Functions
| ERV | Extracts, reads and validates the input data. |
| validate_arg | Validates that an input argument is of the expected type. |
| validate_path | Validates that an input argument is an existing directory. |
| main | Reads a data file to a pandas DataFrame using a pre-defined data model. |
Attributes
-
ERV(TextParser, read_sections_list, schema, code_tables_path)
Extracts, reads and validates the input data.
- Parameters
TextParser (list or pandas.io.parsers.TextFileReader) – The data to extract and read
read_sections_list (list) – List with subset of data model sections to output
schema (dict) – Data model schema
code_tables_path (str) – Path to data model code tables
- Returns
data (pandas.DataFrame, pandas.io.parsers.TextFileReader) – Contains the input data extracted and read
valid (pandas.DataFrame, pandas.io.parsers.TextFileReader) – Contains a boolean mask with the data validation output
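The relationship between data and its boolean mask can be illustrated with a minimal sketch. It uses plain Python dictionaries rather than pandas objects, and the schema layout (an inclusive valid range per element), the element names and the function name `validate_records` are assumptions made for illustration only. They are not the module's actual schema format or validation logic.

```python
# Hypothetical schema: each element maps to an inclusive (min, max) valid range.
schema = {"temperature": (-90.0, 60.0), "pressure": (870.0, 1085.0)}

# Hypothetical records, standing in for the rows of the data output.
records = [
    {"temperature": 15.2, "pressure": 1013.0},
    {"temperature": 999.9, "pressure": 1009.5},
]


def validate_records(records, schema):
    """Return a mask with the same shape as records: True where a value is valid."""
    mask = []
    for record in records:
        row = {}
        for name, value in record.items():
            low, high = schema[name]
            row[name] = low <= value <= high
        mask.append(row)
    return mask


valid = validate_records(records, schema)
```

In this sketch the mask has one boolean per data value, so an invalid element can be located by the same row and element name as in the data.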
-
validate_arg(arg_name, arg_value, arg_type)
Validates that an input argument is of the expected type
- Parameters
arg_name (str) –
arg_value (arg_type) –
arg_type (python type) –
- Returns
- Return type
True,False
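A type check of this kind might look like the sketch below. The name `validate_arg_sketch`, the error message and the use of `logging` are assumptions for illustration. The module's own implementation may differ, for example in how it treats arguments that were not supplied.

```python
import logging


def validate_arg_sketch(arg_name, arg_value, arg_type):
    """Return True if arg_value is an instance of arg_type, otherwise log and return False."""
    if isinstance(arg_value, arg_type):
        return True
    logging.error(
        "Argument %s must be of type %s, got %s",
        arg_name,
        arg_type.__name__,
        type(arg_value).__name__,
    )
    return False


chunksize_ok = validate_arg_sketch("chunksize", 10, int)
chunksize_bad = validate_arg_sketch("chunksize", "10", int)
```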
-
validate_path(arg_name, arg_value)
Validates that an input argument is an existing directory
- Parameters
arg_name (str) –
arg_value (str) –
- Returns
- Return type
True,False
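A directory check along these lines can be sketched with the standard library. The name `validate_path_sketch` and the logged message are hypothetical, and the sketch assumes that a non-string value should be rejected rather than raise an exception.

```python
import logging
import os
import tempfile


def validate_path_sketch(arg_name, arg_value):
    """Return True if arg_value is a string naming an existing directory."""
    if isinstance(arg_value, str) and os.path.isdir(arg_value):
        return True
    logging.error("Argument %s is not an existing directory: %r", arg_name, arg_value)
    return False


# A temporary directory exists inside the with block and is removed afterwards.
with tempfile.TemporaryDirectory() as tmp_dir:
    existing_ok = validate_path_sketch("out_path", tmp_dir)
missing_ok = validate_path_sketch("out_path", os.path.join(tmp_dir, "no_such_dir"))
```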
-
main(source, data_model=None, data_model_path=None, sections=None, chunksize=None, skiprows=None, out_path=None)
Reads a data file to a pandas DataFrame using a pre-defined data model. The data read is validated against its data model, producing a boolean mask on output.
The data model needs to be input to the module as a named model (included in the module) or as the path to a valid data model.
- Parameters
source (str) – The file path to read
- Keyword Arguments
data_model (str, optional) – Name of internally available data model
data_model_path (str, optional) – Path to external data model
sections (list, optional) – List with subset of data model sections to output (default is all)
chunksize (int, optional) – Number of reports per chunk (default is no chunking)
skiprows (int, optional) – Number of initial rows to skip from file (default is 0)
out_path (str, optional) – Path to output data, valid mask and attributes (default is no output)
- Returns
output – Object whose attributes data, mask and atts contain the corresponding information from the data file.
- Return type
object
Note
This module can also be run as a script, with the keyword arguments passed as name_arg=arg
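One way a script could turn `name_arg=arg` tokens into keyword arguments is sketched below. The helper `split_script_args`, the file name and the model name `my_model` are hypothetical, and the sketch leaves every value as a string. How the module itself parses and converts these arguments is not shown in this reference.

```python
def split_script_args(argv):
    """Split ['data_file', 'name=value', ...] into (source, kwargs)."""
    source = argv[0]
    kwargs = {}
    for item in argv[1:]:
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"Expected name_arg=arg, got {item!r}")
        kwargs[name] = value  # values remain strings in this sketch
    return source, kwargs


source, kwargs = split_script_args(
    ["data_file.txt", "data_model=my_model", "chunksize=100"]
)
```

Because command-line values arrive as strings, a real entry point would still need to convert arguments such as chunksize or skiprows to integers before use.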