cdm

Submodules

Package Contents

Functions

map_model(imodel, data, data_atts, cdm_subset=None, log_level='INFO')

Calls the main mapping function _map()

cdm_to_ascii(cdm, delimiter='|', null_label='null', cdm_complete=True, extension='psv', out_dir=None, suffix=None, prefix=None, log_level='INFO')

Exports a complete cdm file with multiple tables to an ascii file.

table_to_ascii(table, table_atts, delimiter='|', null_label='null', cdm_complete=True, filename=None, full_table=True, log_level='INFO')

Exports a cdm table to an ascii file.

read_tables(tb_path, tb_id, cdm_subset=None, delimiter='|', extension='psv', col_subset=None, log_level='INFO', na_values=[])

Reads CDM table-like files from the file system into a pandas DataFrame.

map_model(imodel, data, data_atts, cdm_subset=None, log_level='INFO')

Calls the main mapping function _map()

Parameters
  • imodel (a data model that can be of several types.) –

    1. A generic mapping from a defined data model, such as IMMA1’s core and attachments, e.g. ~/cdm-mapper/lib/mappings/icoads_r3000

    2. A specific mapping from a generic data model to the CDM, such as mapping a SID-DCK from IMMA1’s core and attachments to the CDM in a specific way, e.g. ~/cdm-mapper/lib/mappings/icoads_r3000_d704

  • data (input data to map.) – e.g. a pandas.DataFrame, a pandas.io.parsers.TextFileReader object, or an in-memory text stream (io.StringIO object).

  • data_atts (dictionary with the {element_name:element_attributes} of the data.) – Type: dict.

  • cdm_subset (subset of CDM model tables to map.) – Defaults to the full set of CDM tables defined for the imodel. Type: list.

  • log_level (level of logging information to save.) – Defaults to ‘INFO’. Type: string.

Returns

  • cdm_tables – a python dictionary with the {cdm_table_name: cdm_table_object} pairs.

  • For more information look at the _map function.
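The call pattern described above can be sketched as follows. This is a minimal illustration, not the package's own example: the helper name `map_selected_tables` is hypothetical, the imodel name `icoads_r3000` is taken from the example path above, and the table names `header` and `observations-sst` are assumed. The mapper is passed in as an argument so the sketch does not depend on a particular import path.

```python
def map_selected_tables(map_model, data, data_atts,
                        imodel="icoads_r3000",
                        tables=("header", "observations-sst")):
    """Map `data` to a subset of CDM tables and report which ones are missing.

    `map_model` is the function documented above, supplied by the caller.
    The default imodel and table names are assumptions for illustration.
    """
    cdm_tables = map_model(
        imodel,
        data,
        data_atts,
        cdm_subset=list(tables),
        log_level="INFO",
    )
    # The documented return value is a {cdm_table_name: cdm_table_object} dict.
    missing = [name for name in tables if name not in cdm_tables]
    return cdm_tables, missing
```

Checking the returned dictionary against the requested subset is a cheap way to notice when a requested table was not produced for the chosen imodel.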

cdm_to_ascii(cdm, delimiter='|', null_label='null', cdm_complete=True, extension='psv', out_dir=None, suffix=None, prefix=None, log_level='INFO')

Exports a complete CDM file with multiple tables, written in the C3S Climate Data Store Common Data Model (CDM) format, to ASCII files. The table format is described by a Python dictionary stored as an attribute of a pandas.DataFrame (or pandas.io.parsers.TextFileReader).

Parameters
  • cdm – common data model tables to export

  • delimiter – default ‘|’

  • null_label – specifies how NaN values are represented

  • cdm_complete – extract the entire cdm file

  • extension – default ‘psv’

  • out_dir – directory in which to store the ASCII files

  • suffix – file suffix

  • prefix – file prefix

  • log_level – level of logging information

Returns

  • Saves the CDM tables as ASCII files in the given directory, with the psv extension by default.
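To make the role of `delimiter` and `null_label` concrete, here is a small standard-library sketch of what a pipe-separated table with a null label looks like. It is not the package's implementation; the helper names, the column names, and the sample values are invented for illustration.

```python
import csv
import io


def _is_missing(value):
    # NaN is the only float that is not equal to itself.
    return value is None or (isinstance(value, float) and value != value)


def rows_to_psv(columns, rows, delimiter="|", null_label="null"):
    """Serialise rows to delimited text, writing `null_label` for missing values."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        # None and NaN both count as missing in this sketch.
        writer.writerow([null_label if _is_missing(v) else v for v in row])
    return buffer.getvalue()


# Invented sample data: two reports, the second with missing coordinates.
text = rows_to_psv(
    ["report_id", "latitude", "longitude"],
    [["ID-0001", 45.5, -12.25], ["ID-0002", None, float("nan")]],
)
print(text)
```

With a pandas.DataFrame, the same two choices correspond to the `sep` and `na_rep` arguments of `DataFrame.to_csv`.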

table_to_ascii(table, table_atts, delimiter='|', null_label='null', cdm_complete=True, filename=None, full_table=True, log_level='INFO')

Exports a CDM table, written in the C3S Climate Data Store Common Data Model (CDM) format, to an ASCII file. The table format is described by a Python dictionary stored as an attribute of a pandas.DataFrame (or pandas.io.parsers.TextFileReader).

Parameters
  • table – pandas.DataFrame to export

  • table_atts (attributes of the pandas.DataFrame stored as a python dictionary.) – This contains all element names, characteristics and type encodings, as well as other characteristics, e.g. decimal places.

  • delimiter – default ‘|’

  • null_label – specifies how NaN values are represented

  • cdm_complete (if we export the entire set of tables.) – default is True

  • filename – name of the file in which to store the data

  • full_table – if we export a single table

  • log_level – level of logging information to be saved

Returns

  • Saves the CDM table as an ASCII file.
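The idea of driving the text output from a per-element attributes dictionary can be sketched as below. This is a hedged illustration only: the layout of `table_atts` and the key name `decimal_places` are assumptions, not the package's documented schema, and the sample row is invented.

```python
def format_value(value, atts, null_label="null"):
    """Render one value as text using a per-element attributes dictionary."""
    if value is None:
        return null_label
    places = atts.get("decimal_places")  # hypothetical key name
    if places is not None and isinstance(value, float):
        # Fixed number of decimal places, as requested by the attributes.
        return f"{value:.{places}f}"
    return str(value)


# Hypothetical attributes dictionary: {element_name: element_attributes}
table_atts = {
    "report_id": {},
    "latitude": {"decimal_places": 2},
}
row = {"report_id": "ID-0001", "latitude": 45.5}
line = "|".join(format_value(row[name], table_atts[name]) for name in table_atts)
print(line)
```

Keeping the formatting rules in the attributes dictionary, rather than in the export code, means each table can carry its own precision without changing the writer.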

read_tables(tb_path, tb_id, cdm_subset=None, delimiter='|', extension='psv', col_subset=None, log_level='INFO', na_values=[])

Reads CDM table-like files from the file system into a pandas DataFrame.

Parameters
  • tb_path – path to the file

  • tb_id – any identifier, including wildcards if required

  • cdm_subset (specifies a subset of tables or a single table.) –

    • For multiple subsets of tables: this option returns a pandas.DataFrame that is multi-indexed at the columns, with (table-name, field) as column names. Tables are merged via the report_id field.

    • For a single table: the function returns a pandas.DataFrame with simple indexing for the columns.

  • delimiter – default is ‘|’

  • extension – default is psv

  • col_subset (a python dictionary specifying the section or sections of the file to read) –

    • For multiple sections of the tables:

      e.g. col_subset = {table0:[columns],...tablen:[columns]}

    • For a single section:

      e.g. a list-type object, col_subset = [columns]

    This variable assumes that the column names all conform to the cdm field names in lib.tables/*.json

  • log_level (Level of logging messages to save) –

  • na_values (specifies the format of NaN values) –

Returns

  • pandas.DataFrame (either the entire file or a subset of it.)

  • logger.error (logs specific messages if there is any error.)
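The dictionary form of `col_subset` and the (table-name, field) column naming can be illustrated with plain Python. This sketch only mimics the described behaviour with the standard library; it is not how the package reads files. The helper names, the table and field names, and the sample values are assumptions, and the way rows without a match are joined here is not necessarily what the package does.

```python
import csv
import io


def read_psv(text, columns=None, delimiter="|", na_values=("null",)):
    """Parse delimited text into a list of dicts, optionally keeping a column subset."""
    rows = []
    for record in csv.DictReader(io.StringIO(text), delimiter=delimiter):
        keep = columns if columns is not None else list(record)
        # Values listed in na_values are treated as missing.
        rows.append({c: (None if record[c] in na_values else record[c]) for c in keep})
    return rows


def merge_on_report_id(tables):
    """Join {table_name: rows} on report_id, keyed by (table_name, field)."""
    merged = {}
    for table_name, rows in tables.items():
        for row in rows:
            target = merged.setdefault(row["report_id"], {})
            for field, value in row.items():
                target[(table_name, field)] = value
    return list(merged.values())


# Invented sample content for two tables.
header_text = "report_id|latitude|longitude\nID-0001|45.5|-12.25\nID-0002|null|null\n"
sst_text = "report_id|observation_value\nID-0001|288.15\nID-0002|289.0\n"

# Dictionary form of col_subset: one column list per table.
col_subset = {
    "header": ["report_id", "latitude"],
    "observations-sst": ["report_id", "observation_value"],
}
tables = {
    "header": read_psv(header_text, col_subset["header"]),
    "observations-sst": read_psv(sst_text, col_subset["observations-sst"]),
}
merged = merge_on_report_id(tables)
print(merged[0])
```

Each merged record is keyed by (table-name, field) tuples, which is the same shape a column multi-index gives in a pandas.DataFrame.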