Schema

The Schema class defines a set of target fields for a data wrangling Method.

class schema.Schema(source=None, **kwargs)

Create and manage a target schema for a wrangling process.

Parameters:
  • source – Path to a JSON file containing a saved schema. Default is None.
  • kwargs (dict) – A schema defined as a dictionary. Defaults to a blank dictionary.
details

Schema name, title and description.

Parameters:
  • name (string) – Term used for filename and referencing. Will be lower-cased and spaces replaced with _
  • title (string) – Human-readable term used as name.
  • description (string) – Detailed description for the schema. Reference its objective and use-case.
Return type: dict

set_details(name=None, title=None, description=None)

Set schema name, title and description. Can also be used to update existing information.

Parameters:
  • name (string) – Term used for filename and referencing. Will be lower-cased and spaces replaced with _
  • title (string) – Human-readable term used as name.
  • description (string) – Detailed description for the schema. Reference its objective and use-case.
Raises: KeyError if name is missing
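The normalisation described for name can be sketched as below. This is a minimal illustration rather than the library's own code: the helper normalise_name is hypothetical, and raising KeyError for a missing name simply mirrors the Raises note above.

```python
def normalise_name(name):
    """Lower-case a schema name and replace spaces with underscores."""
    if not name:
        # Mirrors the documented KeyError for a missing name.
        raise KeyError("Schema name is required.")
    return name.lower().replace(" ", "_")


print(normalise_name("Urban Population Survey"))  # urban_population_survey
```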

default_filter_names

Default list of filter names available as field constraints for the schema. Returns only the list of names. Details for an individual filter can be returned with default_filter_settings.

Return type: list

default_filter_settings(filter_name)

Get the default settings available for a specific filter.

Parameters:
  • filter_name (string) – A specific name for a filter type (as listed in default_filter_names).
Return type: dict, or empty dict if no such filter_name

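Returning an empty dict for an unknown name means callers can test the result for truthiness instead of catching an exception. A minimal sketch of that pattern follows, assuming a plain dictionary lookup; the DEFAULT_FILTERS table, its single entry and its settings are invented for illustration and are not the library's actual filters.

```python
# Hypothetical stand-in for the schema's default filters.
DEFAULT_FILTERS = {
    "example_filter": {"name": "example_filter", "description": "Illustrative only."},
}


def default_filter_settings(filter_name):
    """Return a copy of the settings for filter_name, or {} if unknown."""
    return dict(DEFAULT_FILTERS.get(filter_name, {}))


settings = default_filter_settings("no_such_filter")
if not settings:
    print("Unknown filter name")
```

The same empty-dict convention is documented for default_field_settings and field.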
default_field_types

Default list of field types available to define fields for the schema. Returns only the list of types. Details for an individual default field type can be returned with default_field_settings.

Return type: list

default_field_settings(field_type)

Get the default settings available for a specific field type.

Parameters:
  • field_type (string) – A specific term for a field type (as listed in default_field_types).
Return type: dict, or empty dict if no such field_type

all_fields

Get dictionary of fields, where the key is field_name and the value is the type.

Return type: dict, or empty dict if no fields

all_field_names

Get list of field names.

Return type: list

field(name)

A specific field from the list of fields defining this schema, called by a unique name.

Parameters:
  • name (string) – Field names must be unique, so a valid name in the field list will have no collisions.
Return type: dict, or empty dict if no such name

set_field(**kwargs)

Set the parameters for a specific field to define this schema, called by a unique name. If the name is already in the schema, then this will update that field.

Parameters:
  • kwargs (dict) – Parameters will be validated against the type requirements, so check carefully with default_field_settings.

set_field_constraints(name, **constraints)

Set the constraint parameters for a specific field to define this schema, called by a unique name already in the schema.

The structure of the constraints is defined as follows:

{
        "key": "value",
        "key": "value"
}

category is a special constraint that can be defined as e.g.:

{
        "category": ["term1", "term2"]
}

All that is required is a list, and the function will take care of the formal structure. Filters are managed in the Method part of the process, since they are defined with respect to the data being structured.

Parameters:
  • name (string) – Specific name for a field already in the Schema
  • constraints (dict) – A set of key:value pairs defining constraints as described in default_field_settings.
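Because the signature collects constraints with **constraints, a dictionary built in advance has to be unpacked at the call site. The sketch below uses a stand-in function with the same signature to show what arrives in the function body; the field name "status" is hypothetical, and the stand-in does none of the library's validation.

```python
def set_field_constraints(name, **constraints):
    """Stand-in with the documented signature; returns what it received."""
    return name, constraints


constraints = {"category": ["term1", "term2"]}

# Unpack a prepared dictionary with ** ...
name, received = set_field_constraints("status", **constraints)

# ... or pass each constraint as a keyword argument.
_, same = set_field_constraints("status", category=["term1", "term2"])

print(received == same)  # True
```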
set_field_category(name, *categories, overwrite=True)

Set the category constraint parameters for a specific field. categories is defined as e.g.:

["term1", "term2"]

All that is required is a list, and the function will take care of the formal structure.

Parameters:
  • name (string) – Specific name for a field already in the Schema
  • categories (list) – A list of string terms defining target categories.
  • overwrite (bool) – If field has existing category constraints, then overwrite with this new list.
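The signature declares *categories, so under ordinary Python calling rules a list of terms is unpacked with * when it is passed. The stand-in below only demonstrates that calling convention; the field name "status" is hypothetical, and whether the library also accepts a bare list is not stated here.

```python
def set_field_category(name, *categories, overwrite=True):
    """Stand-in with the documented signature; returns what it received."""
    return name, list(categories), overwrite


terms = ["term1", "term2"]

# Unpacked: each term arrives as its own positional argument.
_, received, _ = set_field_category("status", *terms)
print(received)  # ['term1', 'term2']

# Not unpacked: the whole list arrives as a single argument.
_, nested, _ = set_field_category("status", terms)
print(nested)  # [['term1', 'term2']]
```

The same consideration applies to set_field_filters, which declares *filters.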
set_field_filters(name, *filters)

Set the filter parameters for a specific field to define this schema, called by a unique name already in the schema.

Parameters:
  • name (string) – Specific name for a field already in the Schema
  • filters (list of strings) – A list of filter names as described in default_filter_names.
build_field(**field)

For a field, defined as a dictionary, create and return a Field object.

Parameters:
  • field (dict) – Dictionary of Field parameters.
Raises: ValueError if field fails validation
Return type: Field

build()

Build and validate the Schema.

build_default_fields()

Build the default fields for presentation to the user as options.

Return type: list of Fields

validates

Test whether the schema validates, with all fields unique and all required terms present.

Raises: ValueError on field failure
Returns: True if the schema validates
Return type: bool

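The two checks named here, unique fields and required terms, can be sketched over a plain list of field dictionaries. This is an assumption-laden illustration, not the library's validation: the function check_fields is hypothetical, and REQUIRED_TERMS is a guess at which terms are required.

```python
REQUIRED_TERMS = ("name", "type")  # hypothetical; the real required terms may differ


def check_fields(fields):
    """Return True if every field has the required terms and a unique name."""
    seen = set()
    for field in fields:
        missing = [term for term in REQUIRED_TERMS if term not in field]
        if missing:
            raise ValueError(f"Field {field.get('name', '?')!r} is missing: {missing}")
        if field["name"] in seen:
            raise ValueError(f"Duplicate field name: {field['name']!r}")
        seen.add(field["name"])
    return True


print(check_fields([{"name": "la_code", "type": "string"},
                    {"name": "total", "type": "number"}]))  # True
```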
settings

Schema settings returned as a dictionary.

Returns: settings
Return type: dict

schema_validation_settings

Schema settings for validation of output data and schema, returned as a dictionary.

Based on Frictionless Data Table Schema.

Returns: settings
Return type: dict

save(directory, filename=None, overwrite=False, created_by=None)

Save the schema settings to a JSON file in the destination directory.

Parameters:
  • directory – The destination directory.
  • filename – Defaults to the schema name.
  • overwrite (bool) – True to overwrite an existing file.
  • created_by (string, or None) – Defines the schema creator/updater.
Raises: ValueError if no filename
Returns: True if saved
Return type: bool
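The documented behaviour of save (filename defaulting to the schema name, a ValueError when neither is available, and a True return on success) can be sketched with the standard library. Everything else here is an assumption made for illustration: the function save_settings, the .json extension, raising FileExistsError when overwrite is False, and storing created_by under a key of the same name.

```python
import json
import tempfile
from pathlib import Path


def save_settings(settings, directory, filename=None, overwrite=False, created_by=None):
    """Write settings to <directory>/<filename>.json and return True."""
    # Default the filename to the schema name, as the parameter list describes.
    filename = filename or settings.get("name")
    if not filename:
        raise ValueError("No filename provided and the schema has no name.")
    path = Path(directory) / f"{filename}.json"  # assumption: .json extension
    if path.exists() and not overwrite:
        # Assumption: refuse to replace an existing file unless asked to.
        raise FileExistsError(f"{path} exists; pass overwrite=True to replace it.")
    payload = dict(settings)
    if created_by is not None:
        payload["created_by"] = created_by  # assumption: illustrative key name
    path.write_text(json.dumps(payload, indent=2), encoding="utf-8")
    return True


with tempfile.TemporaryDirectory() as tmp:
    print(save_settings({"name": "test_schema"}, tmp, created_by="editor"))  # True
```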