Actions

The Action class defines the restructuring actions performed in a Method. BaseAction classes are the main definitions for an action and are inherited by child classes in the /action folder.

The current list of actions is:

  • ASSIGN_CATEGORY_BOOLEANS - Assign values in a source data column as categorical boolean terms based on whether values are present or null.
  • ASSIGN_CATEGORY_UNIQUES - Assign unique values in a source data column as categorical unique terms defined in the Schema.
  • CALCULATE - Derive a calculation from a list of fields. Each field must have a modifier, including the first (e.g. +A -B +C).
  • CATEGORISE - Apply categories to a list of columns. Each field must have a modifier, including the first (e.g. +A -B +C). The ‘-’ modifier indicates the presence/absence of values as true/false for a specific term. The ‘+’ modifier indicates that the unique terms in the field must be matched to the unique terms defined in the schema. This is a two-step process: first list the columns affected, then apply the terms.
  • DEBLANK - Remove all blank columns and rows from a DataFrame.
  • DEDUPE - Remove all duplicated rows from a DataFrame.
  • DELETE_COLUMNS - Delete columns provided in a list.
  • DELETE_ROWS - Delete rows provided in a list. They don’t have to be contiguous.
  • FILTER_AFTER - Filter a table by a date column after a specified date.
  • FILTER_BEFORE - Filter a table by a date column prior to a specified date.
  • FILTER_LATEST - Filter a table for the latest row in a specified filter column, and within an optional set of groups.
  • JOIN - Join values in different fields to create a new concatenated value. Each value will be converted to a string (e.g. A: ‘Word 1’ B: ‘Word 2’ => ‘Word 1 Word 2’).
  • MERGE - Merge a list of Pandas DataFrames into a single, new DataFrame, on a key column.
  • NEW - Create a new field and assign a set value.
  • ORDER - Use sparse data from a list of fields to populate a new field. Order is important: each successive field in the list has priority over the ones before it (e.g. for columns A, B & C, values in C will have precedence over values in B and A).
  • ORDER_NEW - Use sparse data from a list of fields to populate a new field, ordered by the most recent value. Field pairs are required, the first containing the values and the second the dates for comparison, linked by a ‘+’ modifier (e.g. A+dA, B+dB, C+dC: values with the most recent associated date take precedence over other values).
  • ORDER_OLD - Use sparse data from a list of fields to populate a new field, ordered by the oldest value. Field pairs are required, the first containing the values and the second the dates for comparison, linked by a ‘+’ modifier (e.g. A+dA, B+dB, C+dC: values with the oldest associated date take precedence over other values).
  • PIVOT_CATEGORIES - Convert row-level categories into column categorisations.
  • PIVOT_LONGER - Transform a DataFrame from wide to long format.
  • REBASE - Rebase the header row at an indexed row and drop rows above that point.
  • RENAME - Rename an existing field to conform to a schema name. Only valid where a single field is provided.
  • RENAME_ALL - Rename header columns listed in a dict.
  • RENAME_NEW - Rename a column outside of the schema or existing column definitions. To be used with caution.
  • SPLIT - Split the string values in a single column into any number of new columns on a specified key.
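The precedence rules for ORDER, ORDER_NEW and JOIN can be illustrated with plain pandas. This is a minimal sketch, not the whyqd implementation: the function names order, order_new and join are hypothetical, columns are referenced by plain string names rather than ColumnModel, and ORDER_NEW is assumed to skip any pair where the value or its date is null.

```python
import pandas as pd

def order(df, columns):
    """ORDER-style fill: later columns in the list take precedence over earlier ones."""
    result = df[columns[-1]]
    for column in reversed(columns[:-1]):
        # combine_first keeps existing values and fills the gaps from the earlier column
        result = result.combine_first(df[column])
    return result

def order_new(df, pairs):
    """ORDER_NEW-style fill: per row, keep the value whose paired date is most recent."""
    def pick(row):
        candidates = [
            (pd.to_datetime(row[date]), row[value])
            for value, date in pairs
            if pd.notna(row[value]) and pd.notna(row[date])
        ]
        return max(candidates, key=lambda c: c[0])[1] if candidates else None
    return df.apply(pick, axis=1)

def join(df, columns):
    """JOIN-style concatenation: convert each value to a string and join with a space."""
    return df[columns].astype(str).apply(" ".join, axis=1)

sparse = pd.DataFrame({"A": [10, None, 3], "B": [1, 2, None], "C": [None, 5, None]})
print(order(sparse, ["A", "B", "C"]).tolist())  # [1.0, 5.0, 3.0]
```

Replacing max with min in order_new gives the ORDER_OLD behaviour described above.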

class whyqd.base.action_base.BaseSchemaAction

Actions inherit from this base class which describes the core functions and methodology for an Action.

Actions should redefine name, title, description, modifiers and structure, and implement a transform function. Everything else will probably remain as defined, though particularly complex Actions may need to modify it.

structure can be an empty list, but an Action may be defined by these parameters:

  • modifier: the modifiers, of type ModifierModel, defined by the ACTION and used by its transform.
  • field: the specific columns affected by the Action, a list of ColumnModel or, rarely, FieldModel.

A standard script is:

"ACTION > 'destination_column' < [modifier 'source_column', modifier 'source_column']"

Where the structure of the source array is defined by the ACTION.
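The script grammar above can be illustrated with a small regular-expression parser. This is a minimal sketch, not whyqd's own parser: the function name parse_action_script, the (modifier, column) tuple output, and the assumption that column names contain no commas or single quotes are all hypothetical.

```python
import re

# Hypothetical grammar: ACTION > 'destination' < [modifier 'source', modifier 'source']
SCRIPT = re.compile(r"^\s*(\w+)\s*>\s*'([^']*)'\s*<\s*\[(.*)\]\s*$")
TERM = re.compile(r"\s*([+\-]?)\s*'([^']*)'\s*")

def parse_action_script(script):
    match = SCRIPT.match(script)
    if match is None:
        raise ValueError(f"Cannot parse script: {script!r}")
    action, destination, body = match.groups()
    source = []
    # Splitting on commas assumes column names never contain a comma
    for term in body.split(","):
        term_match = TERM.fullmatch(term)
        if term_match is None:
            raise ValueError(f"Cannot parse source term: {term!r}")
        modifier, column = term_match.groups()
        # An absent modifier is recorded as None
        source.append((modifier or None, column))
    return {"action": action, "destination": destination, "source": source}
```

For example, "CALCULATE > 'total' < [+ 'A', -'B', + 'C']" yields the source list [("+", "A"), ("-", "B"), ("+", "C")].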

modifiers

Describes the ModifierModels for the Action. Typical modifiers are + or -, but the Action can implement any type of modifier as part of the transform function.

As an example:

[
    {
        "name": "+",
        "title": "Add",
        "type": "modifier"
    },
    {
        "name": "-",
        "title": "Subtract",
        "type": "modifier"
    }
]
Returns: ModifierModel representation of the modifiers.
Return type: None or ModifierModel

get_modifier(modifier: str) → Union[ModifierModel, None]

Return the Modifier definition that matches a given Modifier name.

Parameters: modifier (str) – A Modifier name.
Returns: The ModifierModel for the requested Modifier name, or None if it doesn’t exist.
Return type: ModifierModel, or None
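The lookup that get_modifier describes can be sketched against the example modifiers listed above. This is an assumption-laden illustration: ModifierModel here is a simplified dataclass stand-in for whyqd's model, and get_modifier is written as a free function taking the modifier list rather than as the real method.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ModifierModel:  # simplified stand-in, not whyqd's ModifierModel
    name: str
    title: str
    type: str = "modifier"

MODIFIERS: List[ModifierModel] = [
    ModifierModel(name="+", title="Add"),
    ModifierModel(name="-", title="Subtract"),
]

def get_modifier(modifier: str, modifiers: List[ModifierModel] = MODIFIERS) -> Optional[ModifierModel]:
    """Return the first modifier whose name matches, or None if none does."""
    return next((m for m in modifiers if m.name == modifier), None)
```

get_modifier("+") returns the "Add" model, while an undefined name such as "*" returns None.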

settings

Returns the dict representation of the Action.

Returns: Dict representation of an Action.
Return type: dict

transform(df: pd.DataFrame, destination: Optional[Union[FieldModel, ColumnModel]] = None, source: Optional[List[Union[ColumnModel, ModifierModel]]] = None) → pd.DataFrame

Perform a transformation. This function must be overridden by child Actions and describe a unique new method.

Warning

Assumes that this is a valid call. If it is not, exceptions will be raised from inside Pandas.

Parameters:
  • df (DataFrame) – Working data to be transformed
  • destination (FieldModel or ColumnModel, default None) – Destination column for the result of the Action. If required.
  • source (list of ColumnModel and / or ModifierModel) – List of source columns and modifiers for the action. If required.
Returns:

A DataFrame containing the result of the Action.

Return type:

DataFrame
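A child Action might override transform along these lines. This is a hypothetical CALCULATE-style sketch: CalculateAction does not inherit the real BaseSchemaAction, ColumnModel and ModifierModel are simplified stand-ins carrying only a name, the source list is assumed to alternate modifier and column, and nulls are assumed to count as zero.

```python
from dataclasses import dataclass
from typing import List, Optional, Union

import pandas as pd

@dataclass
class ColumnModel:    # stand-in: only the column name is modelled
    name: str

@dataclass
class ModifierModel:  # stand-in: only the modifier symbol is modelled
    name: str

class CalculateAction:  # hypothetical; a real Action would inherit BaseSchemaAction
    name = "CALCULATE"

    def transform(
        self,
        df: pd.DataFrame,
        destination: Optional[ColumnModel] = None,
        source: Optional[List[Union[ColumnModel, ModifierModel]]] = None,
    ) -> pd.DataFrame:
        if destination is None:
            raise ValueError("CALCULATE requires a destination column")
        source = source or []
        # Assumed layout: [modifier, column, modifier, column, ...]
        modifiers, columns = source[0::2], source[1::2]
        total = pd.Series(0.0, index=df.index)
        for modifier, column in zip(modifiers, columns):
            # Non-numeric values become NaN, then nulls are treated as zero
            values = pd.to_numeric(df[column.name], errors="coerce").fillna(0)
            if modifier.name == "+":
                total = total + values
            elif modifier.name == "-":
                total = total - values
            else:
                raise ValueError(f"Unsupported modifier: {modifier.name}")
        # Work on a copy so the caller's DataFrame is not mutated
        df = df.copy()
        df[destination.name] = total
        return df
```

Returning a modified copy rather than editing df in place keeps each Action free of side effects on the working data passed to it.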

class whyqd.base.morph_base.BaseMorphAction

Morphs differ from Actions in that they normally act to reshape entire tables rather than manipulate columns.

  • Morph actions are not permitted to be nested, i.e. they are stand-alone ActionScripts.
  • May result in changes to column or row references that must be accounted for in other actions.

Morphs inherit from this base class, which describes the core functions and methodology for a Morph. They should redefine name, title, description, and structure, and implement a transform function. Everything else will probably remain as defined, though particularly complex Morphs may need to modify it.

structure can be an empty list, but a morph may be defined by these parameters:

  • rows: the specific rows affected by the morph, a list of int.
  • columns: the specific columns affected by the morph, a list of ColumnModel or FieldModel.

A standard script is:

"ACTION > [columns] < [rows]"

Where:

  • the presence and order of the arrays is set by structure,
  • columns are indicated by >, and
  • rows are indicated by <.

settings

Returns the dict representation of the Morph.

Raises: NameError if parameters don’t yet exist.
Returns: Dict representation of a Morph.
Return type: dict

transform(df: pd.DataFrame, rows: List[int], columns: List[ColumnModel]) → pd.DataFrame

Perform a transformation. This function must be overridden by child Morphs and describe a unique new method.

Warning

Assumes that validates has already been run.

Parameters:
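  • df (DataFrame) – Working data to be transformed
  • rows (list of int) – The specific rows affected by the Morph
  • columns (list of ColumnModel) – The specific columns affected by the Morph
  • **kwargs – Other fields which may be required in custom transforms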
Returns:

A DataFrame containing the result of the Morph.

Return type:

DataFrame
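As an illustration of a Morph transform, the REBASE behaviour listed earlier (rebase the header at an indexed row and drop the rows above it) can be sketched in pandas. This is a minimal sketch under assumptions: rebase is a hypothetical free function, rows[0] is taken as the new header row, and the columns argument is accepted only to mirror the signature above.

```python
from typing import List, Optional

import pandas as pd

def rebase(df: pd.DataFrame, rows: List[int], columns: Optional[list] = None) -> pd.DataFrame:
    """Promote rows[0] to the header and drop it together with every row above it."""
    # columns is accepted only to mirror the Morph transform signature; unused here
    position = df.index.get_loc(rows[0])
    new_header = [str(value) for value in df.iloc[position]]
    result = df.iloc[position + 1:].copy()
    result.columns = new_header
    # Row labels are renumbered, so later row references must be updated
    return result.reset_index(drop=True)
```

The final reset_index is an example of the side effect noted above: a Morph can change the row references that other actions rely on.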

class whyqd.base.category_base.BaseCategoryAction

Category Actions are support utilities for CATEGORY actions. They inherit from this base class, which describes the core functions and methodology for this support Action.

Actions should redefine name, title, description, and structure. There is no transform function.

structure is defined by these parameters:

  • boolean: terms will be unique.
  • unique: the specific columns affected by the action, a list of ColumnModel or, rarely, FieldModel.

A standard script is:

"ACTION > 'destination_column::destination_category' < ['unique_term', 'unique_term', etc.]"

Where the structure of the source array is defined by the ACTION.

settings

Returns the CategoryActionModel representation of the Action.

Returns: CategoryActionModel representation of an Action.
Return type: CategoryActionModel

parse(script: str) → Dict[str, str]

Base parser for the CategoryAction script. Produces required terms and validates against this CategoryAction’s structure requirements.

Script is of the form:

"ACTION > 'destination_column'::term < 'source_column'::[term]"

Which is parsed as:

{
    "action": ACTION,
    "destination": 'destination_column',
    "category": term,
    "source": 'source_column',
    "source_category": [term]
}
Parameters: script (str) – An action script.
Raises: ValueError for any parsing errors.
Returns: Parsed dictionary of validated split strings for further processing.
Return type: dict
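The mapping from a CategoryAction script to the parsed dictionary can be sketched with a regular expression. This is a hypothetical illustration, not whyqd's parser: parse_category_script is an invented name, the category term may be bare or single-quoted, and every source term is assumed to be single-quoted inside the brackets.

```python
import re
from typing import Dict, List, Union

# Hypothetical grammar: ACTION > 'destination'::term < 'source'::['term', ...]
CATEGORY_SCRIPT = re.compile(
    r"^\s*(?P<action>\w+)\s*>\s*'(?P<destination>[^']+)'::(?P<category>[^\s<]+)\s*<\s*"
    r"'(?P<source>[^']+)'::\[(?P<terms>[^\]]*)\]\s*$"
)

def parse_category_script(script: str) -> Dict[str, Union[str, List[str]]]:
    match = CATEGORY_SCRIPT.match(script)
    if match is None:
        raise ValueError(f"Cannot parse category script: {script!r}")
    return {
        "action": match["action"],
        "destination": match["destination"],
        # Accept the category term with or without surrounding single quotes
        "category": match["category"].strip("'"),
        "source": match["source"],
        # Collect every single-quoted term inside the brackets
        "source_category": re.findall(r"'([^']*)'", match["terms"]),
    }
```

For example, "ASSIGN_CATEGORY_UNIQUES > 'colour'::red < 'paint'::['crimson', 'scarlet']" produces the destination "colour", the category "red", the source "paint" and the source terms ["crimson", "scarlet"].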

class whyqd.base.filter_base.BaseFilterAction

Filter ACTIONS act to filter a table by a date column, while preserving information referenced in specific columns.

Actions should redefine name, title, description, and structure, and implement a transform function.

structure is defined by these parameters:

  • column: a specific column for grouped-by filtering, can be ColumnModel or FieldModel.
  • date: a specific date reference, in ISO YYYY-MM-DD format. Times are not filtered; support for them would need to be added to this feature.

A standard script is:

"FILTER_LATEST > 'filter_column' < 'source_column'"

settings

Returns the FilterActionModel representation of the Action.

Returns: FilterActionModel representation of an Action.
Return type: FilterActionModel

parse(script: str) → Dict[str, Union[str, List[str]]]

Base parser for the FilterAction script. Produces required terms and validates against this FilterAction’s structure requirements.

Script is of the form:

"FILTER_LATEST > 'filter_column' < 'source_column'"

Which is parsed as:

{
    "action": ACTION,
    "filter": 'filter_column',
    "date": date,
    "column": 'source_column'
}
Parameters: script (str) – An action script.
Raises: ValueError for any parsing errors.
Returns: Parsed dictionary of validated split strings for further processing.
Return type: dict

transform(df: pd.DataFrame, filter_column: Union[FieldModel, ColumnModel], column: Optional[Union[FieldModel, ColumnModel]] = None, date_term: Optional[date] = None) → pd.DataFrame

Perform a transformation. This function must be overridden by child Actions and describe a unique new method.

Warning

Assumes that this is a valid call. If it is not, exceptions will be raised from inside Pandas.

Parameters:
  • df (DataFrame) – Working data to be transformed
  • filter_column (FieldModel or ColumnModel) – A date-field column to use to filter the table. Column values will be coerced to date-type.
  • column (FieldModel or ColumnModel, default None) – A column which defines the groups from which to extract the latest row.
  • date_term (date, default None) – A specific date for filtering.
Returns:

A DataFrame containing the result of the Action.

Return type:

DataFrame
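The FILTER_AFTER and FILTER_LATEST behaviours can be sketched in pandas. This is a minimal sketch under assumptions: filter_after and filter_latest are hypothetical free functions taking column names as strings, "after" is treated as strictly after, times are discarded before comparison, and unparseable dates are coerced to NaT and never kept.

```python
from datetime import date
from typing import Optional

import pandas as pd

def filter_after(df: pd.DataFrame, filter_column: str, date_term: date) -> pd.DataFrame:
    """Keep rows whose date (times ignored) is strictly after date_term."""
    dates = pd.to_datetime(df[filter_column], errors="coerce").dt.normalize()
    # Unparseable dates become NaT, and NaT > Timestamp evaluates to False
    return df[dates > pd.Timestamp(date_term)]

def filter_latest(df: pd.DataFrame, filter_column: str, column: Optional[str] = None) -> pd.DataFrame:
    """Keep the most recent row overall, or the most recent row per group."""
    dates = pd.to_datetime(df[filter_column], errors="coerce").dropna()
    if column is None:
        return df.loc[[dates.idxmax()]]
    # Index label of the most recent date within each group
    latest = dates.groupby(df.loc[dates.index, column]).idxmax()
    return df.loc[latest.values]
```

A FILTER_BEFORE sketch would mirror filter_after with the comparison reversed.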