TransformDefinition
Bases: BaseDefinition
Create and manage a transform that performs a schema-to-schema crosswalk on a tabular data source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transform | `TransformModel \| dict \| Path \| str \| None` | Path to a transform definition, or a dictionary conforming to a transform model. | `None` |
crosswalk | `CrosswalkDefinition \| CrosswalkModel \| dict \| Path \| str \| None` | A definition, or a dictionary conforming to the CrosswalkModel, or a path to a saved definition. | `None` |
data_source | `DataSourceModel \| dict \| Path \| str \| None` | Path to a tabular data source, or a dictionary conforming to a data source model. | `None` |
Example
Create a new TransformDefinition, perform a crosswalk, then save the definition and transformed data as follows:
import whyqd as qd
transform = qd.TransformDefinition(crosswalk=CROSSWALK, data_source=DATASOURCE)
transform.process()
transform.save(directory=DIRECTORY)
get: TransformModel | None
property
Get the transform model.
Returns:
Type | Description |
---|---|
`TransformModel \| None` | Pydantic TransformModel or None |
process()
Perform a crosswalk. You can access the dataframe after completion at .data, if it exists.
Raises:
Type | Description |
---|---|
`ValueError` | If there are missing required destination fields in the crosswalk. |
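The behaviour described for process() suggests a defensive calling pattern. The following is a minimal sketch, not part of whyqd: `process_or_report` is a hypothetical helper that assumes only what is documented here, namely a `process()` method that may raise `ValueError` and a `.data` attribute that may or may not exist afterwards.

```python
def process_or_report(transform):
    """Run the crosswalk and return the resulting dataframe, or None.

    `transform` is assumed to be a TransformDefinition, or any object with
    a `process()` method and an optional `.data` attribute.
    """
    try:
        transform.process()
    except ValueError as err:
        # Documented failure mode: required destination fields are missing.
        print(f"Crosswalk failed: {err}")
        return None
    # `.data` is only described as available "if it exists", so read it defensively.
    return getattr(transform, "data", None)
```

Returning None for both failure and missing data keeps the caller's check simple; split the two cases if you need to tell them apart.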
save(*, filename=None, mimetype=None, directory=None, created_by=None, hide_uuid=False)
Save the model as a JSON file, and save the crosswalked destination dataframe in the chosen mimetype.
Info
NOTE: by default, transformed data are saved as PARQUET, as this is the most efficient.
Declare your mime type like so:
MIMETYPE = "csv" # upper- or lower-case is fine
Parameters:
Name | Type | Description | Default |
---|---|---|---|
directory | `str \| Path \| None` | Defaults to working directory | `None` |
filename | `str \| None` | Defaults to model name | `None` |
mimetype | `str \| None` | whyqd supports saving to CSV, XLS, XLSX, Feather and Parquet files. Defaults to Parquet. | `None` |
created_by | `str \| None` | Declare the model creator/updater | `None` |
hide_uuid | `bool` | Hide all UUIDs in the nested JSON output. | `False` |
Returns:
Type | Description |
---|---|
`bool` | Boolean True if saved. |
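Because every argument to save() is keyword-only, a thin wrapper can catch an unsupported format before anything is written. This is a sketch under stated assumptions: `save_transform` and `SUPPORTED_MIMETYPES` are hypothetical names, the set is copied from the formats listed for the mimetype parameter, and the wrapper relies only on the documented signature and boolean return value.

```python
from pathlib import Path

# Formats listed for the `mimetype` parameter; matching is case-insensitive.
SUPPORTED_MIMETYPES = {"csv", "xls", "xlsx", "feather", "parquet"}


def save_transform(transform, directory, mimetype=None, created_by=None):
    """Check the requested format, then delegate to `transform.save()`.

    `transform` is assumed to expose the keyword-only `save()` documented above.
    """
    if mimetype is not None and mimetype.lower() not in SUPPORTED_MIMETYPES:
        raise ValueError(f"Unsupported mimetype: {mimetype!r}")
    # Leaving `mimetype` as None keeps the documented Parquet default.
    return transform.save(
        directory=Path(directory),
        mimetype=mimetype,
        created_by=created_by,
    )
```

The mimetype string is passed through unchanged, since upper- or lower-case is documented as acceptable.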
set(*, transform=None, crosswalk=None, data_source=None)
Update or create the transform.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transform | `TransformModel \| dict \| Path \| str \| None` | Path to a transform definition, or a dictionary conforming to a transform model. | `None` |
crosswalk | `CrosswalkDefinition \| CrosswalkModel \| dict \| Path \| str \| None` | A definition, or a dictionary conforming to the CrosswalkModel, or a path to a saved definition. | `None` |
data_source | `DataSourceModel \| dict \| Path \| str \| None` | Path to a tabular data source, or a dictionary conforming to a data source model. | `None` |
validate(*, transform, data_destination, mimetype_destination=None, data_source=None, mimetype_source=None)
Validate the transformation process and all data checksums. This performs all actions on each interim data source.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
transform | `TransformModel \| dict \| Path \| str` | Path to a transform definition, or a dictionary conforming to a transform model. | required |
data_destination | `DataSourceModel \| dict \| Path \| str` | Path to a tabular data source, or a dictionary conforming to a data source model. Destination data for crosswalk validation. | required |
mimetype_destination | `str \| MimeType \| None` | whyqd supports reading from CSV, XLS, XLSX, Feather and Parquet files. Required if | `None` |
data_source | `DataSourceModel \| dict \| Path \| str \| None` | Path to a tabular data source, or a dictionary conforming to a data source model. Should be defined in | `None` |
mimetype_source | `str \| MimeType \| None` | whyqd supports reading from CSV, XLS, XLSX, Feather and Parquet files. Required if | `None` |
Raises:
Type | Description |
---|---|
`ValueError` | If any steps fail to validate. |
Returns:
Type | Description |
---|---|
`bool` | A boolean |
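Since validate() both returns a boolean and raises `ValueError` on a failed step, callers may want a single yes/no answer. The sketch below is illustrative only: `check_transform` is a hypothetical helper, `validator` is assumed to be an object exposing the keyword-only `validate()` documented above, and optional arguments are forwarded only when supplied because the conditions under which the mimetypes are required are not stated here.

```python
def check_transform(validator, transform, data_destination, **optional):
    """Return True if validation passes, False if any step fails.

    `optional` may carry `mimetype_destination`, `data_source` and
    `mimetype_source`; entries set to None are left out of the call.
    """
    kwargs = {"transform": transform, "data_destination": data_destination}
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    try:
        return bool(validator.validate(**kwargs))
    except ValueError as err:
        # Documented failure mode: one or more steps failed to validate.
        print(f"Validation failed: {err}")
        return False
```

Collapsing the exception into False loses the error detail beyond the printed message, so re-raise instead if the caller needs to know which step failed.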