Installation and environment settings
whyqd (/wɪkɪd/) can be integrated into an existing data importer, or you could use it as part of your data analysis and exploration in Jupyter Notebooks.
Install with your favourite package manager:
pip install whyqd
Then import in Python:
import whyqd as qd
whyqd uses Ray and Modin as a drop-in replacement for Pandas to support processing of large datasets. This is less noticeable if you mostly work with fewer than 1 million rows of data, but the power is there should you need it.
The following can be set in your root .env project file:
WHYQD_MEMORY: the memory allocated for processing (default is 6 GB, given in bytes).
WHYQD_CPUS: the number of CPUs allocated for parallel processing (default is 3).
WHYQD_SPILLWAY: Ray will spill to local storage when memory is exceeded. You can specify a temporary folder (default is "/tmp/spill"). This will be automatically cleared every time whyqd is restarted or reinitialised.
WHYQD_DIRECTORY: a working directory for local storage (default is "").
WHYQD_DEFAULT_MIMETYPE: a default mime type for destination data (default output is "application/vnd.apache.parquet").
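As a rough illustration of how these settings and their defaults fit together, the sketch below reads them from the process environment using only the Python standard library. It is not whyqd's own loading code: read_whyqd_settings is a hypothetical helper, and the exact byte count used for the 6 GB default (6 * 1024**3) is an assumption.

```python
import os

# Defaults as listed in the settings above.
# ASSUMPTION: the 6 GB memory default is taken to mean 6 * 1024**3 bytes.
DEFAULTS = {
    "WHYQD_MEMORY": str(6 * 1024**3),
    "WHYQD_CPUS": "3",
    "WHYQD_SPILLWAY": "/tmp/spill",
    "WHYQD_DIRECTORY": "",
    "WHYQD_DEFAULT_MIMETYPE": "application/vnd.apache.parquet",
}


def read_whyqd_settings(environ=None):
    """Hypothetical helper: collect WHYQD_* values, falling back to defaults."""
    environ = os.environ if environ is None else environ
    settings = {key: environ.get(key, default) for key, default in DEFAULTS.items()}
    # Memory and CPU counts are numeric; the remaining settings stay as strings.
    settings["WHYQD_MEMORY"] = int(settings["WHYQD_MEMORY"])
    settings["WHYQD_CPUS"] = int(settings["WHYQD_CPUS"])
    return settings


if __name__ == "__main__":
    for name, value in read_whyqd_settings().items():
        print(f"{name}={value!r}")
```

Running the script prints the values that would apply in the current shell, which can be a quick way to check that a .env file has been picked up by whatever tool loads it into the environment.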
whyqd supports any of the following file mime types:
Declare it like so:
MIMETYPE = "xlsx" # upper- or lower-case is fine