Installation and environment settings
whyqd (/wɪkɪd/) can be integrated into an existing data importer, or you could use it as part of your data analysis and exploration in Jupyter Notebooks.
Install
Install with your favourite package manager:
pip install whyqd
Then import in Python:
import whyqd as qd
Settings
whyqd uses Ray and Modin as a drop-in replacement for Pandas to support processing of large datasets. The benefit is less noticeable if you mostly work with fewer than a million rows of data, but the capacity is there should you need it.
The following can be set in your root .env project file:
WHYQD_MEMORY: the memory allocated for processing (default is 6 GB, given in bytes as 6000000000).
WHYQD_CPUS: the number of CPUs allocated for parallel processing (default is 3).
WHYQD_SPILLWAY: Ray will spill to local storage when memory is exceeded. You can specify a temporary folder (default is "/tmp/spill"). This folder is automatically cleared every time whyqd is restarted or reinitialised.
WHYQD_DIRECTORY: a working directory for local storage (default is "").
WHYQD_DEFAULT_MIMETYPE: a default mime type for destination data (default output is "application/vnd.apache.parquet").
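As a rough illustration of how these settings and their defaults fit together, the sketch below reads the same variables from the environment using only the Python standard library. The helper `read_whyqd_settings` is hypothetical and is not part of whyqd; it only mirrors the defaults documented above.

```python
import os

# Documented defaults, kept as strings because environment values are strings.
DEFAULTS = {
    "WHYQD_MEMORY": "6000000000",
    "WHYQD_CPUS": "3",
    "WHYQD_SPILLWAY": "/tmp/spill",
    "WHYQD_DIRECTORY": "",
    "WHYQD_DEFAULT_MIMETYPE": "application/vnd.apache.parquet",
}


def read_whyqd_settings(environ=None):
    """Hypothetical helper: resolve each setting, falling back to its default."""
    env = os.environ if environ is None else environ
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "memory": int(raw["WHYQD_MEMORY"]),  # bytes
        "cpus": int(raw["WHYQD_CPUS"]),
        "spillway": raw["WHYQD_SPILLWAY"],
        "directory": raw["WHYQD_DIRECTORY"],
        "default_mimetype": raw["WHYQD_DEFAULT_MIMETYPE"],
    }


if __name__ == "__main__":
    # Print whatever the current environment resolves to.
    for name, value in read_whyqd_settings().items():
        print(f"{name} = {value!r}")
```

Passing a plain dictionary instead of relying on `os.environ` makes it easy to check what a given .env file would resolve to before whyqd is started.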
whyqd supports any of the following file mime types:
CSV: "text/csv"
XLS: "application/vnd.ms-excel"
XLSX: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
PARQUET (or PRQ): "application/vnd.apache.parquet"
FEATHER (or FTR): "application/vnd.apache.feather"
Declare it like so:
MIMETYPE = "xlsx" # upper- or lower-case is fine
Neither Parquet nor Feather has an official mime type yet, so these are the values whyqd uses for now.
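The shorthand names listed above can be pictured as a case-insensitive lookup table. The following is a minimal sketch that assumes nothing about whyqd's internals; `MIMETYPES` and `resolve_mimetype` are hypothetical names used only for illustration.

```python
# Shorthand names and the mime types they stand for, as listed above.
MIMETYPES = {
    "CSV": "text/csv",
    "XLS": "application/vnd.ms-excel",
    "XLSX": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "PARQUET": "application/vnd.apache.parquet",
    "PRQ": "application/vnd.apache.parquet",
    "FEATHER": "application/vnd.apache.feather",
    "FTR": "application/vnd.apache.feather",
}


def resolve_mimetype(name: str) -> str:
    """Hypothetical helper: map a shorthand name to its mime type, ignoring case."""
    key = name.strip().upper()
    if key not in MIMETYPES:
        raise ValueError(f"Unsupported mime type shorthand: {name!r}")
    return MIMETYPES[key]


if __name__ == "__main__":
    print(resolve_mimetype("xlsx"))
    print(resolve_mimetype("PRQ"))
```

Normalising the key before the lookup is what lets "xlsx" and "XLSX" be treated as the same declaration.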