Installation and environment settings

whyqd (/wɪkɪd/) can be integrated into an existing data importer, or used as part of your data analysis and exploration in Jupyter Notebooks.


Install with your favourite package manager:

pip install whyqd

Then import in Python:

import whyqd as qd


whyqd uses Ray and Modin as a drop-in replacement for Pandas to support processing of large datasets. You are unlikely to notice the difference if you mostly work with fewer than a million rows of data, but the power is there should you need it.

The following can be set in the .env file in your project root:

  • WHYQD_MEMORY: the memory allocated for processing, in bytes (default is 6 GB, written as 6000000000).
  • WHYQD_CPUS: the number of CPUs allocated for parallel processing (default is 3).
  • WHYQD_SPILLWAY: Ray will spill to local storage when memory is exceeded. You can specify a temporary folder (default is "/tmp/spill"). This will be automatically cleared every time whyqd is restarted or reinitialised.
  • WHYQD_DIRECTORY: a working directory for local storage (default is "").
  • WHYQD_DEFAULT_MIMETYPE: a default mime type for destination data (default output is "application/vnd.apache.parquet").
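
As a rough illustration of how these settings and their defaults fit together, here is a minimal sketch that reads them from the environment. The variable names and default values are the ones listed above; the helper `read_settings` and the keys of the dictionary it returns are hypothetical and are not part of whyqd's API. It also assumes the .env file has already been loaded into the process environment by whatever tool you use for that.

```python
import os

# Defaults as documented in the list above.
DEFAULTS = {
    "WHYQD_MEMORY": "6000000000",   # 6 GB, expressed in bytes
    "WHYQD_CPUS": "3",
    "WHYQD_SPILLWAY": "/tmp/spill",
    "WHYQD_DIRECTORY": "",
    "WHYQD_DEFAULT_MIMETYPE": "application/vnd.apache.parquet",
}

# An equivalent .env file might contain lines such as:
#   WHYQD_MEMORY=6000000000
#   WHYQD_CPUS=3
#   WHYQD_SPILLWAY=/tmp/spill


def read_settings(environ=None):
    """Hypothetical helper: merge environment values with the documented defaults."""
    env = os.environ if environ is None else environ
    raw = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    return {
        "memory": int(raw["WHYQD_MEMORY"]),  # bytes
        "cpus": int(raw["WHYQD_CPUS"]),
        "spillway": raw["WHYQD_SPILLWAY"],
        "directory": raw["WHYQD_DIRECTORY"],
        "default_mimetype": raw["WHYQD_DEFAULT_MIMETYPE"],
    }


if __name__ == "__main__":
    print(read_settings())
```

Passing a plain dictionary instead of relying on `os.environ` makes it easy to check a configuration before starting a long-running job.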

whyqd supports any of the following file mime types:

  • CSV: "text/csv"
  • XLS: "application/"
  • XLSX: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  • PARQUET (or PRQ): "application/vnd.apache.parquet"
  • FEATHER (or FTR): "application/vnd.apache.feather"

Declare it like so:

MIMETYPE = "xlsx" # upper- or lower-case is fine

Neither Parquet nor Feather has an official mime type yet, so these are the ones we're using for now.
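
To make the case-insensitive short names concrete, the following sketch maps them to the mime type strings listed above. It is an illustration only: `MIMETYPES` and `resolve_mimetype` are hypothetical names, not whyqd's own API, and XLS is left out because its full mime type string is not given above.

```python
# Short names and mime types as listed above (XLS omitted, see the note above).
MIMETYPES = {
    "CSV": "text/csv",
    "XLSX": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "PARQUET": "application/vnd.apache.parquet",
    "PRQ": "application/vnd.apache.parquet",
    "FEATHER": "application/vnd.apache.feather",
    "FTR": "application/vnd.apache.feather",
}


def resolve_mimetype(name: str) -> str:
    """Hypothetical helper: look up a mime type by short name, ignoring case."""
    key = name.strip().upper()
    if key not in MIMETYPES:
        raise ValueError(f"Unsupported mime type name: {name!r}")
    return MIMETYPES[key]


if __name__ == "__main__":
    print(resolve_mimetype("xlsx"))
    print(resolve_mimetype("PRQ"))
```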