
Installation and environment settings

whyqd (/wɪkɪd/) can be integrated into an existing data importer, or used for data analysis and exploration in Jupyter Notebooks.

Install

Install with your favourite package manager:

pip install whyqd

Then import in Python:

import whyqd as qd

Settings

whyqd uses Ray and Modin as a drop-in replacement for Pandas to support processing of large datasets. You are unlikely to notice the difference if you mostly work with fewer than 1 million rows of data, but the power is there should you need it.

The following can be set in your root .env project file:

  • WHYQD_MEMORY: the memory allocated for processing, in bytes (default is 6 GB, written as 6000000000).
  • WHYQD_CPUS: the number of CPUs allocated for parallel processing (default is 3).
  • WHYQD_SPILLWAY: Ray will spill to local storage when memory is exceeded. You can specify a temporary folder (default is "/tmp/spill"). This folder is cleared automatically every time whyqd is restarted or reinitialised.
  • WHYQD_DIRECTORY: a working directory for local storage (default is "").
  • WHYQD_DEFAULT_MIMETYPE: a default MIME type for destination data (default output is "application/vnd.apache.parquet").
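
As a rough illustration of the settings above, the following sketch reads the five variables from the environment and falls back to the documented defaults. The helper name read_whyqd_settings is hypothetical and is not part of whyqd; how whyqd itself loads these values is not shown here.

```python
import os

# Documented defaults, stored as strings the way they would appear in a .env file:
#   WHYQD_MEMORY=6000000000
#   WHYQD_CPUS=3
#   WHYQD_SPILLWAY=/tmp/spill
DEFAULTS = {
    "WHYQD_MEMORY": "6000000000",
    "WHYQD_CPUS": "3",
    "WHYQD_SPILLWAY": "/tmp/spill",
    "WHYQD_DIRECTORY": "",
    "WHYQD_DEFAULT_MIMETYPE": "application/vnd.apache.parquet",
}


def read_whyqd_settings(environ=None):
    """Hypothetical helper: merge environment values over the documented defaults."""
    env = os.environ if environ is None else environ
    values = {key: env.get(key, default) for key, default in DEFAULTS.items()}
    # Memory (bytes) and CPU count are numeric, so convert them from strings.
    values["WHYQD_MEMORY"] = int(values["WHYQD_MEMORY"])
    values["WHYQD_CPUS"] = int(values["WHYQD_CPUS"])
    return values


# Example: override only the CPU count and keep every other default.
settings = read_whyqd_settings({"WHYQD_CPUS": "4"})
print(settings["WHYQD_CPUS"], settings["WHYQD_MEMORY"])
```

Passing a plain dictionary instead of os.environ makes the sketch easy to try without touching your real environment.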

whyqd supports any of the following file MIME types:

  • CSV: "text/csv"
  • XLS: "application/vnd.ms-excel"
  • XLSX: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet"
  • PARQUET (or PRQ): "application/vnd.apache.parquet"
  • FEATHER (or FTR): "application/vnd.apache.feather"

Declare a MIME type by its short name, like so:

MIMETYPE = "xlsx" # upper- or lower-case is fine

Neither Parquet nor Feather has an official MIME type yet, so these are the ones we use for now.
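
The short names and MIME types listed above can be pictured as a case-insensitive lookup table. This is only a sketch of the mapping as documented; the MIMETYPES dictionary and the resolve_mimetype function are hypothetical names for illustration, not whyqd's own API.

```python
# Short names and MIME types exactly as listed in this section.
MIMETYPES = {
    "CSV": "text/csv",
    "XLS": "application/vnd.ms-excel",
    "XLSX": "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
    "PARQUET": "application/vnd.apache.parquet",
    "PRQ": "application/vnd.apache.parquet",
    "FEATHER": "application/vnd.apache.feather",
    "FTR": "application/vnd.apache.feather",
}


def resolve_mimetype(name: str) -> str:
    """Hypothetical helper: map a short name (any case) to its MIME type."""
    key = name.strip().upper()
    if key not in MIMETYPES:
        raise ValueError(f"Unsupported file type: {name!r}")
    return MIMETYPES[key]


print(resolve_mimetype("xlsx"))
print(resolve_mimetype("PRQ"))
```

Normalising to upper case before the lookup is one simple way to accept both "xlsx" and "XLSX".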