whyqd has the single objective of transforming messy input data into a single standardised schema for further validation and analysis in other software. Anything that goes further than that is out of scope.

That still leaves a fair amount to do, including improving the documentation and tests:

  • pandas supports multiple fruity CSV formats (#*-separated, etc.) - need a config to support these wilder problems

  • Zip the data files and method, and produce a citation report, as a single step to aid distribution

  • Validate a zipped output file and produce a validation report

  • Setting the missing_data function doesn’t do anything at the moment

  • Potentially refactor to make it more expressive and extensible?

  • Additional transformations / actions:

    • Pivots: some data are horizontal instead of vertical, e.g.:

    details  2010  2015  2020
    Fish      200   350   500
    Cats      120    80    40

    We need an action to convert that to, e.g.:

    details  year  value
    Fish     2010    200
    Fish     2015    350
    Fish     2020    500
    Cats     2010    120
    Cats     2015     80
    Cats     2020     40
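
As a rough illustration of the pivot described above, the reshaping can be sketched directly in pandas with DataFrame.melt. This is not a whyqd action and says nothing about how whyqd would expose it. The function name pivot_longer and its parameters are hypothetical, chosen only for this example.

```python
import pandas as pd


def pivot_longer(df, id_column, var_name="year", value_name="value"):
    """Hypothetical helper (not part of whyqd): turn a wide table into a long one."""
    # melt emits one block of rows per source column (all 2010 rows, then 2015, ...).
    # ignore_index=False keeps the original row labels so the source order can be restored.
    long = df.melt(
        id_vars=id_column,
        var_name=var_name,
        value_name=value_name,
        ignore_index=False,
    )
    # A stable sort on the original row labels groups each entity's rows together
    # while keeping the years in their original column order.
    long = long.sort_index(kind="mergesort").reset_index(drop=True)
    # The column headers were strings; this assumes every one of them is a year.
    long[var_name] = long[var_name].astype(int)
    return long


# The horizontal table from the example above.
wide = pd.DataFrame(
    {
        "details": ["Fish", "Cats"],
        "2010": [200, 120],
        "2015": [350, 80],
        "2020": [500, 40],
    }
)

long = pivot_longer(wide, id_column="details")
print(long)
```

The stable sort is only there to reproduce the row order shown above (all Fish rows, then all Cats rows); if order does not matter, a plain melt is enough.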

And, then, what would you like to see?