...

  1. Option to provide input data to the QC workflow as a CSV file (e.g., from a DwC archive).

    • Currently the QC workflow uses a query provided as a command-line option to retrieve input data from a MongoDB database.  
    • Users may want to provide data as a CSV file so that they do not need to load the input data into a MongoDB instance.

  2. When a MongoDB query is used to provide input to the workflow, option to write out that input data set as a CSV file.

    • When the workflow is run using data in MongoDB as input, no record is made of the actual data passed into the workflow.  
    • Because the data in MongoDB could change after the workflow run, the provenance of the workflow outputs can be lost.
    • It could also be useful for users to subset their input data manually in the CSV file and then rerun the workflow on this subset (see 1 above).

  3. Preservation of original data values in records passed between actors.

    • Data validation actors in the QC workflow currently overwrite the original values in the record fields for which they propose updated values.
    • Although comments added to the records as new fields preserve the original values, those values are harder to extract programmatically, e.g., by a user working with the report spreadsheet.
    • Overwriting values also means that downstream actors cannot access the original values and propose alternative values based on the originals.

  4. Actor that automatically outputs the results spreadsheet (or CSV file) at the end of the workflow run.

    • Currently the QC workflow writes its output to a MongoDB instance.  A separate program is used to generate the report spreadsheet from these results in MongoDB.
    • Users may want to evaluate the results of a workflow run without having to query a MongoDB database, either manually or with the report-generating program.
    • Depending on command-line options, the QC workflow itself could output a report spreadsheet, a CSV file, or both.
    • Including the original field values in the results spreadsheet would ease direct comparison with the input data.
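Options 1 and 2 could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the workflow's actual implementation: the function names load_input_csv and write_input_snapshot are hypothetical, a plain list of dicts stands in for the documents a MongoDB query would return, and records are assumed to be flat (nested MongoDB documents would need flattening first).

```python
import csv
import io
from typing import Iterable, Mapping, TextIO


def load_input_csv(stream: TextIO, delimiter: str = ",") -> list[dict[str, str]]:
    """Option 1: read input records from a CSV file instead of a MongoDB query.

    Each row becomes a dict keyed by the header row (e.g. Darwin Core terms).
    DwC archive data files are often tab-delimited, so the delimiter is a parameter.
    """
    return list(csv.DictReader(stream, delimiter=delimiter))


def write_input_snapshot(records: Iterable[Mapping[str, object]], stream: TextIO) -> int:
    """Option 2: write the records actually passed into the workflow to CSV.

    Columns are the union of all keys in first-seen order, so records with
    differing fields fit one table; missing values are written as "".
    Assumes flat records (hypothetical simplification).
    Returns the number of records written.
    """
    records = list(records)
    fieldnames = list(dict.fromkeys(key for rec in records for key in rec))
    writer = csv.DictWriter(stream, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return len(records)


if __name__ == "__main__":
    # Hypothetical stand-in for documents returned by a MongoDB query.
    query_results = [
        {"catalogNumber": "A1", "scientificName": "Quercus alba"},
        {"catalogNumber": "A2", "scientificName": "Acer rubrum", "country": "USA"},
    ]
    buf = io.StringIO(newline="")
    write_input_snapshot(query_results, buf)
    buf.seek(0)
    for row in load_input_csv(buf):
        print(row)
```

Using the union of keys as the column set means a snapshot never silently drops a field that appears in only some records, and the snapshot can be fed straight back in through the CSV input path, which is the subset-and-rerun scenario described under option 2.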
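Items 3 and 4 could be combined by having actors append proposed values to a record instead of overwriting fields, so that downstream actors still see the originals and the report can place original and updated values side by side. The sketch below is illustrative only: the QCRecord and Proposal structures, the actor name, and the column convention (<field>, <field>_updated, <field>_comment) are all hypothetical, and it writes CSV only, since producing a spreadsheet file would need an additional library.

```python
import csv
import io
from dataclasses import dataclass, field
from types import MappingProxyType
from typing import Iterable, Mapping, Sequence, TextIO


@dataclass
class Proposal:
    actor: str         # name of the (hypothetical) validation actor
    value: str         # proposed replacement value
    comment: str = ""  # reason for the proposal


@dataclass
class QCRecord:
    """Hypothetical record passed between actors.

    The original values are wrapped in a read-only view, so actors cannot
    overwrite them; instead they append proposals per field.
    """
    original: Mapping[str, str]
    proposals: dict[str, list[Proposal]] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Copy, then freeze: assignments to self.original[...] raise TypeError.
        self.original = MappingProxyType(dict(self.original))

    def propose(self, name: str, actor: str, value: str, comment: str = "") -> None:
        self.proposals.setdefault(name, []).append(Proposal(actor, value, comment))

    def current(self, name: str) -> str:
        """Latest proposed value for a field, falling back to the original."""
        props = self.proposals.get(name)
        return props[-1].value if props else self.original.get(name, "")


def write_report(records: Iterable[QCRecord], stream: TextIO, fieldnames: Sequence[str]) -> None:
    """Write original, updated, and comment columns side by side for each field."""
    writer = csv.writer(stream)
    header: list[str] = []
    for name in fieldnames:
        header += [name, f"{name}_updated", f"{name}_comment"]
    writer.writerow(header)
    for rec in records:
        row: list[str] = []
        for name in fieldnames:
            props = rec.proposals.get(name, [])
            row += [
                rec.original.get(name, ""),
                props[-1].value if props else "",
                "; ".join(f"{p.actor}: {p.comment}" for p in props if p.comment),
            ]
        writer.writerow(row)


if __name__ == "__main__":
    rec = QCRecord({"scientificName": "Quercus albus", "country": "USA"})
    rec.propose("scientificName", "NameValidator", "Quercus alba", "corrected spelling")
    buf = io.StringIO(newline="")
    write_report([rec], buf, ["scientificName", "country"])
    print(buf.getvalue())
```

Keeping every proposal per field, rather than only the latest, lets a later actor base an alternative proposal on the original value, and lets the report list all comments for a field in one cell.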