The following additional features, proposed by T.M., are under consideration for inclusion in the QC workflow. A short explanation of why each might be useful is included.

  1. Option to provide input data to the QC workflow as a CSV file (e.g., from a DwC archive).
    • Currently the QC workflow uses a query provided as a command-line option to retrieve input data from a MongoDB database.
    • Users may want to provide data as a CSV file so that the input data does not have to be loaded into a MongoDB instance first.

  2. When a MongoDB query is used to provide input to the workflow, option to write out that input data set as a CSV file.
    • When the workflow is run using data in MongoDB as input, no record is made of the actual data passed into the workflow.
    • Because the data in MongoDB could change after the workflow run, provenance is lost.
    • It could also be useful for users to subset their input data set manually using a CSV file and then run the workflow again on this subset (see 1 above).
  3. Preservation of original data values in records passed between actors so that multiple actors can validate or propose new values for a particular field.
    • Data validation actors in the QC workflow currently overwrite the original values in the record fields for which they propose updated values.
    • Option to save workflow results in a CSV file that includes all the original values as well as new values proposed by the validation actors.
    • Although the original values are recorded in comments added to the records as new fields, these comments are not as easy to read programmatically, e.g., by a user of the report spreadsheet.
    • Overwriting values also means that downstream actors cannot access the original values and propose alternative values based on the originals.
  4. Actor for outputting the results spreadsheet (or CSV file) automatically at the end of the workflow run.
    • Currently the QC workflow writes its output to a MongoDB instance.  A separate program is used to generate the report spreadsheet from these results in MongoDB.
    • Users may want to evaluate the results of a workflow run without having to query a MongoDB database.
    • Based on command-line options, the workflow could output a report spreadsheet, a CSV file, or both.
  5. Inclusion of original fields (and headers) in results spreadsheet to ease direct comparison with input data.
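
To make proposals 3 and 5 more concrete, the following is a minimal Python sketch of one way a record could keep its original values untouched while actors attach proposed values, and of how a results CSV could place each original column next to a proposed-value column. It is an illustration only, not the workflow's actual implementation: the `Record` class, the `propose` and `write_results_csv` names, the `_proposed` column suffix, and the `NameValidator` actor name are all hypothetical, and the two Darwin Core field names are used only as sample data.

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class Record:
    """One input record; `original` is never modified by actors (hypothetical design)."""
    original: dict
    # field name -> list of (actor name, proposed value, comment)
    proposals: dict = field(default_factory=dict)

    def propose(self, actor, name, value, comment=""):
        """Attach a proposed value for a field without overwriting the original."""
        self.proposals.setdefault(name, []).append((actor, value, comment))

    def current(self, name):
        """Return the latest proposed value, falling back to the original."""
        if self.proposals.get(name):
            return self.proposals[name][-1][1]
        return self.original.get(name, "")


def write_results_csv(records, fieldnames, out):
    """Write each original column followed by a `<name>_proposed` column."""
    header = []
    for name in fieldnames:
        header += [name, name + "_proposed"]
    writer = csv.writer(out)
    writer.writerow(header)
    for rec in records:
        row = []
        for name in fieldnames:
            proposed = rec.proposals.get(name)
            # Leave the proposed column empty when no actor changed the field.
            row += [rec.original.get(name, ""), proposed[-1][1] if proposed else ""]
        writer.writerow(row)


# Example: one actor proposes a corrected name; the original value is kept.
rec = Record({"scientificName": "Quercus albaa", "eventDate": "1901-05-02"})
rec.propose("NameValidator", "scientificName", "Quercus alba", "spelling corrected")

buf = io.StringIO()
write_results_csv([rec], ["scientificName", "eventDate"], buf)
print(buf.getvalue())
```

Because every proposal is stored alongside the original rather than in place of it, a downstream actor can still read `rec.original` and add its own alternative proposal for the same field, and the output keeps the input headers for direct comparison with the input data.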