The following are additional features under consideration for inclusion in the QC workflow.  Short explanations of why each might be useful are included.

  1. Option to provide input data to QC workflow using a CSV file (e.g., from a DwC archive) as input.
    • Currently the QC workflow uses a query provided as a command-line option to retrieve input data from a MongoDB database.  
    • Users may want to supply data as a CSV file so that they do not first have to load it into a MongoDB instance.
  2. When a MongoDB query is used to provide input to the workflow, an option to write that input data set out as a CSV file.
    • When the workflow is run using data in MongoDB as input, no record is made of the actual data passed into the workflow.  
    • Because the data in MongoDB could change after the workflow run, provenance is lost.
    • It could also be useful for users to subset their input data manually in a CSV file and then rerun the workflow on that subset.
  3. Preservation of the original data values in records passed between actors, so that multiple actors can validate or propose new values for the same field.
    • The data validation actors in the QC workflow currently overwrite the original values of any record fields for which they propose new values.
  4. Option to save workflow results to a CSV file that includes all the original values as well as the new values proposed by the validation actors.
  5. Inclusion of the original fields (and headers) in the results spreadsheet to ease direct comparison with the input data.
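The features listed above can be illustrated together with a minimal Python sketch. It is a hypothetical illustration under stated assumptions, not the QC workflow's actual code: the `QCRecord` class, its `propose`/`current` methods, the three helper functions, the actor names, and the `field (actor)` column naming are all invented for this example. It shows records loaded from a CSV file (item 1), a query result saved as a CSV snapshot (item 2; with PyMongo, `collection.find(query)` yields plain dicts of the kind `write_input_snapshot` accepts), original values kept alongside per-actor proposals (item 3), and a results CSV that repeats the original columns and headers next to the proposed values (items 4 and 5).

```python
import csv
import io
from dataclasses import dataclass, field


@dataclass
class QCRecord:
    # Hypothetical record type: `original` is never modified after loading;
    # `proposals` maps field name -> list of (actor name, proposed value).
    original: dict
    proposals: dict = field(default_factory=dict)

    def propose(self, actor, field_name, value):
        # Actors append a proposal instead of overwriting the original value,
        # so several actors can weigh in on the same field (item 3).
        self.proposals.setdefault(field_name, []).append((actor, value))

    def current(self, field_name):
        # Latest proposal if there is one, otherwise the original value.
        props = self.proposals.get(field_name)
        return props[-1][1] if props else self.original.get(field_name)


def read_records(csv_file):
    # Item 1: build records from a CSV file with a header row
    # (e.g. an occurrence file extracted from a DwC archive).
    return [QCRecord(original=dict(row)) for row in csv.DictReader(csv_file)]


def write_input_snapshot(documents, csv_file):
    # Item 2: write the exact input documents (plain dicts, such as those a
    # MongoDB query returns) to CSV so the run's input is preserved.
    documents = list(documents)
    fieldnames = []
    for doc in documents:
        for key in doc:
            if key not in fieldnames:
                fieldnames.append(key)
    # restval fills in fields that a given document lacks.
    writer = csv.DictWriter(csv_file, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(documents)


def write_results(records, csv_file):
    # Items 4 and 5: original columns (same headers as the input) first,
    # then one "<field> (<actor>)" column per actor that proposed a value.
    input_fields = list(records[0].original) if records else []
    proposal_cols = []
    for rec in records:
        for field_name, props in rec.proposals.items():
            for actor, _ in props:
                col = f"{field_name} ({actor})"
                if col not in proposal_cols:
                    proposal_cols.append(col)
    writer = csv.DictWriter(
        csv_file, fieldnames=input_fields + proposal_cols, restval=""
    )
    writer.writeheader()
    for rec in records:
        row = dict(rec.original)
        for field_name, props in rec.proposals.items():
            for actor, value in props:
                row[f"{field_name} ({actor})"] = value
        writer.writerow(row)


if __name__ == "__main__":
    # Hypothetical input row and actor names, for illustration only.
    source = io.StringIO("scientificName,eventDate\nQuercus alba,10 Jul 1898\n")
    records = read_records(source)
    records[0].propose("NameActor", "scientificName", "Quercus alba L.")
    records[0].propose("DateActor", "eventDate", "1898-07-10")
    out = io.StringIO()
    write_results(records, out)
    print(out.getvalue())
```

Keeping a list of proposals per field, rather than a single overwritten value, is what lets several actors address the same field without losing the original. The flat `field (actor)` columns are only one possible layout for the results spreadsheet; a long format with one row per proposal would work equally well.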