The following are additional features under consideration for inclusion in the QC workflow. Short explanations of why each might be useful are included.
- Option to provide input data to the QC workflow as a CSV file (e.g., from a DwC archive).
  - Currently the QC workflow retrieves its input data from a MongoDB database, using a query provided as a command-line option.
  - Users may want to provide data as a CSV file so that they do not need to load their input data into a MongoDB instance.
- When a MongoDB query is used to provide input to the workflow, an option to write out that input data set as a CSV file.
  - When the workflow is run using data in MongoDB as input, no record is made of the actual data passed into the workflow.
  - Because the data in MongoDB could change after the workflow run, provenance information is lost.
  - It could also be useful for users to subset their input data set manually using a CSV file and then rerun the workflow on this subset.
- Preservation of original data values in records passed between actors, so that multiple actors can validate or propose new values for a particular field.
  - Data validation actors in the QC workflow currently overwrite the original values in the record fields for which they propose updated values, so the original value is no longer available to downstream actors.
- Option to save workflow results in a CSV file that includes all the original values as well as the new values proposed by the validation actors.
- Inclusion of the original fields (and their headers) in the results spreadsheet, to ease direct comparison with the input data.
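The CSV input and CSV snapshot options listed above could share one pair of helpers: one that reads records from a CSV file in place of a MongoDB query, and one that writes out whatever records were actually passed into the workflow. The following is a minimal Python sketch using only the standard-library `csv` module. It assumes each record is a flat mapping of field name to string value; the names `read_records_csv` and `write_records_csv` are hypothetical and not part of the existing workflow. The MongoDB side is not shown: any iterable of flat dicts could be passed to `write_records_csv`.

```python
import csv
import io
from typing import Dict, Iterable, List, TextIO

# Assumption: a record is a flat mapping of field name (e.g., a Darwin Core
# term such as "scientificName") to a string value.
Record = Dict[str, str]


def read_records_csv(f: TextIO) -> List[Record]:
    """Load input records from CSV text with a header row (hypothetical helper)."""
    return [dict(row) for row in csv.DictReader(f)]


def write_records_csv(records: Iterable[Record], f: TextIO) -> List[str]:
    """Snapshot the records actually passed into the workflow (hypothetical helper).

    The header is the union of all field names in first-seen order, so records
    with differing fields fit in one table; missing values are written as "".
    Returns the header so callers can log it alongside the run.
    """
    records = list(records)
    fieldnames: List[str] = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return fieldnames


if __name__ == "__main__":
    # Stand-in for documents returned by a MongoDB query (illustrative values).
    queried = [
        {"catalogNumber": "A1", "scientificName": "Quercus alba"},
        {"catalogNumber": "A2", "scientificName": "Acer rubrum", "country": "USA"},
    ]
    snapshot = io.StringIO()
    write_records_csv(queried, snapshot)
    # A later run, or a hand-edited subset, can start from the snapshot instead.
    snapshot.seek(0)
    subset = [r for r in read_records_csv(snapshot) if r["catalogNumber"] == "A1"]
    print(subset)
```

The helpers take open text streams rather than paths so they work equally with files and in-memory buffers; when using real files, open them with `newline=""` as the `csv` module documentation recommends.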
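The items above on preserving original values and on writing results with original and proposed values side by side suggest a record structure in which validation actors append proposals instead of overwriting fields. The following is a minimal Python sketch under stated assumptions. `TrackedRecord`, `propose`, `current`, `write_results_csv`, the `<actor>:<field>` column naming, and the actor names in the usage block are all hypothetical and are not taken from the existing QC workflow.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Dict, List, TextIO, Tuple


@dataclass
class TrackedRecord:
    """Hypothetical record type: original values are never overwritten.

    Each validation actor appends (actor name, proposed value) for a field
    instead of replacing the value, so several actors can weigh in on one field.
    """
    original: Dict[str, str]
    proposals: Dict[str, List[Tuple[str, str]]] = field(default_factory=dict)

    def propose(self, field_name: str, actor: str, value: str) -> None:
        self.proposals.setdefault(field_name, []).append((actor, value))

    def current(self, field_name: str) -> str:
        """Latest proposed value if any, else the original value."""
        history = self.proposals.get(field_name)
        return history[-1][1] if history else self.original.get(field_name, "")


def write_results_csv(records: List[TrackedRecord], f: TextIO) -> List[str]:
    """Write the original columns (with their original headers) first, then one
    "<actor>:<field>" column per actor/field pair that proposed a value.
    If one actor proposes twice for the same field, the later value is written."""
    original_cols: List[str] = []
    proposal_cols: List[str] = []
    for rec in records:
        for key in rec.original:
            if key not in original_cols:
                original_cols.append(key)
        for fname, history in rec.proposals.items():
            for actor, _ in history:
                col = f"{actor}:{fname}"
                if col not in proposal_cols:
                    proposal_cols.append(col)
    header = original_cols + proposal_cols
    writer = csv.DictWriter(f, fieldnames=header, restval="")
    writer.writeheader()
    for rec in records:
        row = dict(rec.original)
        for fname, history in rec.proposals.items():
            for actor, value in history:
                row[f"{actor}:{fname}"] = value
        writer.writerow(row)
    return header


if __name__ == "__main__":
    rec = TrackedRecord({"catalogNumber": "A1", "scientificName": "Quercus albaa"})
    # Two hypothetical actors propose values for the same field; the original survives.
    rec.propose("scientificName", "NameValidator", "Quercus alba")
    rec.propose("scientificName", "AuthorityCheck", "Quercus alba L.")
    out = io.StringIO()
    print(write_results_csv([rec], out))
    print(out.getvalue())
```

Keeping the original columns first, under their original headers, means the results file can be lined up directly against the input CSV, while each actor's proposals sit in clearly labeled extra columns.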