  1. Daffodil
  2. DFDL-1685

Full validation should create and initialize the validator before parsing/unparsing begins


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Normal
    • deferred
    • 2.0.0
    • Back End, Performance
    • None

      In many applications, validation will be turned on.

      In 2.0.0-rc2, it was observed that parse time increases with the volume of non-DFDL comments/annotations in the schema.

      This was with validation on. The explanation is that validation, which currently calls Xerces, constructs the validator during the parse call, so that cost is counted as part of the cost of parsing. It may even be constructing the validator on every parse call.

      We are now switching to Woodstox for XML parsing. It is also a validating parser, so we could try using it to speed up validation.

      Nevertheless, we should hoist as much work as possible out of parse time.

      At a minimum, we should create the validator object only once. The latest 2.0.0 code currently does this once per thread, because it does not assume that an initialized validator is thread safe. If we can determine that it is, we could create a single validator rather than one per thread.

      Within the same thread the same validator will be used, but across threads it will not.
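
      The once-per-thread arrangement described above can be sketched with the JDK's javax.xml.validation API, where a compiled Schema is documented as thread safe and a Validator is not. This is only an illustration under that assumption: the class name SharedSchemaValidators and its methods are hypothetical, and it is not the Daffodil implementation, which may wrap Xerces differently.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

// Hypothetical helper (not a Daffodil class): compile the schema once,
// share it across threads, and give each thread its own Validator.
final class SharedSchemaValidators {
    private final ThreadLocal<Validator> perThread;

    SharedSchemaValidators(String xsdText) {
        try {
            SchemaFactory factory =
                SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            // The expensive step: done exactly once, at construction time.
            Schema compiled =
                factory.newSchema(new StreamSource(new StringReader(xsdText)));
            // Validators are cheap to derive from a compiled Schema,
            // but are not thread safe, so each thread gets its own.
            this.perThread = ThreadLocal.withInitial(compiled::newValidator);
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema did not compile", e);
        }
    }

    // Returns the validator bound to the calling thread.
    Validator validatorForCurrentThread() {
        return perThread.get();
    }

    // Validates one document; returns false on malformed or invalid XML.
    boolean isValid(String xmlText) {
        Validator validator = validatorForCurrentThread();
        validator.reset(); // clear state left over from a previous document
        try {
            validator.validate(new StreamSource(new StringReader(xmlText)));
            return true;
        } catch (SAXException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

      With this shape, the schema is read and compiled in the constructor, and repeated calls on one thread reuse the same Validator. Whether a single validator could be shared across threads depends on the validator library actually in use.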

      The validator is initialized on first use, which probably shows up as part of parse time. Initialization involves reading the entire extended schema, resolving all file references, and so on, so the cost is substantial.
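
      The difference between initializing on first use and initializing before parsing begins can be shown with a minimal, generic sketch. The names ValidatorInit, Lazy, and Eager are hypothetical and stand in for whatever holds the validator; the Supplier passed in represents the expensive setup (reading the schema, resolving references).

```java
import java.util.function.Supplier;

// Hypothetical illustration; none of these names are Daffodil APIs.
final class ValidatorInit {
    private ValidatorInit() {}

    // Lazy: the expensive setup runs inside the first call to get(),
    // so its cost lands in whichever parse call happens to be first.
    static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> init;
        private T value;

        Lazy(Supplier<T> init) {
            this.init = init;
        }

        @Override
        public synchronized T get() {
            if (value == null) {
                value = init.get();
            }
            return value;
        }
    }

    // Eager: the expensive setup runs in the constructor,
    // before any parse or unparse call is made.
    static final class Eager<T> implements Supplier<T> {
        private final T value;

        Eager(Supplier<T> init) {
            this.value = init.get();
        }

        @Override
        public T get() {
            return value;
        }
    }
}
```

      With the eager form, the setup cost is paid when the holder is constructed, so it can be moved out of the timed parse path. Both forms run the setup only once.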

              Assignee: Unassigned
              Reporter: Mike Beckerle (mbeckerle.dfdl)