
...

  • Schema Definition Error (a.k.a., SDE) - detected at compile time
  • Schema Definition Error - detected at run time
  • Schema Definition Warning - detected at compile time - not called out explicitly in the spec, though in many places it says implementations may want to warn.
  • Schema Definition Warning - detected at run time
  • Processing Error (a.k.a., PE) - always a run-time thing - causes backtracking when the schema expresses alternatives
  • Recoverable Error - always a run-time thing - never causes backtracking; really this is just a run-time "warning". We may want a threshold that determines how many of these are allowed before they escalate to either a Processing Error (which can cause backtracking) or a fatal error.
  • Information - at either compile time or run time, we may want to simply inform the user. This is probably controlled by flags that determine whether one wants these or not. Example: an information object might inform the user that the format is not efficiently streamable due to forward/backward reference issues, such as a header record that contains the length of the entire data object. One cannot stream this when unparsing, as one must hold back the header until the full length is known. In some cases users may want to escalate these information items to warnings or errors; for example, if they intend to stream the data, they may want schema compilation to report an error for non-streamable formats.
  • Tracing - a DFDL user may want to watch a parse as it happens in order to decipher why their data doesn't match the format. Shy of a full-blown interactive data format debugging tool, just getting a trace of the parse behavior is a powerful step in this direction.
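
The threshold idea for recoverable errors in the list above can be sketched roughly as follows. This is a minimal illustration only: the names RecoverableErrorTracker and Escalation, and the choice to escalate once the count exceeds the threshold, are assumptions made for this example and are not part of any existing Daffodil API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical names for illustration; not an existing Daffodil API.
enum Escalation { PROCESSING_ERROR, FATAL_ERROR }

final class RecoverableErrorTracker {
    private final int threshold;          // how many recoverable errors are tolerated
    private final Escalation escalation;  // what to escalate to once the threshold is exceeded
    private final List<String> messages = new ArrayList<>();

    RecoverableErrorTracker(int threshold, Escalation escalation) {
        this.threshold = threshold;
        this.escalation = escalation;
    }

    // Records one recoverable error. Returns the escalation to apply once the
    // number recorded exceeds the threshold, or empty while still tolerated.
    Optional<Escalation> record(String message) {
        messages.add(message);
        return messages.size() > threshold ? Optional.of(escalation) : Optional.empty();
    }

    // The recoverable errors recorded so far, in the order they occurred.
    List<String> messages() {
        return new ArrayList<>(messages);
    }
}
```

Whether escalation goes to a Processing Error or a fatal error is left as a constructor argument here, since the text above leaves that choice open.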

In all cases we need to capture information about the schema location(s) relevant to understanding the cause of the error, and in the case of errors/warnings at run-time, the data location(s) that are relevant. A schema location is a schema file name, and a schema component within that schema file. Any given issue may require more than one schema location to be provided as part of its diagnostic information.
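
As a rough sketch of the information described above, a diagnostic could carry its kind, the phase in which it was detected, one or more schema locations, and (at run time) zero or more data locations. All type and field names below are hypothetical, and representing a data location as a bit position is an assumption for illustration, not something this page specifies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical model for illustration; not an existing Daffodil API.
enum Kind { SCHEMA_DEFINITION_ERROR, SCHEMA_DEFINITION_WARNING, PROCESSING_ERROR, RECOVERABLE_ERROR, INFORMATION }
enum Phase { COMPILE_TIME, RUN_TIME }

// A schema file name plus a schema component within that file.
final class SchemaLocation {
    final String schemaFileName;
    final String component;
    SchemaLocation(String schemaFileName, String component) {
        this.schemaFileName = schemaFileName;
        this.component = component;
    }
}

// Assumption: a data location is identified by a bit position in the data.
final class DataLocation {
    final long bitPosition;
    DataLocation(long bitPosition) { this.bitPosition = bitPosition; }
}

final class Diagnostic {
    final Kind kind;
    final Phase phase;
    final String message;
    final List<SchemaLocation> schemaLocations;
    final List<DataLocation> dataLocations;

    Diagnostic(Kind kind, Phase phase, String message,
               List<SchemaLocation> schemaLocations, List<DataLocation> dataLocations) {
        // Every diagnostic needs at least one schema location.
        if (schemaLocations.isEmpty())
            throw new IllegalArgumentException("at least one schema location is required");
        // Processing and recoverable errors are always run-time.
        if ((kind == Kind.PROCESSING_ERROR || kind == Kind.RECOVERABLE_ERROR) && phase != Phase.RUN_TIME)
            throw new IllegalArgumentException(kind + " is always a run-time diagnostic");
        // Data locations only make sense at run time.
        if (phase == Phase.COMPILE_TIME && !dataLocations.isEmpty())
            throw new IllegalArgumentException("compile-time diagnostics have no data locations");
        this.kind = kind;
        this.phase = phase;
        this.message = message;
        this.schemaLocations = Collections.unmodifiableList(new ArrayList<>(schemaLocations));
        this.dataLocations = Collections.unmodifiableList(new ArrayList<>(dataLocations));
    }
}
```

Tracing is left out of the Kind enumeration in this sketch because it describes ongoing parse behavior rather than a single issue; that split is a design choice of the example only.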

...

However, this is not quite enough, as the user is likely to want to route this hexBinary data blob somewhere, and capture the diagnostic information from the parse failure to keep with it. The normal behavior of a choice where a second alternative succeeds would be to discard error/diagnostic information from any prior failing alternative. Hence, a top-level API is needed which provides access to the failure diagnostic information for the overall parse. Consistent with discussion herein about lazy message construction, the blob of binary data needed for a diagnostic should really be an offset and length into some data stream. The data blob itself can be large, but we should only copy it if this is required.
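
One way to picture the "offset and length, copy only if required" idea is a small value that refers into the underlying data and materializes bytes on demand. The class name DataSpan is hypothetical, and using a java.nio.ByteBuffer as a stand-in for "some data stream" is an assumption made to keep the sketch self-contained.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch; not an existing Daffodil API.
// Refers to a region of the data without copying it.
final class DataSpan {
    private final ByteBuffer source; // shared with the caller, not copied on construction
    private final int offset;
    private final int length;

    DataSpan(ByteBuffer source, int offset, int length) {
        // Long arithmetic avoids int overflow in the bounds check.
        if (offset < 0 || length < 0 || (long) offset + length > source.limit())
            throw new IndexOutOfBoundsException("span is outside the data");
        this.source = source;
        this.offset = offset;
        this.length = length;
    }

    int offset() { return offset; }
    int length() { return length; }

    // Copies the bytes out only when a caller actually asks for them.
    byte[] copyBytes() {
        byte[] out = new byte[length];
        ByteBuffer view = source.duplicate(); // independent position, same content
        view.position(offset);
        view.get(out, 0, length);
        return out;
    }
}
```

A diagnostic could hold such a span cheaply even when the blob is large, and only a consumer that routes the blob elsewhere would pay for the copy.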

  • DFDL API methods for obtaining diagnostic information.

Gathering Multiple Compile-Time Errors

...

  • The API for compilation should allow compilation of one or more global element declarations as the potential document root(s), giving control of whether to compile all, or just a subset, of the global element declarations.

Gathering a full set of diagnostics from the compilation of a single root element is harder than stopping at the first issue. The approach depends on the following:

  • Compilation of each of the children of a sequence or choice can be isolated, and the errors from those compilations concatenated (in schema order) to form the set for the whole compilation.
  • There may be duplicate errors. E.g., all subsequent children of a choice may be tripped up by the same error in the first child of the choice. Duplicates can/should be suppressed.
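
The two points above can be sketched as a single gathering step: compile each child in isolation, concatenate the results in schema order, and drop repeats. This is only an illustration; the class and method names are hypothetical, diagnostics are simplified to strings, and treating two diagnostics as duplicates when they are equal is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch; not an existing Daffodil API.
final class DiagnosticGathering {
    private DiagnosticGathering() {}

    // Compiles each child independently and returns all diagnostics in schema
    // order, keeping only the first occurrence of each duplicate.
    static <C> List<String> gather(List<C> childrenInSchemaOrder,
                                   Function<C, List<String>> compileChild) {
        // LinkedHashSet keeps first-insertion order while suppressing duplicates.
        LinkedHashSet<String> gathered = new LinkedHashSet<>();
        for (C child : childrenInSchemaOrder) {
            gathered.addAll(compileChild.apply(child));
        }
        return new ArrayList<>(gathered);
    }
}
```

Because each child is compiled through the supplied function without shared state, a failure in one child does not prevent diagnostics from being collected for its siblings.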

Tracing/Logging

Applications that embed Daffodil are very likely to be servers, so the application needs to be able to supply the target logger to which Daffodil writes its logging/tracing, via an API such as a setLogger() function.
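
A minimal sketch of such an API follows, assuming a single-method logger interface that the embedding application implements. Apart from the setLogger() name mentioned above, everything here (LogTarget, EmbeddedProcessor, the level strings, the default of writing to standard error) is hypothetical and chosen only for illustration.

```java
import java.util.Objects;

// Hypothetical sketch; not an existing Daffodil API.
// The embedding application implements this to receive log/trace output.
interface LogTarget {
    void log(String level, String message);
}

final class EmbeddedProcessor {
    // Assumed default: write to standard error until the application supplies a target.
    private LogTarget logger = (level, message) -> System.err.println(level + ": " + message);

    // The application provides its own logging target here.
    void setLogger(LogTarget target) {
        this.logger = Objects.requireNonNull(target, "target");
    }

    // Stand-in for real work; emits one trace record through the current target.
    void parse(String inputName) {
        logger.log("TRACE", "begin parse of " + inputName);
    }
}
```

A server application would typically pass an adapter that forwards each record to its own logging framework, so Daffodil output lands in the same place as the rest of the server's logs.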

...

  1. helping Daffodil developers find and isolate bugs in the Daffodil code base.
  2. helping DFDL schema authors write correct schemas by tracing/logging compiler behavior. These traces/logs can be about identifying problems, or simply building confidence that a schema is correct. In the latter case, the trace/log is effectively redundant, but still useful, information.
  3. helping DFDL schema authors write correct schemas by tracing/logging runtime behavior.
  4. helping DFDL processor users (who are running applications that embed Daffodil, but are not the authors of the DFDL schemas) identify problems in either the data or the schemas that purport to describe that data.
  5. helping Daffodil developers find and isolate performance problems in the Daffodil code base.
  6. helping DFDL schema authors understand the performance of Daffodil when processing data for their DFDL Schema. (When there is more than one way to model data in DFDL, sometimes DFDL Schemas can be tuned to improve performance by choosing alternative modeling techniques.)

...