...


The DFDL spec has added, as an erratum to draft v1.0.3, a behavior that is effectively a warning mechanism, called a 'recoverable error'.

We will generally use the term "error" to mean "error, warning, or information item". So we have these error/warning types:

  • Schema Definition Error (a.k.a., SDE) - detected at compile time
  • Schema Definition Error - detected at run time
  • Schema Definition Warning - detected at compile time - not called out explicitly in the spec, though there are many places it says implementations may want to warn....
  • Schema Definition Warning - detected at run time
  • Processing Error (a.k.a., PE)  - always a run-time thing - causes backtracking when the schema expresses alternatives
  • Recoverable Error - always a run-time thing - never causes backtracking; really this is just a run-time "warning". We may want a threshold that determines how many of these are allowed before escalation to either a Processing Error (which can cause backtracking) or a fatal error.
  • Information - either at compile or run time, we may want to simply inform the user. Probably this is under control of some flags to control whether one wants these or not. Example: an information object might inform the user that the format is not efficiently streamable due to forward/backward reference issues, such as a header record that contains the length of the entire data object. One cannot stream this when unparsing, as one must hold-back the header until the full length is known. In some cases users may want to escalate these information items to warnings or errors, such as if it is their intention to stream the data, then they may want an error from schema compilation for non-streamable formats.
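The taxonomy above can be sketched as a small type hierarchy. The names here are hypothetical illustrations, not Daffodil's actual API:

```scala
// Hypothetical sketch of the diagnostic taxonomy above; not Daffodil's actual API.
sealed trait DiagnosticKind {
  // Per the list above, a Processing Error is the only kind that can cause backtracking.
  def causesBacktracking: Boolean = this == ProcessingError
}
case object SchemaDefinitionError   extends DiagnosticKind // compile or run time; fatal
case object SchemaDefinitionWarning extends DiagnosticKind // compile or run time
case object ProcessingError         extends DiagnosticKind // run time only
case object RecoverableError        extends DiagnosticKind // run-time "warning"; never backtracks
case object Information             extends DiagnosticKind // informational; may be escalated
```

A sealed hierarchy like this lets escalation policies (recoverable-to-fatal thresholds, information-to-warning promotion) be expressed as exhaustive matches over the kinds.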

In all cases we need to capture information about the schema location(s) relevant to understanding the cause of the error, and in the case of errors/warnings at run-time, the data location(s) that are relevant. A schema location is a schema file name, and a schema component within that schema file. Any given issue may require more than one schema location to be provided as part of its diagnostic information.
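The location-capture requirement above suggests a diagnostic record shaped roughly like the following sketch (all names are illustrative, not Daffodil's actual classes):

```scala
// Hypothetical sketch of diagnostic location capture; names are illustrative.
// A schema location is a schema file name plus a component within that file.
final case class SchemaLocation(schemaFile: String, component: String)

final case class DataLocation(bytePosition: Long)

final case class Diagnostic(
  message: String,
  // A given issue may require more than one schema location to explain its cause.
  schemaLocations: Seq[SchemaLocation],
  // Populated only for errors/warnings detected at run time.
  dataLocations: Seq[DataLocation] = Seq.empty
)
```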

...

At compile time, we want to gather a set of SDEs to issue to the user. So compilation wants to continue to process more of the schema where possible, gather together the results, and then return that full set of diagnostics from all the compilation results.

Given a schema with many top-level elements, we could simply compile each top-level element regardless of errors on the other top-level elements, and then present the complete set of errors to the user. However, treating every top-level element as a potential root may not be desirable: a large schema with many top-level element declarations may have only one that is intended to be the document root, and compiling the rest may simply produce many irrelevant errors.

  • The API for compilation should allow compilation of one or more global element declarations as the potential document root(s).
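An API along those lines might look like the following sketch. The signatures are hypothetical, not Daffodil's actual compiler interface:

```scala
// Hypothetical compilation API sketch: the caller names the intended root(s),
// and diagnostics are accumulated rather than stopping at the first SDE.
final case class CompileResult(isError: Boolean, diagnostics: Seq[String])

def compile(schemaFiles: Seq[String], rootElements: Seq[String]): CompileResult = {
  // Each named root is compiled independently; its diagnostics are concatenated.
  val diags = rootElements.flatMap { root =>
    if (schemaFiles.isEmpty) Seq(s"SDE: no schema files given for root '$root'")
    else Seq.empty // placeholder for real per-root compilation
  }
  CompileResult(isError = diags.nonEmpty, diagnostics = diags)
}
```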

Within the compilation of a single root element, it is harder to gather a full set of diagnostics rather than stopping at the first such issue. The thing to depend on is this:

  • Compilation of each of the children of a sequence or choice can be isolated, and the errors from those compilations concatenated (in schema order) to form the set for the whole compilation.
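That accumulation strategy can be sketched as follows, with illustrative types standing in for real compiled objects:

```scala
// Hypothetical sketch: compile each child of a sequence/choice in isolation,
// then concatenate the resulting diagnostics in schema order.
def compileChild(child: String): Seq[String] =
  if (child.startsWith("bad")) Seq(s"SDE in $child") else Seq.empty

def compileGroup(childrenInSchemaOrder: Seq[String]): Seq[String] =
  childrenInSchemaOrder.flatMap(compileChild) // flatMap preserves schema order
```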

Tracing/Logging

Applications that embed Daffodil are very likely to be servers, so a target logger to which Daffodil writes out logging/tracing needs to be something that the application can provide to Daffodil via an API.

In the case of an interactive DFDL Schema authoring environment, trace information would normally be displayed to the user/author. A runtime server that embeds Daffodil would more likely want to log to a file-system-based logger, and possibly trigger alerts flowing to a monitoring system of some kind.

Tracing and logging overlap, in the sense that tracing may need to be activated on a pure-runtime embedded system for diagnostic purposes, in which case trace output becomes just a specialized kind of log output. An example of this would be when a DFDL schema author believes the schema is correct, but when deployed at runtime inside some server, data arrives that contains things unanticipated by the schema author. The resulting failure to parse may result in wanting to turn on debug/trace features within the server's Daffodil runtime.
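A pluggable log-writer API along these lines might look like the following sketch (hypothetical names, not Daffodil's actual API): the embedding application supplies the target, and the runtime writes through it.

```scala
// Hypothetical sketch of a pluggable logging target supplied by the
// embedding application (an authoring tool, a server, etc.).
trait LogWriter {
  def write(level: String, message: String): Unit
}

object ConsoleWriter extends LogWriter {
  def write(level: String, message: String): Unit =
    Console.err.println(s"[$level] $message")
}

final class InMemoryWriter extends LogWriter {
  val lines = scala.collection.mutable.Buffer[String]()
  def write(level: String, message: String): Unit = lines += s"[$level] $message"
}

object Logging {
  // Defaults to stderr; a server would install a file- or monitoring-backed writer.
  @volatile var writer: LogWriter = ConsoleWriter
}
```

A server that needs alerts flowing to a monitoring system would implement `LogWriter` over its monitoring client rather than a file.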

Purposes of tracing and logging include:

  1. helping Daffodil developers find and isolate bugs in the Daffodil code base.
  2. helping DFDL schema authors write correct schemas by tracing/logging compiler behavior. These traces/logs can be about identifying problems, or simply building confidence that a schema is correct. In the latter case, the trace/log is effectively useful redundant information.
  3. helping DFDL schema authors write correct schemas by tracing/logging runtime behavior.
  4. helping DFDL processor users (who are running applications that embed Daffodil) identify problems in either the data or schemas that purport to describe that data.
  5. helping Daffodil developers find and isolate performance problems in the Daffodil code base.
  6. helping DFDL schema authors understand the performance of Daffodil when processing data for their DFDL Schema. (When there is more than one way to model data in DFDL, sometimes DFDL Schemas can be tuned to improve performance by choosing alternative modeling techniques.)

Purposes of logging include all of the above, but also include:

  1. monitoring (over extended time periods) performance of compilation
  2. monitoring (over extended time periods) performance of runtime behavior
  3. generating alerts that flow to an overarching systems monitoring environment

Purpose (1) here is much like assertion checking: it wants to be low/zero overhead when turned off, but able to be turned on without recompiling the application, and preferably without even restarting it. Keep in mind, however, that traces should not be used as a substitute for breakpoint debugging.
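In Scala, the low/zero-overhead requirement is commonly met with a by-name message parameter, so the message string is never even constructed when tracing is disabled. A minimal sketch (hypothetical names):

```scala
// Hypothetical sketch: by-name message argument gives near-zero cost when disabled.
object Trace {
  @volatile var enabled = false // can be flipped at run time; no restart needed
  var evaluations = 0           // instrumentation for the example only

  // `msg` is by-name: the string is built only if tracing is on.
  def trace(msg: => String): Unit =
    if (enabled) Console.err.println(msg)

  def expensiveMessage(): String = { evaluations += 1; "expensive details..." }
}
```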

Purpose (2) is an end-user feature likely to be turned on/off as part of some tooling/environment used by a DFDL schema author. The Daffodil processor must provide APIs for controlling this from the tooling. In a functional program like Daffodil, these kinds of traces/logs are not really things that print to an error stream so much as they are additional attributes (lazy vals) computed for purposes of illustrating the decisions made by the compiler.
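The "trace as computed attribute" idea might look like this sketch: the explanation of a compiler decision is a lazy val on the compiled object, computed only when tooling asks for it (names are hypothetical):

```scala
// Hypothetical sketch: compiler decisions exposed as lazily computed attributes
// rather than eagerly printed to an error stream.
final class CompiledElement(val name: String, val isFixedLength: Boolean) {
  // Computed on demand, e.g. when an authoring tool displays it.
  lazy val decisionTrace: String =
    if (isFixedLength) s"$name: chose fixed-length parser"
    else s"$name: chose delimited parser"
}
```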

...