Error/Diagnostics and Tracing/Logging for Daffodil

Design Notes and Requirements Analysis

Compilation

A compilation step sometimes returns

  1. a value
  2. a set of errors/diagnostics/warnings
  3. both

This suggests that Scala's conventional Either type is no good for us, because its very name implies one result or the other, never both. Both will certainly occur whenever compilation succeeds but produces warnings.
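As a minimal sketch (names here are hypothetical, not Daffodil's actual API), a small result type can carry a value and diagnostics simultaneously, which is exactly the case Either cannot represent:

```scala
// Hypothetical sketch: a compilation result that can hold a value, a set
// of diagnostics, or both at once -- the case Either cannot represent.
final case class CompileResult[T](
    value: Option[T],            // present whenever something usable was produced
    diagnostics: Seq[String]) {  // may be non-empty even when value is present
  def isError: Boolean = value.isEmpty
}

// Success with warnings: a value AND diagnostics together.
val warned = CompileResult(Some("compiled-expression"), Seq("warning: deprecated facet"))
```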

...

Example: an XPath expression is compiled into a CompiledExpression object. This type should contain a boolean member named isError, which is false if compilation of the XPath succeeded. A member getDiagnostics will be Nil if there are no errors/diagnostics/warnings/info; otherwise it will contain a set of error/diagnostic/warning/info objects. If isError is false, then any diagnostic objects present are not ones that prevent the CompiledExpression from being used (i.e., they are probably all warnings/info objects).

In general, this is the idiom for any sort of compilation step. Instead of a compilation action returning an Option[T] where None indicates failure, we always get a T, and its isError member tells us whether it is usable.
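A sketch of that idiom (the trait and member names are taken from the description above and are assumptions, not necessarily Daffodil's actual declarations):

```scala
trait Diagnostic { def getMessage: String }

// Objects produced by compilation steps carry their own diagnostics.
trait WithDiagnostics {
  def isError: Boolean                 // true => the object must not be used
  def getDiagnostics: Seq[Diagnostic]  // Nil when there is nothing to report
}

// Instead of Option[CompiledExpression], callers always receive a
// CompiledExpression and check isError before using it.
final class CompiledExpression(
    diags: Seq[Diagnostic],
    failed: Boolean) extends WithDiagnostics {
  def isError: Boolean = failed
  def getDiagnostics: Seq[Diagnostic] = diags
}
```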

Runtime

...

  • we will want to do lazy message construction (see the sketch below).
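A common Scala technique for this is a by-name parameter, sketched here with hypothetical names: the message expression is never evaluated unless the log line is actually emitted.

```scala
object LazyMessageDemo {
  @volatile var enabled: Boolean = false

  // `msg: => String` is a by-name parameter: the argument expression is
  // evaluated only if `enabled` is true when log() runs.
  def log(msg: => String): Unit = if (enabled) println(msg)

  def expensiveDump(): String = "large formatted parser state"

  def main(args: Array[String]): Unit = {
    // With `enabled` false, no message string is built and
    // expensiveDump() never executes.
    log(s"parser state: ${expensiveDump()}")
  }
}
```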

...

In general, however, backtracking in the runtime isn't exceptional behavior that happens occasionally. It is the principal means by which the parser deals with variability in data formats. Hence, we try to avoid heavyweight constructs like try-catch logic in too many places in the runtime. This means that in the runtime, exceptions/errors generally aren't thrown; instead, a failed return status is returned.

This is not only about performance; it is also about an API guarantee that must be provided by the parse() routine of the API's DataProcessor object: it returns success or failure, and has diagnostic objects associated with that result.
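A sketch of what status-returning parse logic might look like internally (all names here are hypothetical, not Daffodil's actual classes):

```scala
final case class Diag(message: String)

sealed trait Status
case object OK extends Status
final case class Failed(diag: Diag) extends Status

// Parse state: position in the data stream plus success/failure status.
final case class PState(bitPos: Long, status: Status)

object ChoiceParser {
  // Each alternative is tried from the same starting state; a failed
  // alternative is ordinary control flow, not a thrown exception.
  def parseChoice(start: PState, alts: Seq[PState => PState]): PState =
    alts.iterator
      .map(alt => alt(start))
      .find(_.status == OK)
      .getOrElse(start.copy(status = Failed(Diag("no choice alternative matched"))))
}
```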

Error Types

...

  • Schema Definition Error (a.k.a., SDE) - detected at compile time
  • Schema Definition Error - detected at run time
  • Schema Definition Warning - detected at compile time - not called out explicitly in the spec, though there are many places where it says implementations may want to warn.
  • Schema Definition Warning - detected at run time
  • Processing Error (a.k.a., PE) - always a run-time thing - causes backtracking when the schema expresses alternatives
  • Recoverable Error - always a run-time thing - never causes backtracking; really this is just a run-time "warning". We may want a threshold that determines how many of these are allowed before they escalate to either a Processing Error (which can cause backtracking) or a fatal error.
  • Information - either at compile or run time, we may want to simply inform the user. Probably this is under the control of flags that determine whether one wants these or not. Example: an information object might inform the user that the format is not efficiently streamable due to forward/backward reference issues, such as a header record that contains the length of the entire data object. One cannot stream this when unparsing, as one must hold back the header until the full length is known. In some cases users may want to escalate these information items to warnings or errors; e.g., if their intention is to stream the data, they may want schema compilation to report an error for non-streamable formats.
  • Tracing - a DFDL user may want to watch a parse as it happens in order to decipher why their data doesn't match the format. Shy of a full-blown interactive data format debugging tool, just getting a trace of the parse behavior is a powerful step in this direction.
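One way to encode the taxonomy above is a sealed hierarchy, sketched here (class names are assumptions based on the list; whether a given SDE or warning was detected at compile time or run time could be carried as an additional field):

```scala
// Sketch: each diagnostic kind knows whether it is fatal to further use
// of the compiled or parsed result.
sealed trait Diagnostic {
  def message: String
  def isError: Boolean
}

final case class SchemaDefinitionError(message: String)   extends Diagnostic { val isError = true  }
final case class SchemaDefinitionWarning(message: String) extends Diagnostic { val isError = false }
final case class ProcessingError(message: String)         extends Diagnostic { val isError = true  }
final case class RecoverableError(message: String)        extends Diagnostic { val isError = false }
final case class Information(message: String)             extends Diagnostic { val isError = false }
```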

In all cases we need to capture information about the schema location(s) relevant to understanding the cause of the error, and in the case of errors/warnings at run-time, the data location(s) that are relevant. A schema location is a schema file name, and a schema component within that schema file. Any given issue may require more than one schema location to be provided as part of its diagnostic information.

  • SchemaComponent(s) will contain their schema location.
  • Every sub-structure within a schema component will contain its schema location.
  • At runtime, a data location is either
    • an offset (in bits, bytes, characters, or combination thereof) from the beginning of the data stream
    • a relative offset from another data location (recursively, this bottoms out at the beginning of the data stream)
  • Any data location can always be converted into an absolute location in the data stream, but relative offsets to other locations in the data (e.g., the beginning of the current record) are often more useful for diagnostics.
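A sketch of this data-location scheme (names hypothetical): the recursion through relative locations bottoms out at the stream start, so an absolute position is always computable.

```scala
sealed trait DataLocation {
  // An absolute position is always recoverable, per the note above.
  def absoluteBitPos: Long
}

case object StreamStart extends DataLocation {
  val absoluteBitPos = 0L
}

// A location expressed relative to another location, e.g. "24 bits past
// the start of the current record" -- often more useful in diagnostics.
final case class RelativeLocation(base: DataLocation, offsetBits: Long)
    extends DataLocation {
  def absoluteBitPos: Long = base.absoluteBitPos + offsetBits
}
```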

Continuing Execution After Fatal Error

...

Recovery from a runtime error involves a point of uncertainty expressed in the schema. Consider: the root element for the parse could be placed, as an element reference, inside a choice as its first alternative. The second alternative could be a single element of type hexBinary with lengthKind='endOfParent', which in this case means it extends to the end of the data stream. This second alternative will always parse successfully, so it provides a natural way to move forward if parsing based on the schema and its root element ultimately fails.

However, this is not quite enough, as the user is likely to want to route this hexBinary data blob somewhere and capture the diagnostic information from the parse failure to keep with it. The normal behavior of a choice whose second alternative succeeds is to discard the error/diagnostic information from any prior failing alternative. Hence, a top-level API is needed that provides access to the failure diagnostics for the overall parse. Consistent with the discussion herein about lazy message construction, the blob of binary data needed for a diagnostic should really be an offset and length into some data stream. The blob itself can be large, and we should copy it only if required.

  • DFDL API methods for obtaining diagnostic information.
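Such API methods might take the following shape (a sketch, not Daffodil's confirmed public API); the key point is that the overall result retains the failure diagnostics even when a fallback alternative succeeded:

```scala
trait Diagnostic { def getMessage: String }

// Result of DataProcessor.parse(): success/failure plus all diagnostics,
// including those from failing choice alternatives that would otherwise
// be discarded once a later alternative succeeds.
trait ParseResult {
  def isError: Boolean
  def getDiagnostics: Seq[Diagnostic]
}

trait DataProcessor {
  def parse(input: java.nio.channels.ReadableByteChannel): ParseResult
}
```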

Gathering Multiple Compile-Time Errors

At compile time, we want to gather a set of SDEs to issue to the user. So compilation should continue to process as much of the schema as possible, gather the results together, and then return the full set of diagnostics from all the compilation results.

Given a schema with many top-level elements, we could easily compile each top-level element regardless of errors in the others, and then present the complete set of errors to the user. This is not desirable, however, because a large schema with many top-level element declarations may have only one that is intended to be the document root.

  • The API for compilation should allow control of whether to compile all, or just a subset of the global element declarations.
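For example, the compiler API might accept an optional set of root names (a sketch; the signature here is an assumption, not the real API):

```scala
import java.io.File

trait CompileResult {
  def isError: Boolean
}

trait Compiler {
  // Compile every global element declaration, gathering all diagnostics.
  def compileAll(schema: File): Seq[CompileResult]

  // Compile only the named global element declaration(s), e.g. the one
  // intended document root, so unrelated declarations cannot contribute
  // spurious errors.
  def compile(schema: File, rootNames: Seq[String]): Seq[CompileResult]
}
```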

It is harder to gather a set of diagnostics from the compilation of a single root element than to simply stop on the first issue. The thing to depend on is this:

  • Compilation of each of the children of a sequence or choice can be isolated, and the errors from those compilations concatenated (in schema order) to form the set for the whole compilation.
  • There may be duplicate errors. E.g., all subsequent children of a choice may be tripped up by the same error in the first child of the choice. Duplicates can/should be suppressed.
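A sketch of that rule (the helper here is hypothetical): compile each child in isolation, concatenate the diagnostics in schema order, and suppress duplicates.

```scala
final case class Diag(schemaLocation: String, message: String)

object GatherDiagnostics {
  // `compileOne` compiles a single child in isolation and yields its
  // diagnostics. flatMap preserves schema order; distinct suppresses
  // duplicates (keeping each diagnostic's first occurrence).
  def gather[C](children: Seq[C])(compileOne: C => Seq[Diag]): Seq[Diag] =
    children.flatMap(compileOne).distinct
}
```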

Tracing/Logging

Applications that embed Daffodil are very likely to be servers, so the target logger to which Daffodil writes its logging/tracing output needs to be something the application can provide to Daffodil via an API such as a setLogger() function.
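A sketch of such an API (names assumed): the embedding application supplies the log destination once, and all Daffodil trace/log output flows to it.

```scala
// The application implements this and decides where output goes:
// a file, a logging framework, a monitoring pipeline, etc.
trait LogWriter {
  def write(level: String, message: String): Unit
}

object DaffodilLogging {
  @volatile private var writer: LogWriter = new LogWriter {
    def write(level: String, message: String): Unit =
      Console.err.println(s"[$level] $message")   // default: stderr
  }

  // The setLogger()-style hook described above.
  def setLogWriter(w: LogWriter): Unit = { writer = w }

  def log(level: String, msg: => String): Unit = writer.write(level, msg)
}
```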

In the case of an interactive DFDL Schema authoring environment, trace information would normally be displayed to the user/author. A runtime server that embeds Daffodil would more likely want to log to a file-system-based logger, and possibly trigger alerts flowing to a monitoring system of some kind.

Tracing and logging overlap, in the sense that tracing may need to be activated on a pure-runtime embedded system for diagnostic purposes, in which case trace output becomes just a specialized kind of log output. An example of this would be when a DFDL schema author believes the schema is correct, but when deployed at runtime inside some server, data arrives that contains things unanticipated by the schema author. The resulting failure to parse may result in wanting to turn on debug/trace features within the server's Daffodil runtime.

Purposes of tracing include:

  1. helping Daffodil developers find and isolate bugs in the Daffodil code base.
  2. helping DFDL schema authors write correct schemas by tracing/logging compiler behavior. These traces/logs can be about identifying problems, or simply building confidence that a schema is correct. In the latter case, the trace/log effectively provides useful redundant confirmation.
  3. helping DFDL schema authors write correct schemas by tracing/logging runtime behavior.
  4. helping DFDL processor users (who are running applications that embed Daffodil, but are not the authors of the DFDL schemas) identify problems in either the data or the schemas that purport to describe that data.
  5. helping Daffodil developers find and isolate performance problems in the Daffodil code base.
  6. helping DFDL schema authors understand the performance of Daffodil when processing data for their DFDL Schema. (When there is more than one way to model data in DFDL, sometimes DFDL Schemas can be tuned to improve performance by choosing alternative modeling techniques.)

Purposes of logging include all of the above, but also include:

  1. monitoring (over extended time periods) performance of compilation
  2. monitoring (over extended time periods) performance of runtime behavior
  3. generating alerts that flow to an overarching systems monitoring environment

...

  • turn tracing on and off without having to restart, and similarly control the verbosity of detail in the traces/logs and any selectivity features of the tracing.
  • supply the streams to which the trace/logs are written. These may or may not be streams leading to a file system.
  • avoid full-disk situations by being notified about the volume of data written to the streams and being able to change the streams without loss of any traces/log records.
  • run forever with tracing/logging turned on, albeit with some performance degradation proportional to the amount of trace/log information being generated.
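A sketch of those controls (all names assumed): verbosity and destination are mutable at runtime, so tracing can be turned on, tuned, or redirected without a restart, and the written volume can be observed to avoid full-disk situations.

```scala
import java.io.PrintStream

object TraceControl {
  @volatile var level: Int = 0                    // 0 = off; higher = more verbose
  @volatile private var out: PrintStream = System.err
  @volatile private var bytesWritten: Long = 0L   // for full-disk monitoring

  // Swap streams without restarting and without losing records.
  def setStream(s: PrintStream): Unit = synchronized { out.flush(); out = s }

  def volume: Long = bytesWritten

  def trace(minLevel: Int)(msg: => String): Unit =
    if (level >= minLevel) synchronized {
      val line = msg                              // built lazily, only when enabled
      bytesWritten += line.length + 1
      out.println(line)
    }
}
```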

Coding Style Requirements

...

has moved to https://cwiki.apache.org/confluence/display/DAFFODIL/Error%2C+Diagnostics%2C+Tracing%2C+Logging