Based on experience creating DFDL schemas and running them using Daffodil, there are a number of things not currently described in the DFDL v1.0 standards documents that would be helpful in modeling.

They are described here in no particular order, along with some discussion of them.

Recursion

One of the first things people want to model in DFDL always seems to be a legacy document format, such as RTF or the older binary MS Word format. These formats have recursive structures in which a section can contain text and other sections.

One advantage of DFDL without recursion is that it is not a Turing-complete language, which is helpful from a security perspective. Adding recursion may cross that boundary. However, since DFDL already has a rich regular expression capability and its own backtracking, one can write schemas that take absurdly long to execute even without recursion, so this may not be a real concern.
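
For illustration, a recursive structure of this kind is naturally written in plain XML Schema as an element that refers to itself. The fragment below is only a sketch: the element names section and text are hypothetical, all DFDL representation properties are omitted, and a DFDL v1.0 processor is not expected to accept it as written.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical element: a section holds text followed by nested sections. -->
  <xs:element name="section">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="text" type="xs:string"/>
        <!-- This self-reference is the recursion that DFDL v1.0 lacks. -->
        <xs:element ref="section" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```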

Layering - Data Source/Target Indirection

Often one needs multiple passes: the value of some element, which might be a string, a hexBinary, or an array of bytes, needs to be used as the input for further parsing.

There are a number of related issues here. For example, the expression language also needs functions for concatenating a node list of strings into a single string (and similarly for hexBinary and arrays of bytes).

The inverse of this layering for unparsing opens its own significant can of worms.

Complex Representations for Simple Types

XML Schema's simple vs. complex type distinction is quite painful. Often you want the logical result to be a computed element of simple type (using dfdl:inputValueCalc), while a hidden sequence group of several elements carries the more complicated representation details. In DFDL v1.0, such a thing must be modeled as a complex type, so that there is a place where both the hidden group and the 'value' element of simple type can live side by side.

A means is needed to embed a hidden group within the definition of a simple type, so that the hidden group is implicitly laid down next to the element having that simple type.
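
As a sketch of the DFDL v1.0 workaround described above, the fragment below wraps a hidden group and a computed 'value' element in a complex type. The names temperature, tempRep, whole, and tenths, the example namespace, and the calculation itself are hypothetical. Format properties such as representation and length are omitted, and only parsing is considered.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:dfdl="http://www.ogf.org/dfdl/dfdl-1.0/"
           xmlns:ex="http://example.com/hypothetical"
           targetNamespace="http://example.com/hypothetical">

  <!-- Representation details, kept out of the logical infoset. -->
  <xs:group name="tempRep">
    <xs:sequence>
      <xs:element name="whole" type="xs:int"/>
      <xs:element name="tenths" type="xs:int"/>
    </xs:sequence>
  </xs:group>

  <!-- The complex type exists only to hold the hidden group next to 'value'. -->
  <xs:element name="temperature">
    <xs:complexType>
      <xs:sequence>
        <xs:sequence dfdl:hiddenGroupRef="ex:tempRep"/>
        <!-- The logical result: a simple value computed from the hidden elements. -->
        <xs:element name="value" type="xs:decimal"
                    dfdl:inputValueCalc="{ ../whole + (../tenths div 10) }"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```

With the proposed feature, temperature could itself be an element of simple type, with the hidden group attached to that type's definition instead of to a wrapper.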

XML Attributes

The ability to have data of simple type become XML attribute values would go a long way toward making DFDL-created XML more human-friendly.

More XML Schema Constructs

Several XML Schema constructs are missing from DFDL. Unless there is a clear reason not to support them, supporting them would be helpful. The list includes at least:

  1. repeating sequence and choice groups (minOccurs and maxOccurs)
  2. complex type derivations
  3. attributes (already mentioned above)
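
To illustrate the first item, plain XML Schema allows occurrence constraints directly on a sequence or choice group, so that a run of elements repeats as a unit without a wrapper element. The fragment below is a sketch with hypothetical names (record, key, value) and no DFDL properties.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="record">
    <xs:complexType>
      <!-- The (key, value) pair repeats as a unit, with no wrapper element. -->
      <xs:sequence minOccurs="0" maxOccurs="unbounded">
        <xs:element name="key" type="xs:string"/>
        <xs:element name="value" type="xs:string"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>

</xs:schema>
```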

XML Schema 1.1

This newer standard supports richer validation rules, which would be useful since XML Schema 1.0's validation capabilities are so limited.

Delimited by Next Item

The ability to say that an element or group is delimited, with its end marked by the initiator of the next element or group, would simplify the description of many formats.

Character Class Entities

We badly need an entity that means 'any whitespace that is not a line ending'. This would avoid having to specify separators like:

dfdl:separator="%SP; %SP;%SP; %SP;%SP;%SP; %SP;%SP;%SP;%SP; %SP;%SP;%SP;%SP;%SP;"

which is sometimes needed when %NL; is the terminator. The %WSP+; entity cannot be used in that case because it encompasses all whitespace, including line endings.

 
