Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Daffodil is the first DFDL implementation to provide the advanced features of

  • dfdl:inputValueCalc (IVC for short)
  • dfdl:outputValueCalc (OVC for short)
  • dfdl:hiddenGroupRef

...

this

...

These features are essential for highly complex binary formats like:

  • PCAP
  • Asterix
  • MIL-STD-2045 and related MIL-STD and STANAGs

...

page

...

Infoset can contain elements that have dfdl:outputValueCalc

It is essential that when unparsing it is allowed for the infoset to already contain output-computed elements. The values of these will be discarded and recomputed during unparsing, but they have to be tolerated if present, and created if not present.

SDE for element in hidden group that has no default, nor dfdl:outputValueCalc

Within a hidden group, if an element has no fixed/default value, and no OVC, then because it is hidden and so does not appear in the infoset, there's no way for it to get a value at all when unparsing. 

Daffodil issues an SDE in this situation as is required by the DFDL spec.

This SDE is too strong: If the element has IVC, then it doesn't need a value for unparsing unless an expression that is used at unparse time (i.e., not in an assert or discriminator) references the element. In many cases the value of an IVC may be used only in dfdl:occursCount expressions, or in assert/discriminator test expressions. Since those aren't evaluated when unparsing, there is no unparsing situation would never need the value, so the SDE is too strong.

IVC and OVC on the same element

In many situations we have found that we need both IVC and OVC on the same element. This occurs when the element is in a hidden group, and is essentially being used as a variable.

The dfdl:newVariableInstance doesn't eliminate this problem, as there are situations where one needs an "array variable", that is a variable associated with each index of an array, but referenced from expressions used to compute things outside of the scope of that array element.

TBD: is OVC truly needed here, or is this a result of the SDE above for elements that have no way to get a value at unparse time - but the element turns out to be unused at unparse time.

Array Variables and dfdl:newVariableInstance

In many cases dfdl:newVariableInstance is not sufficient because what are needed are array variables.

.e., when parsing element i+1 you have to refer to a "variable" defined as part of element i.

You can't do that with dfdl:newVariableInstance because of the scope it has. If you introduce a dfdl:newVariableInstance for an array element, it's scope will be that element only. If you introduce it outside the array element there will be only one instance for the whole array.

But you can do an "array variable" with a hidden element. (Doesn't have to be hidden, but that's the sensible way to use it.)

The headache being the relative path to it., and then the fact that the most natural thing to do is to put both IVC and OVC on it (though the OVC may not be needed in many cases).

Forward reference from dfdl:setVariable and dfdl:newVariableInstance Expressions

Consider unparsing with dfdl:setVariable expression referring to OVC forward-referencing element.

Code Block
<xs:element name="len" type="xs:int" dfdl:outputValueCalc="{ dfdl:valueLength(../data) + 10 }"/>

<xs:sequence>
  <xs:annotation><xs:appinfo ...>
    <dfdl:setVariable name="var">{ ../len }" />
  </xs:appinfo></xs:annotation>

  <xs:element name="data" dfdl:length="{ $var }">
    ....
  </xs:element>
....
</xs:sequence>

In the above, when unparsing, the output value calc for len can be evaluated, but we must delay its evaluation and unparsing until the subsequent data element is available. 

The next thing the unparse has to do, after delaying the unparsing of 'len' is set the var variable. This requires the value of len, which has been deferred.

We have real schemas (e.g., even PCAP) where this occurs.

This breaks the rule that when variables are evaluated, things they reference must have already been evaluated. Basically, when we delay evaluating the OVC for len we are suspending anything that depends on len having a value as well.

fn:error(...) Function needed to issue errors when unparsing

When parsing, one has the alternative of creating a dfdl:assert that tests something and fails if it doesn't hold.

However, dfdl:assert expressions aren't evaluated when unparsing; hence, a way to indicate an error in data during unparse-time expression evaluation is needed.

The XPath fn:error(...) function is suitable.

Parse-Time Forward Reference

Several standards in the MIL-STD and NATO STANAG space express formats using a forward reference idiom that requires those forward references to be used at parse time.

These standards are not publicly available. They are US For-official-use-only or NATO Unclassified (which is controlled, not public), but below illustrates roughly the idiom.

This is pseudo-DFDL - because the expressions are forward referencing:

Code Block
<choice dfdl:choiceDispatchKey="{ ../key }">
  <sequence dfdl:choiceBranchKey="A">
    <element name="e1" type="xs:int"/>
    <element name="e2" type="xs:int"/>
  </sequence>
  <sequence dfdl:choiceBranchKey="B">
    <element name="e2" type="xs:long"/>
  </sequence>
</choice>
....
<element name="key" type="xs:string" dfdl:length="1"/>
....

In the above you see that the key comes after the choice. When parsing this means the parse must suspend, move past the uncertain region (the size of which has to be determined without parsing it, meaning all branches of the choice must be the same fixed predictable length), continue parsing after the uncertain region until the elements have been parsed which allow the choiceDispatchKey expression to be evaluated. At that point the parsing of the choice can be resumed, (whatever format properties were in effect at the start of the choice must be restored while it is parsed) and the result of the parse added to the infoset. Any streaming of the parse-output infoset is of course suspended until the choice can be resolved.

The workaround for this is to include everything after the choice, up to and including the key, into every branch, and then have every branch end with a discriminator on the key. This would work and enable the data to parse and unparse. It is undesirable as it is not an efficient implementation, is redundant, and is much less declarative.

Of note: the distance (amount of schema) between the choice and the key element(s) that enable the choice to be determined, is not large in any of the examples we have seen to date. It is at most several elements, and is a fixed distance from the beginning of the choice as well.

We expect this forward-reference feature evolved out of a reasonable pattern of behavior for the way a successful data format evolves. A data record layout existed. The format is mostly fixed-length required fields. That layout cannot be modified. However, it needed to be extended to accommodate more fields, and those are added onto the end of the data layout. However, there was an opportunity to save space by reusing some preceding areas of the record layout that would otherwise be unused in certain new uses of the now-extended data record.  The flag to indicate this reuse is of course a new field, and must be added on the end of the record. Introducing a choice earlier in the pre-existing part of the data record layout is allowed. It doesn't change anything about the existing layout, as the first branch of the choice would contain the pre-existing layout. Other branches would contain the re-purposing of that same data area for the new extended purposes. All this naturally leads to a choice (for the reused parts of the pre-existing data record layout) where the flag appears after the choice at the end of the record.

In some of the formats where this forward-ref behavior appears, the data records have been extended in this manner more than once, so a single data record has more than one such choice with forward reference.

Restricting IVC/OVC in Unordered Sequences

TBD: implementation of unordered sequences is raising issues here. Just restricting this usage may be the right choices.

...

has moved to https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=75966200