...

For unparsing, this term means the Infoset before any hidden elements have been created and before the values of any dfdl:outputValueCalc elements have been computed. Furthermore, elements have not yet been padded, filled, or truncated to their specified length. There may also be required elements that are missing; these must be created using their default values.
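The last step above, creating missing required elements from their defaults, can be sketched as follows. This is a minimal illustration under stated assumptions, not Daffodil's implementation: `ElementDecl` and `add_missing_required` are hypothetical names, a sequence's children are modeled as (name, value) pairs, and each declaration carries only a name, a minimum occurrence count, and an optional default value.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class ElementDecl:
    """Hypothetical, simplified declaration of one child element of a sequence."""
    name: str
    min_occurs: int = 1
    default: Optional[str] = None  # stands in for the element's default value

def add_missing_required(decls: List[ElementDecl],
                         children: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return a new child list, in schema order, in which every required
    element that is absent has been created from its default value.

    `children` holds (name, value) pairs; children whose names are not in
    `decls` are assumed not to occur and are dropped by this sketch.
    """
    present = {name for name, _ in children}
    result: List[Tuple[str, str]] = []
    for decl in decls:
        if decl.name in present:
            # Keep the occurrences that the infoset already supplies.
            result.extend(c for c in children if c[0] == decl.name)
        elif decl.min_occurs > 0:
            if decl.default is None:
                raise ValueError(f"required element {decl.name!r} is missing and has no default")
            # Create the required occurrences from the default value.
            result.extend((decl.name, decl.default) for _ in range(decl.min_occurs))
    return result

if __name__ == "__main__":
    decls = [ElementDecl("a"), ElementDecl("b", default="0"), ElementDecl("c", min_occurs=0)]
    print(add_missing_required(decls, [("a", "1")]))  # [('a', '1'), ('b', '0')]
```

Optional elements (here `c`) are left absent, and a required element with no default is reported as an error rather than guessed.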

Current and Future Infoset

...

Perhaps the most complex issue in creating a DFDL Infoset, or in unparsing one, is determining which DFDL schema component corresponds to a particular Infoset element. The same problem arises when a relatively naive program constructs the DFDL Infoset through the API. When an Infoset element is created, one must identify the DFDL schema component that corresponds to it. The context for this is the enclosing parent element and any prior sibling elements. Unfortunately, in some cases one also needs to look at following elements.

The DFDL Workgroup email archives contain several discussions about xs:choice and about inferring the correct choice arm for an Infoset element that has not yet been associated with a schema component.

The unparser must look ahead one element in these situations:

  1. to determine which alternative arm of a choice a particular element is in
  2. to determine if an element instance is the last element of an array

The algorithm for (1) is described in the DFDL Specification document (Section 15.1.3) as:

On unparsing there is the question of how one identifies the appropriate schema choice branch corresponding to the data in the infoset. This is complicated by the fact that the children may not be elements. They may themselves be sequences or choices. The selection of the choice branch is as follows: The element in the infoset is used to search the choice branches in the schema, in schema definition order, but without looking inside any complex elements. If the element occurs in a branch, then that branch is selected and if subsequently a processing error occurs, this selection is not revisited (that is, there is no backtracking).

To avoid any unintended behavior, all the children of a choice can be modeled as elements.
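The search the specification describes can be sketched directly. This is a minimal illustration, not Daffodil code: `Elem`, `Seq`, `Choice`, `occurs_in`, and `select_branch` are hypothetical names, elements are identified by name only, and a complex element's content is simply not modeled, which is how the sketch honors "without looking inside any complex elements". Sequences and choices nested inside a branch are searched recursively.

```python
from dataclasses import dataclass
from typing import List, Union

@dataclass
class Elem:
    """An element particle. Its content is not modeled: the search never looks inside it."""
    name: str

@dataclass
class Seq:
    children: List["Term"]

@dataclass
class Choice:
    branches: List["Term"]

Term = Union[Elem, Seq, Choice]

def occurs_in(term: Term, name: str) -> bool:
    """True if an element called `name` appears in `term`, descending into
    nested sequences and choices but never into an element's own content."""
    if isinstance(term, Elem):
        return term.name == name
    if isinstance(term, Seq):
        return any(occurs_in(c, name) for c in term.children)
    if isinstance(term, Choice):
        return any(occurs_in(b, name) for b in term.branches)
    raise TypeError(f"unexpected term {term!r}")

def select_branch(choice: Choice, name: str) -> int:
    """Index of the first branch, in schema definition order, that contains
    `name`. The result is final: a later processing error does not revisit it."""
    for i, branch in enumerate(choice.branches):
        if occurs_in(branch, name):
            return i
    raise LookupError(f"no choice branch contains element {name!r}")

if __name__ == "__main__":
    choice = Choice([Elem("a"), Seq([Elem("b"), Choice([Elem("c"), Elem("d")])])])
    print(select_branch(choice, "d"))  # 1
```

Note that this search runs every time an element must be matched to a branch; the next paragraph explains how Daffodil avoids repeating it at unparse time.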


For each branch of a choice, Daffodil pre-computes the element name and namespace that distinguish that branch. This is always unambiguous thanks to XML Schema's Unique Particle Attribution (UPA) rules; it is a Schema Definition Error if there is ambiguity here. Hence, when the unparser encounters an infoset element that corresponds to a choice in the DFDL schema, the name and namespace comparison is quick and does not involve the search process described in the DFDL specification, because that analysis is done at schema compilation time.

The Unparser corresponding to a choice contains this name and namespace lookup table (in the unparser runtime data structure), and recursively invokes the appropriate sub-unparser.
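A lookup table of this kind might look like the sketch below. It is an illustration under assumptions, not Daffodil's actual runtime structure: `ChoiceUnparser`, `SchemaDefinitionError`, and the `branches` argument format are hypothetical, an infoset element is modeled as a dict with `namespace` and `name` keys, and each sub-unparser is any callable (it could itself be another `ChoiceUnparser.unparse`, which gives the recursive invocation). The table is built once, standing in for schema compilation, and a duplicate key is treated as a Schema Definition Error.

```python
from typing import Callable, Dict, List, Tuple

QName = Tuple[str, str]              # (namespace URI, local name)
SubUnparser = Callable[[dict], None]  # hypothetical sub-unparser signature

class SchemaDefinitionError(Exception):
    """Stand-in for a DFDL Schema Definition Error."""

class ChoiceUnparser:
    def __init__(self, branches: List[Tuple[List[QName], SubUnparser]]):
        # "Schema compilation": map each distinguishing (namespace, name)
        # to the sub-unparser of the branch it selects.
        self.table: Dict[QName, SubUnparser] = {}
        for keys, sub in branches:
            for key in keys:
                if key in self.table:
                    raise SchemaDefinitionError(f"element {key} selects more than one branch")
                self.table[key] = sub

    def unparse(self, element: dict) -> None:
        # Runtime: one dictionary lookup instead of a search of the branches.
        key = (element["namespace"], element["name"])
        sub = self.table.get(key)
        if sub is None:
            raise LookupError(f"no choice branch for element {key}")
        sub(element)

if __name__ == "__main__":
    out: List[str] = []
    cu = ChoiceUnparser([
        ([("urn:ex", "a")], lambda e: out.append("branch A")),
        ([("urn:ex", "b"), ("urn:ex", "c")], lambda e: out.append("branch BC")),
    ])
    cu.unparse({"namespace": "urn:ex", "name": "c"})
    print(out)  # ['branch BC']
```

Keying on the (namespace, name) pair rather than the local name alone keeps same-named elements from different namespaces distinct.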

The algorithm for (2) is relatively simple. For some values of the dfdl:occursCountKind property, the array unparser must count occurrences; when the maximum number of element occurrences has been unparsed, it ends the repeating/array unparser (successfully, since there is no backtracking in unparsing) and begins populating the DFDL infoset with whatever comes after the array... TBD discussion here ....
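The counting and one-element lookahead can be sketched as below. This is a minimal illustration under assumptions: `unparse_array` is a hypothetical function, the infoset is flattened into a list of (name, value) pairs, and only a bounded maximum (or `None` for unbounded) is modeled, not the full set of dfdl:occursCountKind behaviors. The loop peeks at the next element before consuming it and stops on a different name, at end of input, or once the maximum count is reached.

```python
from typing import Callable, List, Optional, Tuple

Event = Tuple[str, str]  # (element name, value): hypothetical flattened infoset

def unparse_array(events: List[Event], pos: int, name: str,
                  max_occurs: Optional[int],
                  unparse_one: Callable[[Event], None]) -> int:
    """Unparse consecutive occurrences of `name` starting at events[pos] and
    return the index of the first element after the array.

    `max_occurs=None` means unbounded.
    """
    count = 0
    while max_occurs is None or count < max_occurs:
        # Look ahead one element: stop if the input is exhausted or the
        # next element is not another occurrence of this array.
        if pos >= len(events) or events[pos][0] != name:
            break
        unparse_one(events[pos])
        pos += 1
        count += 1
    # Ending the array here is final: there is no backtracking in unparsing.
    return pos

if __name__ == "__main__":
    events = [("foo", "5"), ("foo", "6"), ("foo", "7"), ("bar", "x")]
    out: List[str] = []
    nxt = unparse_array(events, 0, "foo", 2, lambda e: out.append(e[1]))
    print(nxt, out)  # 2 ['5', '6']
```

With a maximum of 2, the third `foo` is left for whatever schema component follows the array, which is exactly the situation that makes the count significant.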

Inferring Arrays

This is a sub-issue of determining the DFDL schema component. Specifically, it is the issue of determining whether <foo>5</foo><foo>6</foo> are two occurrences of the same array, or two separate elements, each either a scalar or part of a different array. DFDL and XML Schema specifically allow for schemas like this:

...

  • The Unparser's state is class UState. Unlike the early versions of the Parser and PState, the Unparser mutates the UState from the start rather than following the functional-programming style of copying it with changes. The unparse methods do not return a UState object; they modify the one that is passed in, and the absence of a return value enforces this contract. Each thread must have its own UState.
  • The Unparser has no limitations on data sizes. Avoiding such limits is fundamentally easier for unparsing than for parsing, although data buffering may still be needed (see the discussion of Pending Calculations).
  • The grammar rules in the middle layer of Daffodil include some universal productions, which apply to both parsing and unparsing, while other productions are specific to the parser or to the unparser. This is expressed with guards on the productions that specify whether a rule applies only to parsing, only to unparsing, or to both. Consequently, some Terminal objects are parser- or unparser-specific; that is, they implement only the parser() method or only the unparser() method.
  • Required elements that are missing from the infoset must be added, using their default values.
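The guarded-production idea from the list above can be sketched in a few lines. This is an illustration, not Daffodil's grammar code: `Mode`, `Terminal`, `Production`, `build`, and the three terminal classes are hypothetical, and the strings returned by `parser()` and `unparser()` stand in for real processor objects. The point shown is that the guards decide which productions apply in each mode, so a parse-only terminal's missing `unparser()` is never called, and vice versa.

```python
from enum import Enum
from typing import List

class Mode(Enum):
    PARSE = "parse"
    UNPARSE = "unparse"

class Terminal:
    """Hypothetical base class; subclasses implement parser(), unparser(), or both."""
    def parser(self):
        raise NotImplementedError(f"{type(self).__name__} has no parser")

    def unparser(self):
        raise NotImplementedError(f"{type(self).__name__} has no unparser")

class Production:
    def __init__(self, terminal: Terminal, for_parse: bool = True, for_unparse: bool = True):
        self.terminal = terminal
        self.for_parse = for_parse      # guard: rule applies when parsing
        self.for_unparse = for_unparse  # guard: rule applies when unparsing

    def applies(self, mode: Mode) -> bool:
        return self.for_parse if mode is Mode.PARSE else self.for_unparse

def build(productions: List[Production], mode: Mode) -> list:
    """Create processors only from productions whose guard admits `mode`."""
    chosen = [p.terminal for p in productions if p.applies(mode)]
    if mode is Mode.PARSE:
        return [t.parser() for t in chosen]
    return [t.unparser() for t in chosen]

# Hypothetical terminals; the returned strings stand in for processor objects.
class Universal(Terminal):
    def parser(self):
        return "universal-parser"

    def unparser(self):
        return "universal-unparser"

class ParseOnly(Terminal):
    def parser(self):
        return "parse-only-parser"

class UnparseOnly(Terminal):
    def unparser(self):
        return "unparse-only-unparser"

grammar = [
    Production(Universal()),
    Production(ParseOnly(), for_unparse=False),
    Production(UnparseOnly(), for_parse=False),
]

if __name__ == "__main__":
    print(build(grammar, Mode.UNPARSE))  # ['universal-unparser', 'unparse-only-unparser']
```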

Incremental Unparsing - Pending Calculations and Forward Reference

...