Comment: added section on dfdl:string constructor

...

In truth, Daffodil approximates the DFDL Infoset using a subset of the XML Infoset features made visible via the JDOM libraries, embellishing JDOM Elements with distinguished attributes.

Use of JDOM is motivated by the ability to plug JDOM into the Saxon-B XPath implementation as a way of realizing the DFDL expression language, which is a subset of XPath 2.0.

Ultimately, the Scala API for Daffodil converts the JDOM objects into Scala's native XML objects, e.g., scala.xml.Elem being the class of Element nodes.

Namespaces and Prefixes

The Daffodil implementation uses attributes in a few distinct namespaces to embellish JDOM Elements.

...

We also use the standard 'xsi' prefix/namespace, and 'xs' prefix/namespace.

Mapping of DFDL Infoset to Daffodil JDOM Infoset to Scala XML Nodes

| DFDL Infoset | Daffodil's JDOM XML Infoset | Scala scala.xml.Node Infoset |
|---|---|---|
| Document Information Item | JDOM Document | The document is represented by the root element. There is no separate document item. |
| root | getRootElement() | none |
| dfdlVersion | attribute daffodil:dfdlVersion on the root element. | none |
| schema (reserved for future use) | (no implementation) | none |
| unicodeByteOrderMark | attribute daf:unicodeByteOrderMark on the root element. | same attribute scheme as JDOM |
| Element Information Item | JDOM Element | scala.xml.Elem |
| namespace | getNamespace(): org.jdom.Namespace | def namespace: String |
| name | getName(): String | def name: String |
| document | getDocument() | none (see parent) |
| datatype | attribute xsi:type with value one of the set of XML Schema simple type QNames that are in the DFDL Subset of XML Schema. For example: xsi:type='xs:string'. By convention, the prefixes 'xsi' and 'xs' denote the usual standard namespace URIs. | same attribute scheme as JDOM |
| dataValue | For simple types other than xs:string, the canonical XML representation of the value, as returned by getText(). However, for the value nil, the representation is an element with no value having the xsi:nil='true' attribute. For type xs:string, the DFDL Infoset allows representation of characters that are illegal in XML. These are represented by replacing them with characters in the Unicode Private Use Area by a scheme described below. | def text: String to obtain canonical text. Nil representation is the same attribute scheme. Values containing XML-illegal characters use the same scheme. |
| children | getChildren() | def child: Node* |
| parent | getParent() | none. Scala XML nodes are immutable, and do not have parent references. This allows nodes to be shared. |
| schema | A special attribute dafi:schemaComponentID has a value which can be used to retrieve the associated schema component. (Not yet implemented: means to create a standard Schema Component Designator or SCD) | Same attribute scheme |
| valid | (Not yet implemented) | (not yet implemented) |
| unionMemberSchema | (Not yet implemented) | (not yet implemented) |
| "No Value" | A JDOM Element with no children, and with no dataValue is the representation of an element with "No Value". | A scala Elem with no children and no dataValue. |
| Augmented Infoset | A JDOM Element with a special marker attribute: dafi:hidden='true' signifies that the element is part of the augmented infoset. This attribute is used to identify and filter out elements when the un-augmented infoset is needed. | Same attribute scheme, but on scala.xml.Elem element. |
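
To illustrate how the dafi:hidden marker can be used to recover the un-augmented infoset, here is a minimal sketch. It is not Daffodil's code: it uses the W3C DOM classes bundled with the JDK instead of JDOM so that it runs without extra libraries, the namespace URI urn:example:dafi is a placeholder (the real URI bound to the dafi prefix is not given here), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class HiddenFilterSketch {
    // Placeholder namespace URI (assumption); substitute the URI actually bound to 'dafi'.
    static final String DAFI_NS = "urn:example:dafi";

    // Parses a namespace-aware document and returns its root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
            f.setNamespaceAware(true);
            return f.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Direct child elements of parent, copied into a list so callers may mutate the tree.
    static List<Element> childElements(Element parent) {
        List<Element> result = new ArrayList<>();
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                result.add((Element) n);
            }
        }
        return result;
    }

    // Removes, in place, every descendant element marked dafi:hidden='true'.
    static void stripHidden(Element parent) {
        for (Element e : childElements(parent)) {
            if ("true".equals(e.getAttributeNS(DAFI_NS, "hidden"))) {
                parent.removeChild(e);
            } else {
                stripHidden(e);
            }
        }
    }

    // Local names of the direct child elements, in document order.
    static List<String> childNames(Element parent) {
        List<String> names = new ArrayList<>();
        for (Element e : childElements(parent)) {
            names.add(e.getLocalName());
        }
        return names;
    }

    public static void main(String[] args) {
        Element root = parse(
            "<rec xmlns:dafi='urn:example:dafi'>"
            + "<len dafi:hidden='true'>3</len><body>abc</body></rec>");
        stripHidden(root);
        System.out.println(childNames(root));
    }
}
```

Because Scala XML nodes are immutable, the equivalent operation on scala.xml.Elem would build a new tree without the hidden elements instead of removing them in place.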

Implementation of DFDL Infoset Strings

...

It is a processing error if any DFDL infoset string character is created with a character code greater than #x10FFFF.

DFDL Expressions and Daffodil Infoset Strings

We use Saxon-B and JDOM so as to utilize the XPath implementation to realize DFDL expressions.

DFDL Infoset strings are accommodated by way of a function daf:string(...). This function takes a single argument of type string, and it interprets the DFDL numeric entities and DFDL character entities notations, and inserts the corresponding characters into the string result.

In addition, if the DFDL character entities identify XML-illegal characters, then the PUA-replacement described above is performed.

(Note 2012-12-05: this function is proposed to the DFDL Working Group for inclusion in DFDL version 1.0 standard, in which case it would use the standard prefix, i.e., dfdl:string(..) )

Daffodil Infoset and TDML Runner

The Daffodil TDML runner constructs the <tdml:dfdlInfoset> element contents by post-processing all strings so that the DFDL character entities notation can be used to express XML-illegal characters.

So for example:

     <tdml:dfdlInfoset><foo>abc%NUL;</foo></tdml:dfdlInfoset>

would translate the %NUL; entity notation into character #x00, which is illegal in XML, and so it would be remapped to character #xE000. Hence, the above example is equivalent to writing:

     <tdml:dfdlInfoset><foo>abc&#xE000;</foo></tdml:dfdlInfoset>

which uses the XML numeric character entity to insert the remapped #xE000 character directly. DFDL character entities simply provide the notational convenience of the symbolic forms (NUL, CR, LF, HT, VT, FF, etc.) or of the DFDL numeric entity form (for example "%#x02;"), keeping the notation consistent across DFDL schemas and TDML test files.
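
The translation described above can be sketched as follows. This is only an illustration under stated assumptions, not the Daffodil implementation: the class and method names are hypothetical, only the six symbolic entities named in the text are recognized, and the remapping rule (add #xE000 to XML-illegal control characters below #x20) is extrapolated from the single #x00 to #xE000 example; the complete scheme is the one described in the section on infoset strings.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DfdlStringSketch {
    // Subset of the symbolic DFDL character entities, with their character codes.
    static final Map<String, Integer> NAMED = Map.of(
        "NUL", 0x00, "HT", 0x09, "LF", 0x0A, "VT", 0x0B, "FF", 0x0C, "CR", 0x0D);

    // Matches the %NAME; form, the %#xHEX; form, and the %#DEC; form.
    static final Pattern ENTITY =
        Pattern.compile("%(?:([A-Z]+)|#x([0-9A-Fa-f]+)|#([0-9]+));");

    // Assumed rule: XML-illegal control characters below #x20 move up by #xE000
    // into the Private Use Area; every other code point is left unchanged.
    static int remap(int cp) {
        boolean illegalControl = cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D;
        return illegalControl ? 0xE000 + cp : cp;
    }

    // Parses a numeric entity value; codes above #x10FFFF are a processing error.
    static int parseCode(String digits, int radix) {
        long value;
        try {
            value = Long.parseLong(digits, radix);
        } catch (NumberFormatException e) {
            value = Long.MAX_VALUE; // too many digits to fit in a long: certainly too large
        }
        if (value > 0x10FFFF) {
            throw new IllegalArgumentException("Character code above #x10FFFF: " + digits);
        }
        return (int) value;
    }

    // Replaces each DFDL entity in s with its (possibly remapped) character.
    static String translate(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(s, last, m.start());
            int cp;
            if (m.group(1) != null) {
                Integer named = NAMED.get(m.group(1));
                if (named == null) {
                    throw new IllegalArgumentException("Unknown entity: " + m.group());
                }
                cp = named;
            } else if (m.group(2) != null) {
                cp = parseCode(m.group(2), 16);
            } else {
                cp = parseCode(m.group(3), 10);
            }
            out.appendCodePoint(remap(cp));
            last = m.end();
        }
        out.append(s, last, s.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String result = translate("abc%NUL;");
        // Print the code point of the last character in hexadecimal.
        System.out.println(Integer.toHexString(result.charAt(result.length() - 1)));
    }
}
```

Under these assumptions, translate("abc%NUL;") yields the same string as the XML text abc&#xE000; in the second TDML example.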