Daffodil is an implementation of DFDL which uses Scala's scala.xml.Elem or JDOM to represent the DFDL Infoset in XML.

The DFDL Infoset is somewhat different from the XML Infoset.

In truth, Daffodil approximates the DFDL Infoset using a subset of the features in the XML Infoset, embellishing elements with distinguished attributes. Use of JDOM is motivated by the ability to plug JDOM into the Saxon-B XPath implementation as a way of realizing the DFDL expression language, which is a subset of XPath 2.0.

The Scala API for Daffodil converts to and from Scala's native XML objects, e.g., scala.xml.Elem, the class of element nodes.

The Java API converts to and from JDOM objects.

Namespaces and Prefixes

The Daffodil implementation uses attributes in a few distinct namespaces to embellish XML elements.

The string "urn:ogf:dfdl:2013:imp:opensource.ncsa.illinois.edu:2012" is the Daffodil implementation namespace prefix. All Daffodil-specific namespaces extend this.

...

We also use the standard 'xsi' prefix/namespace, and 'xs' prefix/namespace.

Mapping of DFDL Infoset to Daffodil JDOM Infoset and to Scala XML Nodes

| DFDL Infoset | Daffodil's JDOM XML Infoset | Scala scala.xml.Node Infoset |
|---|---|---|
| Document Information Item | JDOM Document | The document is represented by the root element. There is no separate document item. |
| root | getRootElement() | none |
| dfdlVersion | Attribute daf:dfdlVersion on the root element. (Not yet implemented) | none |
| schema (reserved for future use) | daf:schema attribute. (No implementation) | none |
| unicodeByteOrderMark | Attribute daf:unicodeByteOrderMark on the root element. (Not yet implemented) | Same attribute scheme as JDOM |
| Element Information Item | JDOM Element | scala.xml.Elem |
| namespace | getNamespace(): org.jdom.Namespace | def namespace: String |
| name | getName(): String | def name: String |
| document | getDocument() | none (see parent) |
| datatype | Attribute xsi:type whose value is one of the XML Schema simple type QNames that are in the DFDL subset of XML Schema. For example: xsi:type='xs:string'. By convention, the prefixes 'xsi' and 'xs' denote here the usual standard namespace URIs. (Not yet implemented) | Same attribute scheme as JDOM |
| dataValue | For simple types other than xs:string, the canonical XML representation of the value, as returned by getText(). For type xs:string, the DFDL Infoset allows representation of characters that are illegal in XML. These are represented by replacing them with characters in the Unicode Private Use Area by a scheme described below. | def text: String to obtain canonical text. Values containing XML-illegal characters use the same scheme. |
| nilled | xsi:nil='true' attribute on element. Absence of this attribute implies 'false'. | Same attribute xsi:nil |
| children | getChildren() | def child: Node* |
| parent | getParent() | none. Scala XML nodes are immutable, and do not have parent references. This allows nodes to be shared. |
| schema | A special attribute daf:schemaComponentID has a value which can be used to retrieve the associated schema component. (Not yet implemented. Note: requires a means to create a standard Schema Component Designator, or SCD) | Same attribute scheme |
| valid | daf:valid='true' means the data has been tested and is valid; daf:valid='false' means the data has been tested and is invalid. The absence of the attribute means that no position is taken on the validity of the data. (Not yet implemented) | Same attribute scheme |
| unionMemberSchema | (Not yet implemented) | (Not yet implemented) |
| "No Value" | A JDOM Element with no children (not even Text node children) is the representation of an element with "No Value". | A scala.xml.Elem with no children and no dataValue. |
| Augmented Infoset | A JDOM Element with a special marker attribute, dafint:hidden='true', signifies that the element is part of the augmented infoset. This attribute is used to identify and filter out elements when the un-augmented infoset is needed. | Same attribute scheme, but on scala.xml.Elem element. |
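
The dataValue entry says that XML-illegal characters in xs:string values are replaced by characters in the Unicode Private Use Area (PUA). The following is only a minimal sketch of that idea, not Daffodil's implementation. It assumes a fixed offset of 0xE000, which agrees with the example on this page where code points 0 to 8 become #xE000 to #xE008. Extending the same offset to the other C0 control characters that XML 1.0 forbids is an assumption, and the class and method names (PuaRemapSketch, toPua, fromPua) are hypothetical.

```java
final class PuaRemapSketch {
    // Assumption: illegal characters are shifted into the PUA by this offset.
    static final int PUA_OFFSET = 0xE000;

    // C0 controls other than tab, line feed, and carriage return are illegal in XML 1.0.
    static boolean isIllegalC0(int cp) {
        return cp >= 0 && cp < 0x20 && cp != 0x09 && cp != 0x0A && cp != 0x0D;
    }

    // Replace each XML-illegal C0 control with its PUA stand-in.
    static String toPua(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int cp : s.codePoints().toArray()) {
            sb.appendCodePoint(isIllegalC0(cp) ? cp + PUA_OFFSET : cp);
        }
        return sb.toString();
    }

    // Reverse the mapping. Genuine PUA characters in this range cannot be
    // told apart from remapped ones, so they are also converted back.
    static String fromPua(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int cp : s.codePoints().toArray()) {
            boolean remapped = cp >= PUA_OFFSET && isIllegalC0(cp - PUA_OFFSET);
            sb.appendCodePoint(remapped ? cp - PUA_OFFSET : cp);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String raw = new String(new int[] {'a', 0x00, 0x08, 'b'}, 0, 4);
        String mapped = toPua(raw);
        // Print the code points of the mapped string in hex.
        mapped.codePoints().forEach(cp -> System.out.println(Integer.toHexString(cp)));
    }
}
```

Legal characters such as tab and newline pass through unchanged, so text that contains no illegal characters is left exactly as it was.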

...

  • The ability to specify an encoding.
    • For example, a MS-Windows user may wish to specify the windows-1252 encoding.
    • The minimum set of supported encodings would be ASCII, windows-1252, and UTF-8
      • Specifying UTF-8 turns off the numeric character entity substitution part of this special transformation.
  • Any Unicode code point which cannot be mapped to the selected encoding can be replaced by its XML numeric character entity equivalent.
    • Example: If the user specifies the US-ASCII encoding, there is no mapping for the Euro symbol €, which is Unicode #x20AC. This would be output as &#x20AC;
    • Example: If the user specifies the windows-1252 encoding, the PUA-mapped characters for XML-illegal code points, such as code points 0 to 8, become #xE000 to #xE008 in the Daffodil Infoset according to the PUA mapping described above, and would become &#xE000; to &#xE008; in the output text.
  • An option allows the user to control whether an XML heading line such as <?xml version="1.0" encoding="windows-1252" ?> is generated at the start of the textual output.
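
The substitution rule in the list above can be sketched with the standard java.nio.charset classes. This is an illustration under stated assumptions rather than Daffodil's code: the class and method names (EntityFallbackSketch, toOutputText) are hypothetical, and the choice of upper-case hexadecimal for the entity is an assumption.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class EntityFallbackSketch {
    // Keep every code point the target encoding can represent; replace the
    // rest with an XML numeric character entity such as &#x20AC;.
    static String toOutputText(String infosetText, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder sb = new StringBuilder(infosetText.length());
        for (int cp : infosetText.codePoints().toArray()) {
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                sb.append(ch);
            } else {
                sb.append("&#x")
                  .append(Integer.toHexString(cp).toUpperCase())
                  .append(';');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // US-ASCII has no mapping for the Euro sign (U+20AC).
        System.out.println(toOutputText("\u20AC", "US-ASCII")); // prints &#x20AC;
        // A PUA stand-in for an XML-illegal character (U+E000).
        System.out.println(toOutputText("\uE000", "US-ASCII")); // prints &#xE000;
    }
}
```

With UTF-8 every well-formed code point is encodable, so no entities are produced, which matches the note that specifying UTF-8 turns the substitution off.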

...

Note that choosing the ASCII (US-ASCII) encoding creates output that is universal, in that it uses only 7-bit ASCII characters yet can accurately represent any character allowed in XML. This form, however, would be largely unreadable not only to users of East Asian scripts, but even to users of commonplace accented characters in European languages.

...

An additional option controls escaping of the special XML characters <, >, &, ", and '.

CDATA preference: When string data contains, or could contain, one or more of the characters &, ", ', <, and >, the user can specify whether they prefer CDATA sections or standard escaping, where the standard character entities are used: &amp; &quot; &apos; &lt; &gt;.
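
As a rough illustration of the two alternatives, the sketch below renders a string either with the standard character entities or inside a CDATA section. All names (TextEscapingSketch, render, preferCdata) are hypothetical, and splitting a literal "]]>" across two CDATA sections is a common XML technique assumed here, not something this page specifies.

```java
final class TextEscapingSketch {
    // True if the text contains any of the five special XML characters.
    static boolean needsEscaping(String s) {
        for (int i = 0; i < s.length(); i++) {
            if ("&\"'<>".indexOf(s.charAt(i)) >= 0) {
                return true;
            }
        }
        return false;
    }

    // Standard escaping using the five predefined character entities.
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Wrap in a CDATA section. A literal "]]>" would end the section early,
    // so it is split across two adjacent sections.
    static String cdata(String s) {
        return "<![CDATA[" + s.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Apply the user's preference only when the text actually needs it.
    static String render(String s, boolean preferCdata) {
        if (!needsEscaping(s)) {
            return s;
        }
        return preferCdata ? cdata(s) : escape(s);
    }

    public static void main(String[] args) {
        System.out.println(render("a<b & c", false)); // prints a&lt;b &amp; c
        System.out.println(render("a<b & c", true));  // prints <![CDATA[a<b & c]]>
    }
}
```

Text with none of the special characters is returned unchanged under either preference, so the option only affects strings that would otherwise be ambiguous in XML.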

...