This page has moved to https://cwiki.apache.org/confluence/display/DAFFODIL/Proposal%3A+Features+to+Support+Table-Lookup

...

Proposed: a mechanism to define hash and range tables and perform constant-time lookups on them at runtime via DFDL expression function calls.

The mechanism is based on the principles of the DFDL Infoset which has [schema] and [unionMemberSchema] members for infoset elements. These members are Schema Component Identifiers (SCID). Given access to these, one can in principle obtain the name of the union member type (if it is named), or other characteristics of a schema component such as its attributes.

Given a simple type defined as a union, one can also ask for characteristics of a specific named member of that union.

What is proposed is a logical extension of the above concepts to define lookup tables from properties on union simple types. It provides DFDL expression functions to access those lookup tables via infoset elements.

Consider:

Code Block
<xs:simpleType name="STATUS_UP">
  <xs:restriction base="xs:unsignedLong">
    <xs:enumeration value="1"/>
  </xs:restriction>
</xs:simpleType>
 
<xs:simpleType name="STATUS_DOWN">
  <xs:restriction base="xs:unsignedLong">
    <xs:enumeration value="2"/>
  </xs:restriction>
</xs:simpleType>
 
<xs:simpleType name="STATUS_DEGRADED">
  <xs:restriction base="xs:unsignedLong">
    <xs:enumeration value="3"/>
  </xs:restriction>
</xs:simpleType>
 
<xs:simpleType name="NO_STATEMENT">
  <xs:union>
    <xs:simpleType>
      <xs:restriction base="xs:unsignedLong">
        <xs:minInclusive value="4"/>
        <xs:maxInclusive value="63"/>
      </xs:restriction>
    </xs:simpleType>
    <xs:simpleType>
      <xs:restriction base="xs:unsignedLong">
        <xs:enumeration value="0"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>
 
<xs:simpleType name="Status" daf:unionMemberLookup="yes" dfdl:length="6">
  <xs:union memberTypes="tns:NO_STATEMENT tns:STATUS_UP tns:STATUS_DOWN tns:STATUS_DEGRADED"/>
</xs:simpleType>
 
<xs:element name="rawStatus" type="tns:Status"/>

Usage for Parsing - daf:unionMemberSchemaName(arg)

Imagine the value of the rawStatus element is 2.

We can then compute a logical string value from the rawStatus value as follows:

Code Block
<xs:element name="value" type="xs:string" dfdl:inputValueCalc="{ daf:unionMemberSchemaName(../rawStatus) }"/>

In this case value would be 'STATUS_DOWN'.

  • SDE (Schema Definition Error) if arg is not an element whose simple type is a union with all members named.
  • SDE if the union simple type does not have daf:unionMemberLookup="yes" defined on it.
  • PE (Processing Error) if the value of arg is not mapped to any union member.
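The parse-time lookup described above can be modeled outside of DFDL. The following is a minimal Python sketch of the intended semantics only, not Daffodil's implementation. The names STATUS_MEMBERS, ProcessingError, and union_member_schema_name are hypothetical, and the table is transcribed by hand from the Status union in the schema.

```python
# Hypothetical model of the Status union, members in schema order.
# Facets are ("enum", v) for xs:enumeration and ("range", lo, hi) for a
# minInclusive/maxInclusive pair.
STATUS_MEMBERS = [
    ("NO_STATEMENT", [("range", 4, 63), ("enum", 0)]),
    ("STATUS_UP", [("enum", 1)]),
    ("STATUS_DOWN", [("enum", 2)]),
    ("STATUS_DEGRADED", [("enum", 3)]),
]


class ProcessingError(Exception):
    """Stands in for a DFDL Processing Error (PE)."""


def union_member_schema_name(value, members=STATUS_MEMBERS):
    """Return the name of the first union member whose facets accept value."""
    for name, facets in members:
        for facet in facets:
            if facet[0] == "enum" and value == facet[1]:
                return name
            if facet[0] == "range" and facet[1] <= value <= facet[2]:
                return name
    # No member matched: the proposal calls for a PE in this case.
    raise ProcessingError(f"{value} is not mapped to any union member")


print(union_member_schema_name(2))   # STATUS_DOWN
print(union_member_schema_name(17))  # NO_STATEMENT
```

This linear scan only illustrates which name is returned for which value; it makes no attempt at the constant-time behavior the proposal aims for.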

To avoid name collisions, a property, dfdl:unionMemberName, can be supplied on a simple type that is a union member; that name is preferred over the type's name. This enables the use of anonymous simple type definitions in the union, and eliminates problems from XSD simple type name collisions within the same namespace.

Usage for Unparsing - daf:unionMemberValueFromName(arg)

Imagine our peer element named value contains "NO_STATEMENT" as a string in the infoset before unparsing begins.

Our rawStatus element can be revised so that its value is computed when unparsing.

Code Block
<xs:element name="rawStatus" type="tns:Status" dfdl:outputValueCalc="{ daf:unionMemberValueFromName(../value) }"/>

The arg is of type xsd:NCName. The function returns the first facet value corresponding to the union member of that name. The type of the element must be a union.

So in this case rawStatus would be 4, since the first facet value expressed in the NO_STATEMENT type (which is itself a union) is 4. This is not the numerically smallest facet value, which is 0; it is the one expressed first. This allows use of lexical ordering to specify the representative value to be used for a range when unparsing.

  • SDE if the element type is not a union type with all member types named.
  • SDE if arg is not a path to an element of type string (or a type derived from string).
  • SDE if the element type is not a union type having daf:unionMemberLookup="yes".
  • PE if arg refers to a string that is not one of the union member names.
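The "first facet value" rule for unparsing can be illustrated with a small model. This is a hedged Python sketch of the described behavior, not Daffodil code. The names MEMBER_FACETS, ProcessingError, and union_member_value_from_name are hypothetical, and the facet lists are copied by hand from the schema in the order they are written.

```python
# Hypothetical model: member name -> facets in the order written in the schema.
MEMBER_FACETS = {
    "NO_STATEMENT": [("minInclusive", 4), ("maxInclusive", 63), ("enumeration", 0)],
    "STATUS_UP": [("enumeration", 1)],
    "STATUS_DOWN": [("enumeration", 2)],
    "STATUS_DEGRADED": [("enumeration", 3)],
}


class ProcessingError(Exception):
    """Stands in for a DFDL Processing Error (PE)."""


def union_member_value_from_name(name, members=MEMBER_FACETS):
    """Return the value of the lexically first facet of the named member."""
    if name not in members:
        # The proposal calls for a PE when the string is not a member name.
        raise ProcessingError(f"{name!r} is not a union member name")
    _facet_kind, value = members[name][0]
    return value


print(union_member_value_from_name("NO_STATEMENT"))  # 4, not 0
print(union_member_value_from_name("STATUS_DOWN"))   # 2
```

Because the value is taken from the first facet as written, reordering the facets in the schema changes the representative value without changing which values the type accepts.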

When simple types are composed by restriction, the derived type's facet constraints are considered earlier than the base type's facet constraints.

When simple types are composed by union, the lexically first member of the union is earlier than later ones.

If a simple type union is a union of unions, then only the top-level types are distinguished by these lookup functions. The tree of nested unions is NOT flattened.

Any other facets, such as pattern, may not appear on any of the union types. They are disallowed to enable possible future extension, and it is an SDE if they are present.

Implementation Considerations

The implementation must take the definition of each simple type union having daf:unionMemberLookup="yes", and analyze it for enumerations and ranges. The implementation must then determine the right combination of decision tree on numeric ranges, vs. hash table on enumerated values, to implement the lookup capabilities.
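One possible shape for such a combined structure is sketched below in Python, purely as an illustration. It assumes that no two members accept the same value, uses a dict as the hash table for enumerations, and lets a binary search over sorted range starts stand in for the decision tree. The names MEMBERS, build_lookup, and lookup are hypothetical and do not come from Daffodil.

```python
from bisect import bisect_right

# Hypothetical model of the Status union, members in schema order.
MEMBERS = [
    ("NO_STATEMENT", [("range", 4, 63), ("enum", 0)]),
    ("STATUS_UP", [("enum", 1)]),
    ("STATUS_DOWN", [("enum", 2)]),
    ("STATUS_DEGRADED", [("enum", 3)]),
]


def build_lookup(members):
    """Split facets into a hash table (enumerations) and a sorted range table."""
    enum_table = {}
    ranges = []
    for name, facets in members:
        for facet in facets:
            if facet[0] == "enum":
                # setdefault keeps the earlier member if a value repeats.
                enum_table.setdefault(facet[1], name)
            else:
                ranges.append((facet[1], facet[2], name))
    ranges.sort()
    starts = [lo for lo, _hi, _name in ranges]
    return enum_table, ranges, starts


def lookup(value, table):
    """Return the member name for value, or None if no member accepts it."""
    enum_table, ranges, starts = table
    if value in enum_table:
        return enum_table[value]
    # Find the last range whose start is <= value, then check its upper bound.
    i = bisect_right(starts, value) - 1
    if i >= 0 and value <= ranges[i][1]:
        return ranges[i][2]
    return None


table = build_lookup(MEMBERS)
print(lookup(2, table))   # STATUS_DOWN
print(lookup(40, table))  # NO_STATEMENT
```

The enumeration lookup is constant time on average, while the range lookup here is logarithmic in the number of ranges. A real implementation would also have to decide how overlapping enumerations and ranges interact with the lexical ordering rules above, which this sketch does not address.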

...