Daffodil / DFDL-1741

Performance - validation of Enum and Ranges for simple type unions


    • Type: Improvement
    • Resolution: Unresolved
    • Priority: Major
    • deferred
    • 2.0.0
    • Back End, Middle "End"
    • None

      DFDL-1634 provides support for simpleType unions in Daffodil.

      However, a union is still validated (in "limited" validation mode, or when dfdl:checkConstraints is called) by walking through the union members one by one, so the cost grows linearly with the number of members.

      When the union consists of enumerations, or ranges, or mixtures of only those two, then a faster mechanism is needed.

      E.g., for a union that is just enumerations, the validation should take constant time, by using a hash/table lookup.
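
      A minimal sketch of the idea, in Python rather than Daffodil's own code: the enumeration values of all members are flattened once into a single hash table, so each validation is one lookup instead of a walk. The member names and the (name, values) representation are assumptions made for illustration only.

```python
# Hypothetical representation: each union member is (name, set of enum values).
members = [("car", {1}), ("train", {2}), ("plane", {3}), ("bicycle", {25})]

def validate_sequential(value, members):
    """Baseline: try each member in order; cost grows with the member count."""
    for name, allowed in members:
        if value in allowed:
            return name
    return None

def build_enum_table(members):
    """Flatten every enumeration value into one dict: value -> member name."""
    table = {}
    for name, allowed in members:
        for v in allowed:
            # setdefault keeps the first member that accepts v,
            # which matches the result of the sequential walk.
            table.setdefault(v, name)
    return table

enum_table = build_enum_table(members)

def validate_hashed(value):
    """One hash lookup, independent of how many members the union has."""
    return enum_table.get(value)
```

      The table is built once, when the schema is compiled; validate_hashed returns the name of the matching member, or None when the value is not valid for the union.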

      For a union of min/max ranges, some sort of decision tree that rapidly determines validity is required.
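
      One possible shape for this, sketched in Python: the ranges are sorted and merged once, and a binary search over the range starts stands in for the decision tree. This is a swapped-in technique chosen for brevity, not a description of Daffodil's implementation, and the function names are hypothetical.

```python
from bisect import bisect_right

def merge_ranges(ranges):
    """Sort inclusive (lo, hi) ranges and merge any that overlap."""
    merged = []
    for lo, hi in sorted(ranges):
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous range: extend it if needed.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged

def make_range_checker(ranges):
    """Precompute the merged ranges; return a logarithmic-time validity test."""
    merged = merge_ranges(ranges)
    starts = [lo for lo, _ in merged]

    def is_valid(value):
        # Find the last range whose start is <= value, then check its end.
        i = bisect_right(starts, value) - 1
        return i >= 0 and value <= merged[i][1]

    return is_valid

# Usage: three member ranges, two of which overlap.
is_valid = make_range_checker([(26, 31), (1, 10), (5, 25)])
```

      Each check costs a logarithmic number of comparisons in the number of disjoint ranges, rather than one comparison pair per union member.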

      Combinations of these are also possible. A common situation would be for the integer 0 to mean "No Statement", 1-25 to be valid values that have some mathematical meaning, and 26-31 to be illegal.

      We want to express this as follows:

      <union memberTypes="ex:noStatement ex:valid ex:illegal"/>
      <simpleType name="noStatement">
        <restriction base="xs:int">
          <enumeration value="0"/>
        </restriction>
      </simpleType>
      <simpleType name="valid">
        <union memberTypes="ex:car ex:train ex:plane.... ex:bicycle"/>
      </simpleType>
      <simpleType name="car"><restriction base="xs:int"><enumeration value="1"/></restriction></simpleType>
      <simpleType name="train"><restriction base="xs:int"><enumeration value="2"/></restriction></simpleType>
      <simpleType name="plane"><restriction base="xs:int"><enumeration value="3"/></restriction></simpleType>
      ... 21 more ...
      <simpleType name="bicycle"><restriction base="xs:int"><enumeration value="25"/></restriction></simpleType>
      <simpleType name="illegal">
        <restriction base="xs:int">
          <minInclusive value="26"/>
          <maxInclusive value="31"/>
        </restriction>
      </simpleType>

      In the above, the union containing the 25 enumeration simple types is validated by a loop that tries all 25 possibilities one by one. This needs to be improved to be a constant-time dispatch.
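
      For a small, bounded value domain such as this one, a constant-time dispatch could be as simple as a precomputed table indexed by the value. The sketch below is in Python and rests on assumptions: the domain is taken to be 0-31, each leaf member is modelled as a (name, predicate) pair, and only the members spelled out in the schema above are included (the elided types for 4-24 are left out, so those values come back as no match here).

```python
# Hypothetical model of the leaf members, in union order.
members = [
    ("noStatement", lambda v: v == 0),
    ("car", lambda v: v == 1),
    ("train", lambda v: v == 2),
    ("plane", lambda v: v == 3),
    ("bicycle", lambda v: v == 25),
    ("illegal", lambda v: 26 <= v <= 31),
]

def first_match(value, members):
    """The sequential walk: name of the first member that accepts the value."""
    for name, accepts in members:
        if accepts(value):
            return name
    return None

def build_dispatch_table(members, size):
    """Run the sequential walk once per possible value, at schema-compile time."""
    return [first_match(v, members) for v in range(size)]

DISPATCH = build_dispatch_table(members, 32)  # assumed domain: 0..31

def lookup(value):
    """Constant-time validation: a bounds check plus one list index."""
    if 0 <= value < len(DISPATCH):
        return DISPATCH[value]
    return None
```

      The sequential walk still exists, but it runs only while the table is built; per-value validation no longer depends on the number of union members. A dense table only makes sense when the domain is small; for wide value ranges a hash table or sorted ranges would be the more likely fit.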

              efinnegan Elizabeth Finnegan
              mbeckerle.dfdl Mike Beckerle