  1. Daffodil
  2. DFDL-1598

Unparser: For strings that truncate, the dfdl:valueLength function cannot suspend


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • deferred
    • None
    • Back End, General
    • None

      When unparsing, the strategy for determining an element's target length is to first determine its value length, by letting unparsing go forward into a buffering data output stream. The value length is the difference between the starting position and the ending position, both captured in the buffered output.
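
      The measurement described above can be sketched roughly as follows. This is a minimal illustration, not Daffodil's implementation: measure_value_length and the use of io.BytesIO as the buffering output stream are hypothetical stand-ins, and lengths are counted in bytes for simplicity.

```python
import io

def measure_value_length(unparse_value, out):
    """Return how many bytes unparse_value writes to the buffered output."""
    start = out.tell()      # starting position in the buffered output
    unparse_value(out)      # let unparsing go forward into the buffer
    end = out.tell()        # ending position in the buffered output
    return end - start      # value length = end - start

buffer = io.BytesIO()
buffer.write(b"HDR")        # content unparsed earlier, already in the buffer
value_length = measure_value_length(
    lambda out: out.write("hello".encode("ascii")), buffer
)
```

      Because both positions are taken from the buffer, the value length is only known after the value has actually been unparsed into it.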

      However, if the string is a truncated string (lengthKind 'explicit' or 'implicit', with dfdl:truncateSpecifiedLengthString 'yes'), we have a cycle because the value-length of an element is the post-truncation length, yet to determine the target length we will often need to know the value-length.

      For example:

      <xs:element name="len" type="xs:int" dfdl:outputValueCalc="{ 
        if (dfdl:valueLength(../data) lt 100) then 100 else dfdl:valueLength(../data) 
      }" />
      <xs:element name="data" type="xs:string" dfdl:lengthKind='explicit' dfdl:length='{ ../len }'
         dfdl:truncateSpecifiedLengthString='yes'/>

      In the above, the length expression of 'data' depends on the value of the 'len' element, and the value of 'len' requires the valueLength of 'data'. Without truncation this would work: we could unparse the value of 'data' into a buffering data output stream and measure its length. That would unblock the suspended dfdl:outputValueCalc expression of 'len', which would in turn allow the length expression of 'data' to be evaluated. We would then know how much padding/fill to add to the representation of 'data'.

      But if the 'data' string can be truncated, as in the example above, this fails. We cannot unparse it and derive the value length from the unparsed representation in a buffer, because the valueLength is supposed to be the post-truncation length.

      So a cyclic deadlock will occur when unparsing schemas like the one above.
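
      The dependency cycle can be made concrete with a small sketch. The node names and the find_cycle helper are hypothetical and only model the reasoning in this description; they are not Daffodil data structures.

```python
def find_cycle(depends_on, start):
    """Return the nodes of a dependency cycle reachable from start, or None."""
    path = []          # current chain of dependencies being followed
    on_path = set()    # same nodes as path, for fast membership tests
    done = set()       # nodes already known not to lead to a cycle

    def visit(node):
        if node in on_path:
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        on_path.add(node)
        for dep in depends_on.get(node, ()):
            cycle = visit(dep)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)

# With truncation, valueLength(data) is the post-truncation length,
# so it needs the target length of 'data', which needs 'len'.
with_truncation = {
    "len.outputValueCalc": ["valueLength(data)"],
    "valueLength(data)": ["length(data)"],
    "length(data)": ["len.outputValueCalc"],
}

# Without truncation, valueLength(data) is measured from the buffered output.
without_truncation = {
    "len.outputValueCalc": ["valueLength(data)"],
    "valueLength(data)": [],
    "length(data)": ["len.outputValueCalc"],
}

cycle = find_cycle(with_truncation, "len.outputValueCalc")
no_cycle = find_cycle(without_truncation, "len.outputValueCalc")
```

      In this model the only difference between the two cases is the edge from valueLength(data) to length(data), which is exactly what truncation introduces.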

      The question is whether this is a problem, because the schema above can be rewritten as:

      <xs:element name="len" type="xs:int" dfdl:outputValueCalc="{ 
        if (fn:string-length(../data) lt 100) then 100 else fn:string-length(../data) 
      }" />
      <xs:element name="data" type="xs:string" dfdl:lengthKind='explicit' dfdl:length='{ ../len }'
         dfdl:truncateSpecifiedLengthString='yes'/>

      The fn:string-length function provides the pre-truncation length of the element. This eliminates the cycle.
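
      A rough sketch of why the rewritten schema can be unparsed without suspension: the pre-truncation length is available from the infoset value alone. The function name is hypothetical, and one character per length unit and right-side padding are assumptions made only for this illustration.

```python
def unparse_len_and_data(data, minimum=100, pad_char=" "):
    """Return (len_value, data_representation) for the rewritten schema."""
    # Analogue of fn:string-length(../data): known before 'data' is unparsed.
    pre_truncation_length = len(data)
    # The 'len' outputValueCalc:
    # if (string-length lt 100) then 100 else string-length.
    len_value = minimum if pre_truncation_length < minimum else pre_truncation_length
    # With 'len' known, the length expression of 'data' can be evaluated and
    # the value truncated or padded to fit (right padding is an assumption).
    representation = data[:len_value].ljust(len_value, pad_char)
    return len_value, representation

len_value, representation = unparse_len_and_data("abc")
```

      Nothing here depends on the unparsed representation of 'data', so no step has to wait on another.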

      We can detect this error at runtime: the ElementRuntimeData structure contains optTruncateSpecifiedLengthString, which the valueLength function can examine at runtime. The function can then report an error that dfdl:valueLength is being called on a truncated string, with a diagnostic suggesting that fn:string-length is preferable.

      However, it's not an error to call dfdl:valueLength on a string that may be truncated. It's only an error to do so in a way that creates this deadlock.

      The DFDL spec does not preclude calling dfdl:valueLength on a string element that might be truncated, and the spec is clear that this would be the post-truncation value-region length.

      So we need a mechanism that produces a runtime SDE not whenever dfdl:valueLength is called on a string that might be truncated, but only when that call actually creates a cycle. The mechanism would examine the deadlocked cycle and check whether it is caused by taking dfdl:valueLength of a string that might be truncated, and only then issue the runtime SDE about this particular cyclic definition problem.

      That would be ideal, but an interim acceptable solution might be:
      (a) have tutorials about cycles and include this example
      (b) disallow this at schema compile time as something Daffodil just doesn't allow, and report the fix, which is to use fn:string-length instead.
      or both.
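
      For comparison, the ideal mechanism (examining the deadlocked cycle before issuing the SDE) might look roughly like the sketch below. The Suspension class and diagnose_deadlock function are hypothetical, and may_truncate is only a stand-in for the optTruncateSpecifiedLengthString information held in ElementRuntimeData.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass(frozen=True)
class Suspension:
    """One blocked computation in a deadlocked cycle (hypothetical model)."""
    description: str
    value_length_of: Optional[str] = None  # element passed to dfdl:valueLength, if any
    may_truncate: bool = False             # stand-in for optTruncateSpecifiedLengthString

def diagnose_deadlock(cycle: List[Suspension]) -> str:
    """Build a diagnostic, naming truncatable valueLength targets if present."""
    culprits = [
        s.value_length_of
        for s in cycle
        if s.value_length_of is not None and s.may_truncate
    ]
    if culprits:
        names = ", ".join(culprits)
        return (
            "Runtime SDE: cyclic definition caused by dfdl:valueLength on "
            f"truncatable string element(s): {names}. "
            "Consider fn:string-length instead."
        )
    return "Runtime SDE: unparser deadlock (cyclic definition)."

cycle = [
    Suspension("outputValueCalc of 'len'", value_length_of="data", may_truncate=True),
    Suspension("length expression of 'data'"),
]
message = diagnose_deadlock(cycle)
```

      The specific diagnostic is produced only when a deadlock has already been found, so ordinary non-cyclic calls to dfdl:valueLength on truncatable strings are unaffected.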

              Assignee: Unassigned
              Reporter: Mike Beckerle (mbeckerle.dfdl)