  1. Daffodil
  2. DFDL-1516

dfdl:contentLength & dfdl:valueLength specifying lengthUnits 'characters' and variable-width encodings

    • New Feature
    • Resolution: Unresolved
    • Normal
    • deferred
    • None
    • DFDL Language
    • None

      Note that there is DFDL workgroup discussion about the implications of asking for length measured in units of 'characters' when the underlying item is not text, or not all text (complex types).

      There is no issue when the character set encoding is fixed width. One simply takes the data size in bits or bytes and divides by the width of one character to get the length in characters.
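      For illustration only, the fixed-width conversion can be sketched in Python as below. The function name and the choice of UTF-32 as the example encoding are assumptions made for this sketch, not anything taken from Daffodil.

```python
def fixed_width_length_in_chars(length_in_bits: int, bits_per_char: int) -> int:
    """Convert a length in bits to a length in characters.

    Only valid when every character occupies exactly bits_per_char bits.
    """
    if bits_per_char <= 0:
        raise ValueError("bits_per_char must be positive")
    if length_in_bits % bits_per_char != 0:
        # The data does not hold a whole number of characters.
        raise ValueError("length is not a multiple of the character width")
    return length_in_bits // bits_per_char


# Hypothetical example: five characters encoded as UTF-32BE (32 bits each).
utf32_data = "héllo".encode("utf-32-be")
utf32_chars = fixed_width_length_in_chars(len(utf32_data) * 8, 32)
```

      No decoding is needed here: the conversion is pure arithmetic on the size of the data.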

      The problem is when there is a variable-width encoding like UTF-8. Measuring length in characters in essence requires unparsing the data into those characters and counting how many, or perhaps unparsing the data to bits/bytes and then parsing it as characters and counting how many.
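      A minimal Python sketch of why the variable-width case is harder, assuming UTF-8 and using a hypothetical helper name. The only way to get the character count is to decode the bytes, and bytes that are not text may not decode at all.

```python
def length_in_chars(data: bytes, encoding: str = "utf-8") -> int:
    """Count characters by decoding the bytes.

    Python's len() on a str counts Unicode code points, which is used
    here as a stand-in for 'characters'.
    """
    return len(data.decode(encoding))


# Text data: the byte count and the character count differ.
text_bytes = "héllo".encode("utf-8")   # 'é' takes two bytes in UTF-8
text_byte_count = len(text_bytes)
text_char_count = length_in_chars(text_bytes)

# Non-text data: a 4-byte big-endian signed integer is not valid UTF-8,
# so asking for its length in characters has no obvious answer.
binary_bytes = (-2).to_bytes(4, "big", signed=True)
try:
    length_in_chars(binary_bytes)
    binary_is_countable = True
except UnicodeDecodeError:
    binary_is_countable = False
```

      The same question applies to a complex type whose content mixes text and binary fields, since only part of its representation decodes as characters.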

      In either case, unless there is a uniform character encoding, the behavior is confusing. Other places in DFDL where data that is not necessarily text may get interpreted as text are lengthKind 'pattern' and the pattern asserts and pattern discriminators used in parsing.
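      As a hedged illustration of the pattern case, the sketch below applies a regular expression to arbitrary bytes by decoding them with ISO-8859-1, a single-byte encoding in which every byte value maps to exactly one character. The helper name and sample data are hypothetical, and this shows the general idea rather than Daffodil's behavior.

```python
import re


def pattern_match_length(data: bytes, pattern: str) -> int:
    """Return the length of the pattern match at the start of data.

    ISO-8859-1 decodes any byte sequence, one character per byte, so the
    match length in characters equals the match length in bytes.
    """
    text = data.decode("iso-8859-1")
    match = re.match(pattern, text)
    return len(match.group(0)) if match else 0


# Hypothetical data: three arbitrary bytes, then a ';' delimiter and more data.
mixed = bytes([0x00, 0xC3, 0xFF]) + b";rest"
matched_length = pattern_match_length(mixed, r"[^;]*")
```

      With a variable-width encoding such as UTF-8, the same bytes might fail to decode or might decode to a different number of characters than bytes, which is the confusion described above.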

              Unassigned Unassigned
              mbeckerle.dfdl Mike Beckerle