...

  • Some parser tests are invertible. Having parsed data to an Infoset, one can unparse back to data and, for some DFDL schemas, get the identical data. This doesn't work for all DFDL schemas. Escape schemes can parse data with, say, surrounding quotes which on unparsing are determined to be unnecessary and so are not output. Also, multiple values are allowed for delimiters, but only the first of these values is used on output, so incoming data that uses one of the other delimiters won't unparse to the same delimiters. That said, many tests will be invertible. A flag on TDML parser tests should indicate whether the test can be inverted. We also need some way to bulk-set this flag so it doesn't have to be set explicitly on every test.
  • Note that unparser tests are much more likely to be invertible. It is possible to create a schema that is asymmetric, where what it writes out isn't the same format that it reads in, but this is an atypical corner case rather than a common thing. An example is nil ambiguity with "empty" values: nil might be represented by an empty string, which might not have quotes, so in comma-delimited data several adjacent nil fields may appear as ",,,,", but that might parse as several empty values, each of which takes a default value such as 0.
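
The delimiter case described above can be illustrated with a small sketch. This is not Daffodil code or a TDML feature; the toy format, the `DELIMITERS` list, and all function names are hypothetical, chosen only to show why a parser test may fail to invert while the corresponding unparser test still round-trips.

```python
import re
from typing import List

# Hypothetical toy format: fields separated by "," or ";".
# As in the notes above, parsing accepts any listed delimiter,
# but unparsing always writes the first one.
DELIMITERS = [",", ";"]


def parse(data: str) -> List[str]:
    """Split data on any accepted delimiter; the 'infoset' is just a list of fields."""
    pattern = "|".join(re.escape(d) for d in DELIMITERS)
    return re.split(pattern, data)


def unparse(fields: List[str]) -> str:
    """Join fields using only the first delimiter."""
    return DELIMITERS[0].join(fields)


def parser_test_is_invertible(data: str) -> bool:
    """True if parsing and then unparsing reproduces the original data."""
    return unparse(parse(data)) == data


def unparser_test_is_invertible(fields: List[str]) -> bool:
    """True if unparsing and then parsing reproduces the original fields."""
    return parse(unparse(fields)) == fields
```

With these definitions, `parser_test_is_invertible("a,b,c")` holds but `parser_test_is_invertible("a;b;c")` does not, because the unparser writes "a,b,c". Starting from the infoset side, `unparser_test_is_invertible(["a", "b", "c"])` holds, which matches the observation that unparser tests are more likely to be invertible.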

TBD

  • Streaming output for large objects. The unparser should be able to accept a large object not as a giant string or hexBinary blob, but as a file descriptor or other specification that can be opened and read separately from the Infoset elements. This is symmetric with a parser feature we also need.
  • Truncated output when length units is bytes and encoding is variable width (e.g., utf-8). The issue is truncation that cuts a character off partway through its code units.
  • Improvements in coding style: smaller Scala code files and smaller TDML files. For parsing there are some giant files, and some TDML files that have hundreds of tests in them. We ought not repeat these mistakes.
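
The truncation issue noted above for byte length units with a variable-width encoding can be sketched as follows. This is only an illustration in Python, not the Daffodil implementation; the function name `truncate_utf8` is hypothetical, and dropping the partial character is just one possible policy (raising an error or padding are others the notes leave open).

```python
def truncate_utf8(text: str, max_bytes: int) -> bytes:
    """Encode text as UTF-8 and truncate to at most max_bytes,
    never cutting a character off partway through its code units."""
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return encoded
    end = max_bytes
    # UTF-8 continuation bytes have the form 0b10xxxxxx. If the first
    # byte past the cut is a continuation byte, the cut falls inside a
    # character, so back up to that character's lead byte and drop it.
    while end > 0 and (encoded[end] & 0xC0) == 0x80:
        end -= 1
    return encoded[:end]
```

For example, "héllo" is 6 bytes in UTF-8 because "é" takes two. A naive cut at 2 bytes would keep only the first byte of "é" and produce invalid data, whereas `truncate_utf8("héllo", 2)` returns `b"h"`, which is shorter than the limit but still decodes cleanly.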

...