...

They are described here in no particular order, with some discussion of each.

DFDL v1.0 was designed to standardize the behaviors of many ad hoc, product-specific data format description capabilities.

One set of goals for DFDL v2.0 will be to advance the state of the art: to describe data formats that no prior-generation, ad hoc, product-specific data tool has been able to address effectively.

Recursion

One of the first things people want to model in DFDL always seems to be a legacy document format such as RTF or older binary MS Word documents. These formats have recursive structures in which a section can contain text and other sections. DFDL v1.0 was not designed with document formats in mind, but rather with more traditional "data sets", or files of data.
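To make the recursion requirement concrete, here is a minimal Python sketch that parses a small, entirely hypothetical length-prefixed format (it is not RTF or Word). Each item is a 1-byte tag, a 2-byte big-endian length, and a payload; tag `T` marks UTF-8 text, and tag `S` marks a section whose payload is itself a sequence of items. Because sections can nest to any depth, the parser has to call itself, and a schema for this format would likewise need a recursive definition. The names `Section`, `parse_items`, and `encode_item` are illustrative only.

```python
import struct
from dataclasses import dataclass, field
from typing import Union

# Hypothetical format: item = tag (1 byte) + length (uint16, big-endian) + payload.
HEADER = struct.Struct(">cH")


@dataclass
class Section:
    children: list["Item"] = field(default_factory=list)


Item = Union[str, Section]  # either a text run or a nested section


def parse_items(buf: bytes) -> list[Item]:
    """Parse a sequence of items, recursing into each section's payload."""
    items: list[Item] = []
    pos = 0
    while pos < len(buf):
        if len(buf) - pos < HEADER.size:
            raise ValueError(f"truncated item header at offset {pos}")
        tag, length = HEADER.unpack_from(buf, pos)
        pos += HEADER.size
        payload = buf[pos:pos + length]
        if len(payload) != length:
            raise ValueError(f"truncated payload at offset {pos}")
        pos += length
        if tag == b"T":
            items.append(payload.decode("utf-8"))
        elif tag == b"S":
            # Recursion: a section's payload is itself a list of items.
            items.append(Section(parse_items(payload)))
        else:
            raise ValueError(f"unknown tag {tag!r}")
    return items


def encode_item(tag: bytes, payload: bytes) -> bytes:
    """Helper for building test data in the same hypothetical format."""
    return HEADER.pack(tag, len(payload)) + payload


if __name__ == "__main__":
    doc = encode_item(
        b"S", encode_item(b"T", b"Intro") + encode_item(b"S", encode_item(b"T", b"Nested"))
    )
    print(parse_items(doc))
    # [Section(children=['Intro', Section(children=['Nested'])])]
```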

One advantage of DFDL without recursion is that it is not a Turing-complete language, which is helpful from a security perspective. Adding recursion may break this boundary. However, because DFDL has a rich regular expression capability and its own backtracking, one can already create schemas that take absurdly long to execute even without recursion, so this may not be a real concern.
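The point about regular expressions is easy to demonstrate. This Python sketch times the classic catastrophic-backtracking pattern `^(a+)+$` against inputs that almost match. Python's `re` stands in here for any backtracking engine (Java's `java.util.regex` is also vulnerable to this pattern); the input sizes are arbitrary choices, and the timings will vary by machine. Each extra `a` roughly doubles the number of ways the engine can split the run before giving up, so the time grows exponentially with `n`.

```python
import re
import time

# Nested quantifiers: a run of n 'a's can be split among the groups in
# 2**(n-1) ways, and a backtracking engine tries them all before failing.
CATASTROPHIC = re.compile(r"^(a+)+$")


def time_failed_match(n: int) -> tuple[bool, float]:
    """Match 'a' * n + '!' (which cannot match) and return (matched, seconds)."""
    subject = "a" * n + "!"
    start = time.perf_counter()
    matched = CATASTROPHIC.match(subject) is not None
    return matched, time.perf_counter() - start


if __name__ == "__main__":
    for n in (12, 16, 20):
        matched, seconds = time_failed_match(n)
        print(f"n={n:2d} matched={matched} time={seconds:.4f}s")
```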

...

which is sometimes needed when %NL; is the terminator and you want to distinguish the separator from the terminator. The %WSP+; entity encompasses all whitespace, including line endings.

Summary Functions/Operations

For both parsing and, particularly, unparsing, one often must measure something. Fixed-length formats often have tabular layouts, and the width of each column must be computed from the longest string in that column's data.

This is a form of multi-pass processing (aka Layers), but in the unparsing case it is really just about computing the length of something from values in the infoset, a capability DFDL already has. What is needed is to generalize that calculation capability with some form of map/reduce over arrays.
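As a rough illustration of the map/reduce calculation described above, this Python sketch treats the infoset as a simple list of records, each mapping column names to strings (a simplifying assumption). The function names `column_widths` and `unparse_fixed` are hypothetical. Each column's width comes from mapping values to their lengths and reducing with `max` (the header is counted too), and only then can the fixed-width output be emitted.

```python
from functools import reduce
from typing import Mapping, Sequence

Record = Mapping[str, str]  # one row of a (greatly simplified) infoset


def column_widths(rows: Sequence[Record], columns: Sequence[str]) -> dict[str, int]:
    """Map each value to its length, then reduce with max, starting from the header length."""
    return {
        col: reduce(max, map(len, (row[col] for row in rows)), len(col))
        for col in columns
    }


def unparse_fixed(rows: Sequence[Record], columns: Sequence[str]) -> str:
    """First pass computes widths; second pass pads every field to its column width."""
    widths = column_widths(rows, columns)
    header = " ".join(col.ljust(widths[col]) for col in columns)
    body = [" ".join(row[col].ljust(widths[col]) for col in columns) for row in rows]
    return "\n".join([header, *body])


if __name__ == "__main__":
    rows = [{"name": "Ada", "lang": "Python"}, {"name": "Grace", "lang": "C"}]
    print(column_widths(rows, ["name", "lang"]))  # {'name': 5, 'lang': 6}
    print(unparse_fixed(rows, ["name", "lang"]))
```

The two passes are the point: the widths depend on every row, so nothing can be written until the whole array has been measured.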

Extensions with User-defined Functions

Some computations are too complex to express directly in DFDL expressions, so an orderly way to add user-defined functions is necessary.

Examples include computing the CRC for a network packet, computing checksums or hashes for other data structures, and encrypting/decrypting or compressing/decompressing data.

These functions need to be able to examine the Daffodil processor state (the infoset and the data streams).
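A minimal Python sketch of one way user-defined functions could be registered and invoked. Everything here is hypothetical: the `udf` decorator, the `call` dispatcher, the `EvalContext` stand-in for the processor state, and the function names `crc32` and `inetChecksum`. The CRC uses the standard library's `zlib.crc32`; the second function is the RFC 1071 ones'-complement checksum used in IPv4 headers.

```python
import zlib
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class EvalContext:
    """Hypothetical stand-in for the processor state a function may examine."""
    data: bytes                                          # the data stream, already buffered
    infoset: dict[str, Any] = field(default_factory=dict)


UserFunction = Callable[..., Any]
_REGISTRY: dict[str, UserFunction] = {}


def udf(name: str) -> Callable[[UserFunction], UserFunction]:
    """Decorator that registers a function under a schema-visible name."""
    def register(fn: UserFunction) -> UserFunction:
        if name in _REGISTRY:
            raise ValueError(f"function {name!r} is already registered")
        _REGISTRY[name] = fn
        return fn
    return register


@udf("crc32")
def crc32(ctx: EvalContext, start: int, end: int) -> int:
    """Standard CRC-32 (as computed by zlib) over data[start:end]."""
    return zlib.crc32(ctx.data[start:end])


@udf("inetChecksum")
def inet_checksum(ctx: EvalContext, start: int, end: int) -> int:
    """RFC 1071 ones'-complement checksum over data[start:end]."""
    chunk = ctx.data[start:end]
    if len(chunk) % 2:
        chunk += b"\x00"  # pad odd-length input with a zero byte
    total = sum(int.from_bytes(chunk[i:i + 2], "big") for i in range(0, len(chunk), 2))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def call(name: str, ctx: EvalContext, *args: Any) -> Any:
    """What an expression evaluator might do for a call such as inetChecksum(0, 20)."""
    try:
        fn = _REGISTRY[name]
    except KeyError:
        raise NameError(f"unknown function {name!r}") from None
    return fn(ctx, *args)


if __name__ == "__main__":
    # IPv4 header with its checksum field zeroed; the expected value sits in the infoset.
    header = bytes.fromhex("450000730000400040110000c0a80001c0a800c7")
    ctx = EvalContext(data=header, infoset={"headerChecksum": 0xB861})
    print(call("inetChecksum", ctx, 0, 20) == ctx.infoset["headerChecksum"])  # True
```

Passing the whole context, rather than just a byte range, reflects the requirement that such functions can see both the infoset and the data streams.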

Security Features

No Network Mode: This is less a DFDL language feature than a characteristic desirable in every DFDL implementation. Applications using DFDL must be able to run in environments with no access to the internet and, even on machines that do have such access, in a mode in which they make no attempt to access anything remotely.
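One concrete place where a no-network mode matters is resolving schema references, such as the `schemaLocation` of an import or include. The Python sketch below uses a hypothetical `resolve_schema_location` function that accepts relative paths, absolute local paths, and `file:` URIs on the local host, and refuses everything else instead of attempting a fetch. A real implementation would also need to cover other lookups that can reach the network, such as external entity resolution.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse

LOCAL_SCHEMES = {"", "file"}


class RemoteAccessDenied(Exception):
    """Raised instead of attempting any network access."""


def resolve_schema_location(location: str, base_dir: Path) -> Path:
    """Resolve a schema reference to a local path, or refuse it."""
    parsed = urlparse(location)
    scheme = parsed.scheme.lower()
    if scheme not in LOCAL_SCHEMES:
        # Fail closed: anything not clearly local is refused. Note that a Windows
        # drive path such as "C:\\x.xsd" parses with scheme "c" and is refused too.
        raise RemoteAccessDenied(f"no-network mode: refusing {location!r}")
    if scheme == "file":
        if parsed.netloc not in ("", "localhost"):
            raise RemoteAccessDenied(f"no-network mode: remote host in {location!r}")
        path = Path(unquote(parsed.path))
    else:
        path = Path(location)
    return path if path.is_absolute() else base_dir / path
```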

Regular Expression Enhancements

DFDL schemas involve some large and complex regular expressions. Most regular expression languages, Java's included, lack a convenient way to define a construct once, name it, and then reuse it by referencing that name; PCRE's subroutine calls are a rare exception. Such a facility would dramatically ease the construction of regular expressions, and it is basic software engineering that large, complex things should be named and reused, not duplicated.
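Until such a feature exists, one workaround is textual macro expansion before compilation. The Python sketch below uses a hypothetical `<<name>>` reference syntax, which is not part of DFDL or of any regex dialect. Each reference is replaced, recursively, by its named fragment wrapped in a non-capturing group, and cyclic definitions are rejected. It illustrates the idea only and is not a proposed DFDL syntax.

```python
import re

# Hypothetical reference syntax: <<name>> stands for a named fragment.
REF = re.compile(r"<<(\w+)>>")


def expand(pattern: str, defs: dict[str, str], _active: tuple[str, ...] = ()) -> str:
    """Inline named fragments recursively, rejecting cyclic definitions."""
    def replace(m: re.Match) -> str:
        name = m.group(1)
        if name in _active:
            raise ValueError("cyclic fragment: " + " -> ".join(_active + (name,)))
        if name not in defs:
            raise KeyError(f"undefined fragment {name!r}")
        # Non-capturing group, so a quantifier after the reference applies to the whole fragment.
        return "(?:" + expand(defs[name], defs, _active + (name,)) + ")"
    return REF.sub(replace, pattern)


DEFS = {
    "octet": r"25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d",  # 0..255 without leading zeros
    "ipv4": r"<<octet>>(?:\.<<octet>>){3}",        # defined once, referenced twice
}

IPV4 = re.compile(expand("<<ipv4>>", DEFS))

if __name__ == "__main__":
    print(bool(IPV4.fullmatch("192.168.0.1")))  # True
    print(bool(IPV4.fullmatch("256.1.1.1")))    # False
```

A macro approach like this cannot express recursive patterns, and the `<<` delimiter would collide with literal `<<` in a pattern; a native feature would avoid both limitations.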