  • Make the DFDL specification successful 
    • effectively this is an economic requirement: we all spend too much time and money on data format work. DFDL will help, and the sooner the better.
  • Conformance with the standard
    • Build up compactly expressed test cases that are readily exchanged, and which ensure Daffodil becomes and remains conformant.
    • Worth mentioning: the standard has ambiguities and places where clarifications will be necessary, and tests that drive out those clarifications are also crucial.
    • Interchange of test cases with other DFDL implementors (notably right now, IBM) will be an advantage to all parties.
  • Performance & Memory utilization:
    • enable use of DFDL for applications that require high-performance streaming access to data (both parsing and unparsing).
      • To make that a bit more concrete: 40,000 1Kbyte messages per second on a 12-core commodity computer
      • I presume this performance requirement applies both to reading/parsing the messages and to serializing/unparsing them.
      • A key requirement here is that you must be able to avoid maintaining the whole data stream in memory, but this may require some restrictions on the generality of the specific format as well. (Some formats just don't stream well.)
    • enable use of DFDL for in-memory access to large file-based data structures (DOM-tree style)
      • Daffodil today (January 2012) is closest to this goal.
    • true random access - i.e., without retrieving/constructing the entire tree.
  • Features - we need to prioritize certain features so as to decide when to bite the bullet and implement them.
    • Bits - Daffodil is byte-centric today. If dense bit-packed formats are critical (and I suspect they are), then some back-end rework to deal with bits cleanly is required that could otherwise be postponed.
    • Encodings - which are important?
  • Robustness & Code Quality - it's pretty critical that the implementation be robust (not too many bugs) given the diverse constituencies it is expected to serve.
    • maintainability of the code-base as it grows and comes into conformance with the spec is very important. 
    • this is an open source project - there's a coolness factor here - having the code base remain cleanly organized and crafted is key to attracting talented people over time to keep it moving forward. This is one of the reasons why creating Daffodil in Scala is a good idea - a very cool new language attracts talented individuals, etc. It's good to be at least somewhat cutting edge here.
  • Timeliness - the usual caveats apply here, i.e., feature coverage at what level of performance? The performance/memory requirements above need to be phased in over time. Nevertheless, the desirable timelines that some have mentioned, without discussing the accompanying performance, can be summarized as:
    • 60-70% feature coverage by the end of September 2012
    • 100% feature coverage for parsing by early spring 2013
    • Unparsing with 100% feature coverage by later 2013
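
To put the streaming target under Performance & Memory utilization in perspective, here is a rough back-of-envelope sketch of what 40,000 1Kbyte messages per second on 12 cores implies. It rests on two assumptions that are not part of the stated requirement: that 1Kbyte means 1024 bytes, and that the work divides evenly across all cores with no coordination overhead. The function name and constants are made up for illustration.

```python
# Back-of-envelope budget for the stated throughput target.
# Assumptions (not from the requirement itself): 1 Kbyte = 1024 bytes,
# and work divides evenly across all cores with no coordination overhead.

MESSAGES_PER_SECOND = 40_000   # target from the requirement
MESSAGE_SIZE_BYTES = 1024      # "1Kbyte", assumed here to mean 1024 bytes
CORES = 12                     # "12-core commodity computer"


def throughput_budget(messages_per_second, message_size_bytes, cores):
    """Return aggregate bytes/s, per-core messages/s, and the
    per-message time budget in microseconds for a single core."""
    bytes_per_second = messages_per_second * message_size_bytes
    per_core_messages = messages_per_second / cores
    # Time one core may spend on one message if all cores work in parallel.
    microseconds_per_message = 1_000_000 * cores / messages_per_second
    return bytes_per_second, per_core_messages, microseconds_per_message


bytes_per_second, per_core_messages, budget_us = throughput_budget(
    MESSAGES_PER_SECOND, MESSAGE_SIZE_BYTES, CORES
)
print(f"aggregate: {bytes_per_second / 1_000_000:.2f} MB/s")
print(f"per core:  {per_core_messages:.0f} messages/s")
print(f"budget:    {budget_us:.0f} microseconds per message per core")
```

Under those assumptions the target works out to roughly 41 MB/s in aggregate and about 300 microseconds of processing time per message on each core. Any real scaling overhead would shrink that per-message budget.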
