This page is about the requirements driving the continuing work on the Daffodil open-source project.

Without repeating the motivations for DFDL from the DFDL spec, this page is intended to capture goals and requirements from the various constituencies interested in Daffodil.

Right now, this is an UNORDERED list. An important revision will be to prioritize this list, and then for the interested contributors to self-organize around it.

  • Make the DFDL specification successful
    • Effectively this is an economic requirement: we all spend too much time and money on data format stuff. DFDL will help, and the sooner the better.
  • Conformance with the standard
    • Build up compactly expressed test cases that are readily exchanged, and which ensure Daffodil comes into, and remains in, conformance.
    • Worth mentioning: the standard has ambiguities and places where clarifications will be necessary, and tests to drive this are also crucial.
    • Interchange of test cases with other DFDL implementors (notably right now, IBM) will be an advantage to all parties.
  • Performance & Memory utilization:
    • enable use of DFDL for applications that require high-performance streaming access to data (both parsing and unparsing).
      • To make that a bit more concrete: 40,000 1Kbyte messages per second on a 12-core commodity computer
      • I presume this is a performance requirement both for reading/parsing the messages and for serializing/unparsing them.
      • A key requirement here is that the implementation must be able to avoid holding the whole data stream in memory, but this may require some restrictions on the generality of the specific format as well. (Some formats just don't stream well.)
    • enable use of DFDL for in-memory access to large, file-based data structures (DOM-tree style)
      • Initially, this is what Daffodil will target.
    • true random access - i.e., without retrieving/constructing the entire tree.
      • Someday
  • Robustness & Code Quality - it's pretty critical that the implementation be robust (not too many bugs) given the diverse constituencies it is expected to serve.
    • maintainability of the code-base as it grows and comes into conformance with the spec is very important. 
    • this is an open source project - there's a coolness factor here - having the code base remain cleanly organized and crafted is key to attracting talented people over time to keep it moving forward. This is one of the reasons why creating Daffodil in Scala is a good idea - a very cool new language attracts talented individuals, etc. It's good to be at least somewhat cutting edge here.
  • Timeliness - the usual caveats apply here, i.e., feature coverage with what level of performance? The above performance/memory requirements need to be phased over time. Nevertheless, the desirable timelines that some have mentioned, without discussion of the accompanying performance, can be summarized as:
    • 60-70% feature coverage by the end of September 2012
    • 100% feature coverage for parsing by early spring 2013
    • Unparsing with 100% feature coverage by later in 2013
