A Progress Report from a New Contributor

A few weeks ago I joined the open-source project working on Daffodil.

I checked out the source, and have started working on it.

I'm seeking guidance from those who have familiarity with it, to confirm or clarify some of the things I believe I have discovered about it. This is a bit of a ramble also discussing some of the changes I have been making.

These are not yet checked in, but could be at any point; that's one of the things I'm seeking guidance on.

Here's the status:

As I encountered the source code, it is a body of software that was, as of my checkout, not at a stable design point:

XML Namespace support - this is partly done. The code looks like it was built without this, with namespace support added later. I've enhanced this a little, but I'm sure there are limitations. In particular, the xsd namespace prefix is hardcoded in a few places.
DFDL adherence - it is of course evolving to try to become compliant with the DFDL standard, which has changed since the initial Daffodil code was written. This is work in mid-stream.
IDE support - clearly this code base was written without the support of the Eclipse IDE, which I would agree, until now, has been more trouble than it was worth for Scala. It changes the code when you can depend on an IDE because it makes some things unnecessary (some kinds of debug rigs), and it biases coding style toward what the IDE best supports. But the IDE is now adequate, more on this below.
Scala idioms - Scala is a new language, most people using it are doing so for the first time, and trying to come up with the right balance of object-oriented idioms and functional-programming idioms. On balance this code does a pretty good job at this. Daffodil has some stuff that I would say is too FP oriented, such as too much dependence on the apply() operation and functionals.
Test cases: some were written without the advantages of some tools in XMLUtil, and so are clunky. I've revised most of these to make them more compact and less fragile. They take advantage of Scala's XML syntax support now (which perhaps is possible because of the update to the latest Scala revision?).
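
On the hardcoded xsd prefix mentioned in the namespace item above: one way to avoid depending on a particular prefix is to resolve each qualified name against the in-scope prefix bindings and compare namespace URIs instead. The following is only a minimal sketch of that idea. The object and method names are hypothetical and do not come from the Daffodil source, and it simplifies by treating an unprefixed name with no default namespace as unresolved.

```scala
// Hypothetical helper, not Daffodil code: compare by namespace URI, not by prefix.
object QNameResolution {
  // The XML Schema namespace URI, whatever prefix a given schema binds to it.
  val XsdUri = "http://www.w3.org/2001/XMLSchema"

  // Split "prefix:local" and look the prefix up in the in-scope bindings.
  // Returns (namespaceUri, localName), or None if the prefix is unbound.
  def resolve(qname: String, bindings: Map[String, String]): Option[(String, String)] = {
    val parts = qname.split(":", 2)
    if (parts.length == 2) {
      val prefix = parts(0)
      val local = parts(1)
      bindings.get(prefix).map(uri => (uri, local))
    } else {
      // Unprefixed name: use the default namespace, stored here under "".
      bindings.get("").map(uri => (uri, qname))
    }
  }

  // True when qname denotes the XSD type localName under these bindings.
  def isXsdType(qname: String, localName: String, bindings: Map[String, String]): Boolean = {
    val resolved = resolve(qname, bindings)
    resolved == Some((XsdUri, localName))
  }
}
```

With this shape, a schema that binds the prefix xs rather than xsd is handled the same way, because only the URI is compared.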

Some comments on Scala and the structure of the software.

First, I love Scala compared to Java, and it's worth some pain and learning curve to use it. That said, a big set of the changes I've made to the code has to do with the Scala Eclipse IDE support having now improved to the point where it is worth using. Hence, some changes:

Converted Unit tests from Scalatest to JUnit - because the IDE supports JUnit well, and ...frankly I see little to no value to Scalatest particularly.
I used JUnit3, because the unit test support in the IDE is still imperfect, and JUnit3 is easiest to write tools for, because the test classes have to use a common test base class, so they're easy to identify.
Removed over-use of functionals - this idiom breaks down when there are more than a couple of arguments, as the code gets pretty hard to read. In addition, the IDE provides much better support for a named method with named arguments. One of the first things I did to the code was to eliminate class derivations from Function6, Function5, and Function4, which have generic 6-, 5-, and 4-argument apply function signatures with generic argument names. Instead these classes have their own explicit apply functions which have descriptive argument names. These argument types and names are then visible to the IDE.
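
To illustrate the last item, here is a small before-and-after sketch. The class names and the field-reading logic are made up for illustration and are not taken from the Daffodil source. The only point is the difference between the generic v1..v4 parameter names of Function4 and an explicit apply with descriptive names.

```scala
// Hypothetical names throughout; none of these come from the Daffodil source.

// Before: deriving from Function4, so callers and the IDE see only v1..v4.
class FieldReaderAsFunction extends Function4[String, Int, Int, Boolean, String] {
  def apply(v1: String, v2: Int, v3: Int, v4: Boolean): String = {
    val raw = v1.substring(v2, v2 + v3)
    if (v4) raw.trim else raw
  }
}

// After: a plain class with its own apply and descriptive argument names.
class FieldReader {
  def apply(input: String, offset: Int, length: Int, trim: Boolean): String = {
    val raw = input.substring(offset, offset + length)
    val result = if (trim) raw.trim else raw
    result // good place for a breakpoint
  }
}

object ExplicitApplyDemo {
  def main(args: Array[String]): Unit = {
    val reader = new FieldReader
    // Named arguments make the call site self-describing.
    val field = reader(input = "ab  cd", offset = 0, length = 4, trim = true)
    println(field) // prints "ab"
  }
}
```

The call syntax reader(...) is unchanged, so existing call sites keep working; what is lost is the ability to pass the object where a Function4 value is expected.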
Design for Test, and Test-driven Development

Coding for Debug

A bunch of changes I've made have been motivated by the availability of breakpoint debugging in the IDE. A coding style I call "coding for debug" is important here. Breakpoint debuggers are line-oriented, and so it is much easier to navigate code that is spread out so that there is one function/procedure/method call per line. Hence, expressions like:

f(g(a), h(b))

get rewritten as

{
  val x = g(a)
  val y = h(b)
  val res = f(x, y)
  res // good place for a breakpoint
}

Another example that comes up a lot in Daffodil is

processor(a, b, c, d, e) match {
  case A => f
  case B => g
}

which gets rewritten as

val p = processor(a, b, c, d, e)
p match {
  case A => {
    val r = f(x)
    r
  }
  case B => {
    val r = g(y)
    r
  }
}

This has many good places to put line-oriented breakpoints where you can observe at a glance what the value of the variables is.

All this reduces code density somewhat, but if the variable names and function names are well chosen it can improve the self-documenting nature of the code. This helps especially when dealing with highly polymorphic code, as in Daffodil.

The discipline this coding style supports is Test-Driven Development, that is, writing unit tests, and walking through them when they fail by just using the IDE "Debug As JUnit Test" feature, and watching the variables change, because the variables give observability to what is going on.
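
As a rough sketch of how the two fit together, here is a JUnit3-style test in Scala exercising a small function written in the coding-for-debug style. FieldSplitter and its test class are hypothetical examples, not Daffodil code, and the sketch assumes a JUnit jar providing junit.framework.TestCase is on the classpath.

```scala
import junit.framework.TestCase
import junit.framework.Assert.assertEquals

// Hypothetical code under test, written one call per line.
object FieldSplitter {
  def split(input: String, separator: String): List[String] = {
    // Quote the separator so characters like "|" are not treated as regex syntax.
    val quoted = java.util.regex.Pattern.quote(separator)
    // A limit of -1 keeps trailing empty fields.
    val pieces = input.split(quoted, -1)
    val result = pieces.toList
    result // good place for a breakpoint
  }
}

// JUnit3 finds tests by the TestCase base class and the "test" method-name prefix.
class FieldSplitterTest extends TestCase {
  def testSplitsOnSeparator(): Unit = {
    val actual = FieldSplitter.split("a|b|c", "|")
    val expected = List("a", "b", "c")
    assertEquals(expected, actual)
  }

  def testKeepsTrailingEmptyField(): Unit = {
    val actual = FieldSplitter.split("a|", "|")
    val expected = List("a", "")
    assertEquals(expected, actual)
  }
}
```

When one of these fails, running it under the debugger and stepping through split shows quoted, pieces, and result one line at a time.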

IDE vs. Command-line and REPL

Many Scala fans really like the Read-Eval-Print loop paradigm. I used REPL-style many years back with Lisp and wrote some quite large programs that way. I miss Lisp, though Scala's type safety is an immense advantage over it and really does help. The REPL style, however, was not the big advantage in Lisp; it was actually a disadvantage. I find that REPL-style encourages ad-hoc testing where the tests are run once by the developer in the REPL and are not captured for repeated use as unit tests. REPL discourages giving real thought to design-for-test and regression testing. An IDE with explicit support for building up a library of unit tests beside the code is greatly superior.

Until now, Daffodil has had no robust IDE support. Honestly, the Scala Eclipse plug-in is only just adequate and still has many flaky aspects: you still have to exit and restart it several times in a day of work, and you need a very fast computer because recompilation is otherwise quite slow and painful. But we are now able to take advantage of it.

Some Specific Changes

Updated to Scala 2.9.2 - the latest Scala as of November, 2011. (Typesafe stack)
Changed codebase to make it Eclipse-IDE-Oriented
- source code attachments for all/most libraries so you can debug into them.
- JUnit as test platform, not Scalatest
- Coding for debug style - rewrote code as I needed to in this style to increase observability and debuggability
- All the tests are now runnable as JUnit tests. There is no separate test suite that must be run outside the IDE.
Standardized test style:
- namespace URIs do not end with a "/" anywhere.
- Result elements are less verbose in that they do not have explicit xsi:types. E.g., a float element value 5.0 looks like <x>5.0</x> not <x xsi:type="xsd:float">5.0</x> (This is not changed everywhere yet, but in many places. It was causing many tests to fail, so I just went with the flow here and removed the xsi:types)
