A few weeks ago I joined the open-source project working on Daffodil.
I checked out the source, and have started working on it.
I'm seeking guidance from those who have familiarity with it, to confirm or clarify some of the things I believe I have discovered about it. This is a bit of a ramble also discussing some of the changes I have been making.
My changes have been committed to the trunk. A tag named r9-2011-11-28-mbeckerle.dfdl was created off of revision 9, which is the version I started from. This tag exists to basically record that r9 was where I started, and to make archeology of "where's that missing file" easier.
It is working pretty much "as is". My changes are oriented towards using the new Eclipse Scala IDE support to improve productivity when working on it, and towards enabling subsequent evolution toward conformance with the latest DFDL v1.0 specification, which is very close to final form.
As I encountered the source code, it is a body of software that was, as of my checkout, not at a stable design point:
- XML Namespace support - this is partly done. Code looks like it was built without this, and it was added in later. I've enhanced this a little, but I'm sure there are limitations. In particular the xsd namespace prefix is hardcoded in a few places.
- DFDL adherence - it is of course evolving to try to become compliant with the DFDL standard, which has changed since the initial Daffodil code was written. This is work in mid-stream.
- IDE support - clearly this code base was written without the support of the Eclipse IDE, which I would agree, until now, has been more trouble than it was worth for Scala. Being able to depend on an IDE changes the code: it makes some things unnecessary (some kinds of outside test and debug rigs), and it biases coding style toward what the IDE best supports. I find that the Scala Eclipse IDE is now adequate; more on this below.
- Scala idioms - Scala is a new language, and most people using it are doing so for the first time and are trying to come up with the right balance of object-oriented idioms and functional-programming (FP) idioms. On balance this code does a pretty good job at this. Daffodil has some stuff that I would say is too FP oriented, such as too much dependence on the apply() operation and functionals.
- Test cases: some were written without the advantages of some tools in XMLUtil, and so are clunky. I've revised most of these to make them more compact and less fragile. They take advantage of Scala's XML syntax support now (which perhaps is possible because of the update to the latest Scala revision?).
First, I love Scala compared to Java, and it's worth some pain and learning curve to use it. That said, a big set of changes I've made to the code have to do with the Scala Eclipse IDE support having now improved to the point where it is worth using; hence, some changes:
- Converted unit tests from ScalaTest to JUnit - because the IDE supports JUnit well, and frankly I see little to no value in ScalaTest particularly.
- I used JUnit3, because the unit test support in the IDE is still imperfect, and JUnit3 is easiest to write tools for: the test classes have to use a common test base class, so they're easy to identify.
- Removed some over-use of functionals - this idiom breaks down when there are more than a couple of arguments, as the code gets pretty hard to read. In addition, the IDE provides much better support for a named method with named arguments. One of the first things I did to the code was to eliminate class derivations from Function6, Function5, and Function4, which have generic 6-, 5-, and 4-argument apply function signatures with generic argument names. Instead these classes have their own explicit apply functions with descriptive argument names. These argument types and names are then visible to the IDE.
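To make the last point concrete, here is a minimal sketch of the kind of change involved. The class names `GenericFieldReader` and `FieldReader`, and what they compute, are hypothetical stand-ins I made up for illustration; they are not Daffodil's actual classes.

```scala
// Hypothetical example, not actual Daffodil code.

// Before: deriving from Function4 means apply takes four arguments
// with the generic names v1..v4 and nothing more descriptive.
class GenericFieldReader extends Function4[String, Int, Int, String, String] {
  def apply(v1: String, v2: Int, v3: Int, v4: String): String =
    v1.substring(v2, v2 + v3).trim + v4
}

// After: no FunctionN base class, just an explicit apply whose
// argument names say what each argument is. The IDE can show these.
class FieldReader {
  def apply(input: String, offset: Int, length: Int, suffix: String): String =
    input.substring(offset, offset + length).trim + suffix
}

object FunctionalsDemo {
  def main(args: Array[String]): Unit = {
    val generic = new GenericFieldReader
    val named = new FieldReader
    // Both are still called with function-call syntax.
    println(generic("  42  rest", 0, 6, "!"))
    // The explicit version also reads well with named arguments.
    println(named(input = "  42  rest", offset = 0, length = 6, suffix = "!"))
  }
}
```

Call sites do not have to change, since both versions are still invoked through apply. What changes is that the signature now documents itself.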
I have also started using the coding style supported by the IDE. E.g., notationally, Scala supports both these styles as equivalent:
Without the IDE, one might be indifferent, or in some cases prefer the less-punctuated style. With the Eclipse IDE, the latter style is clearly preferable, as when you type that ".", a menu pops up of available methods and members to choose from. This greatly accelerates one's work, and helps immensely when trying to learn a large code base. As I have been editing and debugging the code, I've found myself rewriting in the punctuated style to gain this advantage.
In my experience of Scala, the non-punctuated style becomes important if one has constructed a domain-specific language (DSL) and the various program objects are verbs and nouns of that language. But when you are dealing with objects and methods, the punctuated style is clearer.
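As a small illustration of the two notations discussed above, here is a sketch using a made-up list of strings. The object and value names are mine, chosen only for the example.

```scala
object NotationDemo {
  // Illustrative data only.
  val names = List("dfdl", "xsd", "xml")

  // Less-punctuated (infix) style: methods are written like operators.
  def infixStyle: String = names map (_.toUpperCase) mkString ", "

  // Punctuated style: every call is introduced by a ".", which is
  // the point at which the IDE can pop up its completion menu.
  def dottedStyle: String = names.map(_.toUpperCase).mkString(", ")

  def main(args: Array[String]): Unit = {
    println(infixStyle)
    println(dottedStyle)
  }
}
```

Both methods compile to the same calls and produce the same string. The only difference is in how they read and in how much help the IDE can give while typing them.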
A bunch of changes I've made have been motivated by the availability of breakpoint debugging in the IDE. A coding style I call "coding for debug" is important here. Breakpoint debuggers are line-oriented, and so it is much easier to navigate code that is spread out so that there is one function/procedure/method call per line. Hence, expressions like:
get rewritten as
Another example that comes up a lot in Daffodil is
Which gets rewritten as
This has many good places to put line-oriented breakpoints where you can observe at a glance what the values of the variables are.
All this reduces code density somewhat, but if the variable names and function names are well chosen it can improve the self-documenting nature of the code. This helps especially when dealing with highly polymorphic code, as in Daffodil.
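Here is a minimal sketch of the "coding for debug" rewrite described above. The helper functions `parseDigits`, `scale`, and `render` are hypothetical and exist only to give the example something to call; they are not taken from Daffodil.

```scala
object CodingForDebug {
  // Hypothetical helpers, for illustration only.
  def parseDigits(s: String): Int = s.trim.toInt
  def scale(n: Int, factor: Int): Int = n * factor
  def render(n: Int): String = "<x>" + n + "</x>"

  // Dense form: three calls on one line, so a line breakpoint
  // cannot stop between them or show the intermediate values.
  def dense(raw: String): String = render(scale(parseDigits(raw), 10))

  // Coding-for-debug form: one call per line, each result named.
  // Every line is a usable breakpoint and every value is visible.
  def spreadOut(raw: String): String = {
    val digits = parseDigits(raw)
    val scaled = scale(digits, 10)
    val rendered = render(scaled)
    rendered
  }

  def main(args: Array[String]): Unit = {
    println(dense(" 5 "))
    println(spreadOut(" 5 "))
  }
}
```

The two forms compute the same result. The second costs a few lines, but the intermediate names double as documentation.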
The discipline this coding style supports is very much Test-Driven Development, that is, writing unit tests, and walking through them when they fail by just using the IDE "Debug As JUnit Test" feature, and watching the variables change, because the variables give observability to what is going on.
Many Scala fans really like the Read-Eval-Print loop paradigm. I used REPL-style many years back with Lisp and wrote some quite large programs that way. I totally miss Lisp, though Scala has immense advantages over it in type safety, which really does help. But the REPL style was not the big advantage in Lisp; it was actually a disadvantage. I find that REPL-style encourages ad-hoc testing where the tests are run once by the developer in the REPL, and are not captured for repeated use as unit tests. REPL discourages giving real thought to design-for-test and regression testing. An IDE with explicit support for building up a library of unit tests beside the code is really greatly superior.
Until now, Daffodil has had no robust IDE support. Honestly, the Scala Eclipse plug-in is only just now adequate and still has many flaky aspects: you still have to exit and restart it several times in a day of work, and you need a very fast computer because recompilation is otherwise quite slow and painful. But we're now able to take advantage of it.
So a major theme of what I've been doing so far is converting the code base so that it is easy to work on and can get the benefits of the IDE. For the time being I've let some of the command-line features lie fallow. I've not run them, and they're probably broken.
- Updated to Scala 2.9.1 - the latest Scala as of November, 2011. (Typesafe stack)
- Updated libraries to more recent versions (except Saxon, which is still Saxon-B; that is fine for now), and changed the discipline for naming libraries to make versions clearer. Also added a libsrc directory to hold the doc and src of the libraries. This is very useful in Eclipse, as it lets you navigate and even debug right into the libraries.
- Changed codebase to make it Eclipse-IDE-oriented
  - Source code attachments for all/most libraries so you can debug into them.
  - JUnit as test platform, not ScalaTest
  - Coding for debug style - rewrote code as I needed to in this style to increase observability and debuggability
  - All the tests are now runnable as JUnit tests. There is no separate test suite that must be run outside the IDE.
- Standardized test style:
  - Namespace URIs do not end with a "/" anywhere. (This simplifies comparison of test results to expected XML.)
  - Result elements are less verbose in that they do not have explicit xsi:types. E.g., a float element value 5.0 looks like <x>5.0</x> not <x xsi:type="xsd:float">5.0</x> (This is not changed everywhere yet, but in many places. It was causing many tests to fail, so I just went with the flow here and removed the xsi:types.)
  - Use Scala's built-in XML capabilities to reduce the quoting hell that otherwise results when you try to type XML as string content.
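To show what the last two test-style points look like in practice, here is a small sketch. It assumes `scala.xml` is available on the classpath, which is the case for the Scala 2.9 standard library (later Scala versions ship it as a separate scala-xml module). The object and member names are hypothetical, not taken from the Daffodil tests.

```scala
object XmlLiteralDemo {
  // Typing XML as string content: every attribute quote must be escaped.
  val verboseAsString: String = "<x xsi:type=\"xsd:float\">5.0</x>"

  // Scala's XML syntax: the expected result is written directly,
  // with no escaping, and without the explicit xsi:type.
  val expected = <x>5.0</x>

  // Compare an actual result against the expected XML structurally.
  def matches(actual: scala.xml.Node): Boolean = actual == expected

  def main(args: Array[String]): Unit = {
    println(verboseAsString)
    println(matches(<x>5.0</x>))
  }
}
```

Comparing XML values rather than strings also makes the tests less sensitive to incidental differences in how the text happens to be typed.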