Coding Style & Guidelines for Contributors

This page discusses coding style guidelines for the Daffodil code base.

Much of the existing code does not follow these guidelines. The goal is for new code to follow them, and for existing code to move toward them as it evolves.

Scala, not Java

We are committed to using Scala for Daffodil long term. Do not add Java code to this code base except in a few special circumstances:

- We use many Java-based libraries, of course.
- Code snippets found online that are being used largely unmodified can be pasted wholesale into Java files.

If you find online examples of how to use an API from Java, these should most likely be rewritten in Scala. Often there are nicer Scala idioms. Be sure to web-search for the same API with the keyword "Scala" added to your search. Often you will find idiomatic Scala that accomplishes the same thing.

Use Scala's built in XML capabilities to reduce the quoting hell that otherwise results when you try to type XML as string content.
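As a rough illustration of the difference, the sketch below builds the same small element twice, once as a string and once as an XML literal. The element name, attribute, and value are made up for the example. It assumes scala.xml is on the classpath; in Scala 2.11 and later that means adding the separate scala-xml module.

```scala
import scala.xml.Elem

object XmlLiteralSketch {
  // Hypothetical value, for illustration only.
  val value = 5.0

  // XML typed as string content: every attribute quote must be escaped.
  val asString: String =
    "<x kind=\"float\">" + value + "</x>"

  // The same element as a Scala XML literal: no escaping is needed,
  // and the compiler rejects markup that is not well formed.
  val asLiteral: Elem = <x kind="float">{ value }</x>

  // Parts of the element can be read back without any string parsing.
  val kind: String = (asLiteral \ "@kind").text
}
```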

We are committed to tracking Scala as it evolves. It is too early to try to freeze the Scala language. There are improvements, particularly in the XML support, which are needed, and which we will want to take advantage of. So expect some disruption when major releases of Scala emerge.

Similarly, we expect to track new versions of the libraries we depend on. Please name library files so that the version in use is clear from the name.

The exception, perhaps, is Saxon: we still use Saxon-B, which is no longer being developed, and that is fine for now.

Test-Driven Development & Design-for-Test (DFT)

Our code is organized under src and srcTest directories, with test-only source code going in the latter directory. The package structure under these is identical; the separation exists only so that we can package distributions of Daffodil that do not contain test code, should we so desire.

Unit Testing

Everything should have unit tests, though there is always debate about what a "unit" really means. For our purposes, unit tests are tests that the developer can easily run, inside and outside the IDE, and that very quickly tell you the status of the code: what is still working and what is broken. They should also be written with the intention of helping isolate a problem to smaller units of code.

Each unit test must run quickly, i.e., in just a second or so, though the whole suite of them, if run en masse, can take 15 to 30 seconds.

Larger test suites can also be written using JUnit, so not everything using unit testing tools is strictly speaking a "unit" test.

A couple of specifics:

- JUnit predicates, not ScalaTest, because the IDE supports JUnit well and does not support ScalaTest.
  - Someone needs to make an argument in favor of ScalaTest, because its biggest attraction seems to be readable English-language sentences in the test output, and that is not a very compelling advantage.
- JUnit 3, because the unit test support in the IDE is still imperfect, and JUnit 3 is the easiest to write tools for: the test classes have to use a common test base class, so they are easy to identify.
  - In the future we may have to upgrade to JUnit 4, because that is what Typesafe (a Scala company) seems to be supporting.
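For readers new to the JUnit 3 conventions mentioned above, here is a minimal sketch of a test class written in Scala. HexUtil and its tests are hypothetical and are not part of Daffodil; the sketch assumes a JUnit jar that provides junit.framework.TestCase is on the classpath.

```scala
import junit.framework.TestCase
import junit.framework.Assert.assertEquals

// Hypothetical code under test, for illustration only.
object HexUtil {
  def toHex(b: Int): String = "%02X".format(b & 0xFF)
}

// JUnit 3 identifies tests by convention: the class extends the common
// base class TestCase, and each public no-argument method whose name
// starts with "test" is run as a test.
class TestHexUtil extends TestCase {

  def testSingleDigitIsPadded(): Unit = {
    val actual = HexUtil.toHex(10)
    assertEquals("0A", actual)
  }

  def testValueIsMaskedToOneByte(): Unit = {
    val actual = HexUtil.toHex(0x1FF)
    assertEquals("FF", actual)
  }
}
```

Because the class derives from TestCase, the IDE and other tools can find it without any annotations or configuration.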

Test Suites and TDML (Test Definition Markup Language)

DFDL is a large specification. There is no way to implement it successfully without a very heavy emphasis on testing.

IBM has contributed a set of tests they use for their commercial DFDL implementation, which are expressed in a Test-Definition-Markup-Language (TDML).

We have adopted TDML as our standard for expressing tests as well.

TDML

IDE vs. Command-line and REPL

Many Scala fans really like the Read-Eval-Print loop (REPL) paradigm. Many languages, starting with LISP, had R-E-P loops as a core development tool. However, the REPL style can be a big disadvantage. REPL-style encourages ad-hoc testing where the tests are run once by the developer in the REPL, and are not captured for repeated use as unit tests. REPL discourages giving real thought to design-for-test and regression testing.

An IDE with explicit support for building up a library of unit tests beside the code is really greatly superior.

An important theme is converting the code base so that it is easy to work on and can get the benefits of an IDE.

Also added libsrc directory to put doc and src of the libraries. This is very useful in Eclipse as it will let you navigate and even debug right into the libraries that way.

* Changed codebase to make it Eclipse-IDE-Oriented
** source code attachments for all/most libraries so you can debug into them.
** JUnit as test platform, not Scalatest
** Coding for debug style - rewrote code as I needed to in this style to increase observability and debuggability
** All the tests are now runnable as JUnit tests. There is no separate test suite that must be run outside the IDE.
* Standardized test style:
** namespace URIs do not end with a "/" anywhere. (simplifies comparison of test results to expected XML)
** Result elements are less verbose in that they do not have explicit xsi:types. E.g., a float element value 5.0 looks like <x>5.0</x> not <x xsi:type="xsd:float">5.0</x> (This is not changed everywhere yet, but in many places. It was causing many tests to fail, so I just went with the flow here and removed the xsi:types)

Coding Style

Avoid Functionals - Do Not Over-Use the apply() method.

FP advocates like to make objects which take action when applied to another object. Sometimes this is a useful style, but more often when an object is going to take some action, the method should be named using the verb.

The apply() idiom breaks down when there are more than a couple of arguments, as the code gets pretty hard to read.

In addition, the IDE provides much better support for a named method with named arguments.

So, eliminate or avoid classes derived from FunctionN (e.g., Function6, Function5, Function4, which have generic 6-, 5-, and 4-argument apply signatures), because their arguments have generic names. Instead, these classes should either:

- have their own explicit apply methods with descriptive argument names, so that the argument types and names are visible to the IDE, or
- have verb-named methods.
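The contrast might look something like the following sketch. All names here (PadderFn, Padder, padRight) are hypothetical and chosen only for illustration; none of them are Daffodil classes.

```scala
// Discouraged: Function3 declares apply(v1, v2, v3), so code that sees
// this object as a Function3 gets only those generic argument names, and
// the call site does not say what action is being taken.
class PadderFn extends Function3[String, Int, Char, String] {
  def apply(v1: String, v2: Int, v3: Char): String =
    v1 + (v3.toString * math.max(0, v2 - v1.length))
}

// Preferred: a verb-named method with descriptive argument names, which
// the IDE can display and which callers can pass by name.
class Padder {
  def padRight(text: String, width: Int, padChar: Char): String = {
    val missing = math.max(0, width - text.length)
    val padding = padChar.toString * missing
    text + padding
  }
}

object PadderDemo {
  val padderFn = new PadderFn
  val viaApply = padderFn("ab", 5, '.') // what do 5 and '.' mean here?

  val padder = new Padder
  val viaVerb = padder.padRight(text = "ab", width = 5, padChar = '.')
}
```

Both calls compute the same result; the second reads as a sentence and survives the addition of further arguments much better.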

IDE Support and Coding for Debug

Use a coding style supported by the IDE. E.g., notationally, Scala supports both these styles as equivalent:

 object method argument // non-punctuated style
 object.method(argument) // punctuated style

Without the IDE, one might be indifferent, or in some cases prefer the less-punctuated style. With the Eclipse IDE, the latter style is clearly preferable: when you type that ".", a menu pops up of available methods and members to choose from. This greatly accelerates one's work, and helps immensely when trying to learn a large code base. As I have been editing and debugging the code, I've found myself rewriting in the punctuated style to gain this advantage.

In Scala, the non-punctuated style becomes important if one has constructed a domain-specific language (DSL) and the various program objects are the verbs and nouns of that language. But when you are simply dealing with objects and methods, the punctuated style is clearer.
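To make the two notations concrete, here is a small sketch using only standard collection methods; the list contents are arbitrary.

```scala
object NotationSketch {
  val names = List("daffodil", "dfdl", "tdml")

  // Non-punctuated style: legal Scala, but there is no "." after which
  // the IDE can offer a menu of methods and members.
  val infix = names map (_.toUpperCase) filter (_.length > 4)

  // Punctuated style: the same computation, with a "." before each
  // method name.
  val dotted = names.map(_.toUpperCase).filter(_.length > 4)
}
```

The two expressions are equivalent to the compiler; the difference is only in how well the editor can assist while they are being typed.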

Coding for Debug - Spread Out the Code

Use a coding style motivated by the availability of breakpoint debugging in the IDE. A coding style called "coding for debug" is important here. Breakpoint debuggers are line-oriented, and so it is much easier to navigate code that is spread out so that there is one function/procedure/method call per line. Hence, expressions like:

    f(g(a), h(b))

get rewritten as

    {
    val x = g(a)
    val y = h(b)
    val res = f(x, y)
    res // good place for a breakpoint
    }

Another example that comes up a lot in Daffodil is:

    processor(a, b, c, d, e) match {
     case A(x) => f(x)
     case B(y) => g(y)
    }

which gets rewritten as

    val p = processor(a, b, c, d, e)
    p match {
      case A(x) => {
          val r = f(x)
          r
      }
      case B(y) => {
          val r = g(y)
          r
      }
    }

This has many good places to put line-oriented breakpoints where you can observe at a glance what the value of the variables is.
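A concrete, runnable instance of the same rewrite could look like the sketch below. The types Parsed and Failed and the functions process and describe are hypothetical stand-ins for A, B, processor, f, and g above; they are not Daffodil code.

```scala
// Hypothetical result types standing in for A and B.
sealed trait StepResult
case class Parsed(value: Int) extends StepResult
case class Failed(message: String) extends StepResult

object DebugStyleSketch {
  // Hypothetical stand-in for processor(...).
  def process(text: String): StepResult =
    if (text.nonEmpty && text.forall(_.isDigit)) Parsed(text.toInt)
    else Failed("not a number: " + text)

  def describe(text: String): String = {
    val result = process(text) // breakpoint here shows what process returned
    result match {
      case Parsed(value) => {
        val doubled = value * 2
        val summary = "ok " + doubled
        summary // breakpoint here shows value, doubled, and summary
      }
      case Failed(message) => {
        val report = "error: " + message
        report // breakpoint here shows message and report
      }
    }
  }
}
```

Every intermediate value has a name, so stopping on any line shows the state of the computation in the debugger's variables view.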

All this reduces code density somewhat. But if the variable names and function names are well chosen, this is counter-balanced by the improved self-documenting nature of the code, which reduces the number of comment lines required to make the code clear. This helps especially when dealing with highly polymorphic code, as in Daffodil.

The discipline this coding style supports is very much Test-Driven Development, that is, writing unit tests, and walking through them when they fail by just using the IDE "Debug As JUnit Test" feature, and watching the variables change, because the variables give observability to what is going on.

Attach the Source Code

The source tree has a lib directory and a libSrc directory. The libSrc is for the jars/files that contain the source code for libraries we use. The lib directory is for jars that are either all-in-one (source and binary and doc), or just binary code. That is to say, when the source and doc are separate, download them also and put the corresponding artifacts in libSrc.

Having the source code to walk into from the debugger helps immensely with debugging, and makes up for some of the deficiencies of the Scala IDE support versus the more mature Java IDE support. E.g., Scala mode today doesn't pop up Javadoc strings, but if you can quickly jump over to the corresponding piece of source code, you can read the javadoc/scaladoc there.

Specifics on Libraries

Some libraries we use (or don't use) deserve specific commentary.

Apache XML Schema

This library has been tried and is inadequate to our needs currently (2012-02-24). It lacks support for non-native attributes, the support for appinfo and annotations in general is difficult to use (if it works at all), and it has no escape-mechanism by which one can bypass, get back to the XML objects themselves, and overcome its limitations.

XSOM - XML Schema Object Model

This library has been tried and we may still use it to assemble lists of schema files for us, so that it will handle the namespace resolution and include/import. But we have tried and found it unusable as far as abstract access to the DFDL Schema objects. Specifically, it does not have a first-class notion of a Schema Document. DFDL depends heavily on the notion of a Schema Document in that these are the units where lexically-scoped annotations are used. XSOM provides no way to even ask for the annotations on a schema document, so one cannot implement DFDL's lexical scoping feature using XSOM.
