Coding Style & Guidelines for Contributors

This page discusses coding style guidelines for the Daffodil code base.

Much of the code does not follow these guidelines. As it evolves the goal is to make new code follow these guidelines, and to evolve existing code toward them.

Scala, not Java

We are committed to using Scala for Daffodil long term. Do not add Java code to this code base except in a few special circumstances.

we use many java-based libraries of course
code snippets from online that are being used largely unmodified can be pasted wholesale into Java files.

If you find online examples of how to use an API from Java, then mostlikely these should be rewritten into Scala. Often there are nicer Scala idioms. Be sure to Web-search for the same API with the keyword "Scala" added to your search. Often you will find idiomatic scala to accomplish the same thing.

Use Scala's built in XML capabilities to reduce the quoting hell that otherwise results when you try to type XML as string content.

We are committed to tracking Scala as it evolves. It is too early to try to freeze the Scala language. There are improvements, particularly in the XML support, which are needed, and which we will want to take advantage of. So expect some disruption when major releases of Scala emerge.

Similarly, we expect to track new versions of the libraries we depend on. Please use a robust naming discipline of naming libraries to make versions clear.

Except perhaps Saxon which is still the no-longer-progressing Saxon-B, which is fine for now.

Test-Driven Development & Design-for-Test (DFT)

Our code is organized under src and srcTest directories, with test-only source code going in the latter directory. The package structure under these is identical, the separation is just so that we can package distributions of daffodil that do not contain test code, should we so desire.

Unit Testing

Everything should have unit tests, though there is always debate of what a "unit" really means. For our purposes, what we mean by unit tests is test that are easily run, by the developer, in the IDE and outside the IDE, which very quickly tell you the status of the code - what's still working, what is broken, and have some intention of helping isolate the problem to smaller units of code.

Unit tests must run quickly, i.e., in just a second or so, though the whole suite of them, if run en-masse, can take 15 to 30 seconds to run.

Larger test suites can also be written using JUnit, so not everything using unit testing tools is strictly speaking a "unit" test.

A couple of specifics:

JUnit predicates, not Scalatest - That is, use assertEquals(expected, actual), not "actual should be equal to expected" (from scalatest's ShouldMatchers classes) because the IDE supports JUnit well, and doesn't support scalatest.
- We do use Scalatest, but mostly for the bridge to JUnit, and the convenient intercept construct for catching expected exceptions.
- Someone needs to make an argument in favor of Scalatest's ShouldMatchers stuff because it seems its biggest attraction is nice English-language readable sentences of test output, and this is not very compelling as an advantage.
JUnit3, because the Unit test support in the IDE is still imperfect, and Junit3 is easiest to write tools for, because the test classes have to use a common test base class, so they're easy to identify.
- In the future we may have to upgrade to JUnit4, because that is what TypeSafe (a Scala company) seems to be supporting.

Test Suites and TDML (Test Definition Markup Language)

DFDL is a large specification. There's no way to be successful implementing it without a very extensive emphasis on test.

IBM has contributed a set of tests they use for their commercial DFDL implementation, which are expressed in a Test-Definition-Markup-Language (TDML).

We have adopted TDML as our standard for expressing tests as well.

TDML enables creation and interchange of very self-contained tests.

IDE vs. Command-line and REPL

Many Scala fans really like the Read-Eval-Print loop paradigm. Many languages starting with LISP, had R-E-P loops as a core development tool. However, the REPL style can be a big disadvantage. REPL-style encourages ad-hoc testing where the tests are run once by the developer in the REPL, and are not captured for repeated use as unit tests. REPL discourages giving real thought to design-for-test and regression testing. The REPL is great for learning how to call something, reminding yourself how a function works, etc. I.e., for trying things out. It is not a good way to do testing of your own code.

An IDE with explicit support for building up a library of unit tests beside the code is really greatly superior.

An important theme is converting the code base so that it is easy to work on and can get the benefits of an IDE.

Coding Style

Avoid Functionals - Do Not Over-Use the apply() method.

FP advocates like to make objects which take action when applied to another object. Sometimes this is a useful style, but more often when an object is going to take some action, the method should be named using the verb.

The apply() idiom breaks down when there are more than a couple of arguments, as the code gets pretty hard to read.

In addition, the IDE provides much better support for a named method with named arguments.

So, eliminate/avoid uses of class derivations from FunctionN (e.g., Function6, Function5, Function4, which have generic 6, 5, and 4-argument apply function signatures) because they have generic argument names. Instead these classes should either

have their own explicit apply functions which have descriptive argument names. These argument types and names are then visible to the IDE.
have verb-named methods

IDE Support and Coding for Debug

Use a coding style supported by the IDE. E.g., notationally, Scala supports both these styles as equivalent:

 object method argument // non-punctuated style
 object.method(argument) // punctuated style

Without the IDE, one might be indifferent, or in some cases prefer the less-punctuated style. With the Eclipse IDE, the latter style is clearly preferable, as when you type that ".", a menu pops up of available methods and members to choose from. This greatly accelerates ones work, and helps immensely when trying to learn a large code base. As I have been editing and debugging the code, I've found myself rewriting in the punctuated style to gain this advantage.

In Scala, the non-punctuated style becomes important if one has constructed a domain-specific language (DSL) and the various program objects are verbs and nouns of that language. But when you are dealing with object and method, the punctuated style is clearer.

(Update: Emacs Ensime mode is a very good Scala IDE, and Ensime is also usable with other text editors. Anyway it does not have this "." notation restriction. It will happily give you suggested completions regardless of your notational preference. However, until this comes to the Scala Eclipse IDE, I still suggest use of "." notation.)

h3. Coding for Debug - Spread Out the Code

Use a coding style motivated by the availability of breakpoint debugging in the IDE. A coding style called "coding for debug" is important here. Breakpoint debuggers are line-oriented, and so it is much easier to navigate code that is spread out so that there is one function/procedure/method call per line. Hence, expressions like:

    f(g(a), h(b))

get rewritten as

    {
    val x = g(a)
    val y = h(b)
    val res = f(x, y)
    res // good place for a breakpoint
    }

another example that comes up a lot in Daffodil is

    processor(a, b, c, d, e) match {
     case A(x) => f(x)
     case B(y) => g(y)
    }

Which gets rewritten as

    val p = processor(a, b, c, d, e)
    p match {
      case A(x) => {
          val r = f(x)
          r
      }
      case B(y) => {
          val r = g(y)
          r
      }
   }

This has many good places to put line-oriented breakpoints where you can observe at a glance what the value of the variables is.

All this reduces code density somewhat, but if the variable names and function anmes are well chosen this can counter-balance by improving the self-documenting nature of the code thereby reducing the number of lines of comments required to make the code clear. This helps especially when dealing with highly polymorphic code, as in Daffodil.

The discipline this coding style supports is very much Test-Driven Development, that is, writing unit tests, and walking through them when they fail by just using the IDE "Debug As JUnit Test" feature, and watching the variables change, because the variables give observability to what is going on.

Uniform Return Type

Suppose you want to write:

 def myFunction(arg : Seq[T]) : Seq[T]

This is some function which takes a sequence, and returns a sequence. Since a sequence is a generic type that is a supertype of lists, nodeSeq, etc. you can pass this many things. What you will get back is of type Seq[[T]] to the caller.

So if you pass a Vector[[Node]] you get back a Seq[[Node]]..... the return type doesn't match the argument type. You can perhaps down-cast it to some concrete type if you want. Turns out a better way to write this is:

 def myBetterFunc[S <: Seq[T]](arg : S) : S

That notation S <: Seq[[T]] can be read as S is a subtype of Seq[[T]]

This function signature says myBetterFunction takes an arg of type S, returns that same type S, oh, and S must be a subtype ofSeq[[T]].

So, when you call myBetterFunc, passing it a Vector[[Node]] you will get back a Vector[[Node]]. This is a general principle of Scala library design called the "uniform return type principle" that makes libraries easier to use, and avoids many error-prone downcasts.

Attach the Source Code

The source tree has a lib directory and a libSrc directory. The libSrc is for the jars/files that contain the source code for libraries we use. The lib directory is for jars that are either all-in-one (source and binary and doc), or just binary code. That is to say, when the source and doc are separate, download them also and put the corresponding artifacts in libSrc.

Having the source code to walk into from the debugger helps immensely with debugging, and makes up for some of the deficiencies of the Scala IDE support versus the more mature Java IDE support. E.g., Scala mode today doesn't pop up Javadoc strings, but if you can quickly jump over to the corresponding piece of source code, you can read the javadoc/scaladoc there.

Specifics on Libraries

Some libraries we use (or don't use) deserve specific commentary.

Library Licenses

We are committed to the Univ. of Illinois/NCSA open-source licensing terms for the Daffodil code. This restricts the licenses of libraries we use to those compatible with this license.

Generally speaking, this means we cannot use libraries licensed under the GPL (v2 or v3), but there are variations of these licenses (e.g., "classpath exception", and LGPL) which may be acceptable. These need to be examined on a case-by-case basis.

Problematic Libraries

There are some "supposedly" standard libraries that we're not using, basically because we tried and they didn't work out. Details on these efforts are below. Some day in the future this may be worth revisiting, but only if either the libraries have improved or we have someone with maintenance-level experience with them join the Daffodil project, that is, someone who knows how to make them work properly.

Apache XML Schema

This library has been tried and is inadequate to our needs currently (2012-02-24). It lacks support for non-native attributes, the support for appinfo and annotations in general is difficult to use (if it works at all), and it has no escape-mechanism by which one can bypass, get back to the XML objects themselves, and overcome its limitations.

XSOM - XML Schema Object Model

This library has been tried and we may still use it to assemble lists of schema files for us, so that it will handle the namespace resolution and include/import. But we have tried and found it unusable as far as abstract access to the DFDL Schema objects. Specifically, it does not have a first-class notion of a Schema Document. DFDL depends heavily on the notion of a Schema Document in that these are the units where lexically-scoped annotations are used. XSOM provides no way to even ask for the annotations on a schema document, so one cannot implement DFDL's lexical scoping feature using XSOM.

XML Catalog

The resolver for XML catalogs, part of apache commons, has as of (2012-02-28) too many issues. A lot of time was spent trying to use this to resolve namespaces and their links to schemas, for use by daffodil, and its TDML test runner. Our needs are too simple to warrant debugging this library on behalf of the community.

Space shortcuts

Child pages