...

See also Performance Coding for Runtime

64-bit vs. 32-bit

Our goal is full 64-bit capability. Unfortunately, many Java and Scala libraries do not allow offsets or positions larger than a signed 32-bit Int can hold.

...

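```scala
def myFunc(param: Long) = {
  // Fail with a usage error rather than silently truncating to 32 bits.
  Assert.usage(param <= Int.MaxValue, "Maximum is 32-bit limited currently. %s".format(param))
  val intParam = param.toInt
  // ...
  callLameAPI(intParam) // a library API that only accepts an Int
  // ...
}
```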
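Outside Daffodil, where `Assert.usage` is not available, the same guard can be sketched with the standard `require`. This version also rejects negative values, on the assumption that the parameter is an offset or position. `narrowToInt` is a hypothetical helper name, not an existing Daffodil function.

```scala
object NarrowingSketch {
  // Hypothetical helper: narrow a 64-bit offset to the 32-bit Int that many
  // Java/Scala APIs still require. A plain .toInt would silently wrap
  // (2147483648L.toInt is Int.MinValue), so check the range first.
  def narrowToInt(offset: Long): Int = {
    // Assumption: offsets and positions are never negative.
    require(offset >= 0 && offset <= Int.MaxValue,
      s"Maximum is 32-bit limited currently. $offset")
    offset.toInt
  }
}
```

Java 8's `Math.toIntExact` performs a similar overflow check, but it throws `ArithmeticException` and accepts negative values.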

Scala, not Java

We are committed to using Scala for Daffodil long term. Do not add Java code to this code base except in a few special circumstances.

...

* Except perhaps Saxon, which is still the no-longer-progressing Saxon-B; that is fine for now. (Note: we are no longer using Saxon as of late 2014.)

Use Smaller Files

Scala's compiler is quite slow, and this must be taken into account to ensure a reasonable edit-compile-debug cycle for developers. A compilation unit is an entire file, and incremental compilation is more efficient when files are smaller. So avoid huge files that blend multiple concepts together. Do not, however, go so far as to break apart things that are best understood when kept in the same file.

Test-Driven Development & Design-for-Test (DFT)

Our code is organized under src/main and src/test directories, with test-only source code going in the latter. The package structure under both is identical; the separation exists only so that we can package distributions of Daffodil that do not contain test code, should we so desire.

Unit Testing

Everything should have unit tests, though there is always debate about what a "unit" really means. For our purposes, unit tests are tests that the developer can easily run, both in and outside the IDE, that very quickly tell you the status of the code (what's still working and what is broken), and that help isolate problems to smaller units of code.

...

  • JUnit predicates, not Scalatest - That is, use assertEquals(expected, actual), not "actual should be equal to expected" (from scalatest's ShouldMatchers classes) because the IDE supports JUnit well, and doesn't support scalatest.
    • We do use Scalatest, but mostly for the bridge to JUnit, and the convenient intercept construct for catching expected exceptions.
    • Someone needs to make an argument in favor of Scalatest's ShouldMatchers stuff because it seems its biggest attraction is nice English-language readable sentences of test output, and this is not very compelling as an advantage.
  • JUnit4, because that is what Typesafe (a Scala company) seems to be supporting.

Test Suites and TDML (Test Definition Markup Language)

DFDL is a large specification. There's no way to be successful implementing it without a very extensive emphasis on test.

...

TDML enables creation and interchange of very self-contained tests.  

IDE vs. Command-line and REPL

Many Scala fans really like the read-eval-print loop (REPL) paradigm. Many languages, starting with LISP, have had REPLs as a core development tool. However, the REPL style can be a big disadvantage. It encourages ad-hoc testing, where tests are run once by the developer in the REPL and are not captured for repeated use as unit tests, and it discourages giving real thought to design-for-test and regression testing. The REPL is great for learning how to call something, reminding yourself how a function works, etc., i.e., for trying things out. It is *not* a good way to test your own code.

...

An important theme is converting the code base so that it is easy to work on and can get the benefits of an IDE.

Coding Style

Unless specified below, all code should follow the Scala Style Guide.

Bits, Bytes, 1-based, and zero-based indexing

DFDL and XML use 1-based indexing. Java, Scala, and all their libraries (except XML libraries?) are zero-based.

...

  • bitPosition0b : ULong - means position, measured in bits, first bit is at position 0, type unsigned long.
  • mCharWidthInBits: MaybeInt - measured in bits; note that sizes, lengths, and widths have no 0-based or 1-based distinction. Note also the use of the MaybeInt type.
  • childIndex1b - child index, first child is at index 1.

Exercise for Reader!

Create Scala ZeroBased and OneBased AnyVal wrapper types with explicit (or some implicit) conversions.

The point is to let the Scala compiler give you an error when you mix zero-based and one-based things, or pass a zero-based thing to an argument that wants a one-based thing.

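So the type of bitPosition0b (currently ULong) would become ZeroBased[ULong], and a one-based position would be typed similarly:

```scala
// Member declarations (e.g., inside a trait) using the proposed wrapper types.
var bitPosition0b: ZeroBased[ULong]
var bitPosInByte1b: OneBased[UInt]
```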

For an example of a number type built along these lines, look at UInt, which is an AnyVal type.
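One possible solution sketch follows. To stay self-contained it wraps a plain `Long` instead of being parameterized as `ZeroBased[ULong]`, since Daffodil's `ULong`/`UInt` are not available here; `toOneBased`, `toZeroBased`, and `childAt` are hypothetical names. Because the wrappers extend `AnyVal`, they avoid allocation in many cases, while the compiler still refuses to accept one where the other is expected.

```scala
object IndexBase {
  // Zero-based position: the first item is at 0 (Scala/Java convention).
  final case class ZeroBased(value: Long) extends AnyVal {
    def toOneBased: OneBased = OneBased(value + 1)
  }

  // One-based position: the first item is at 1 (DFDL/XML convention).
  final case class OneBased(value: Long) extends AnyVal {
    def toZeroBased: ZeroBased = ZeroBased(value - 1)
  }

  // Hypothetical API that wants a DFDL-style 1-based child index.
  // childAt(kids, ZeroBased(0)) does not compile: type mismatch.
  def childAt[T](children: IndexedSeq[T], childIndex1b: OneBased): T =
    children(childIndex1b.toZeroBased.value.toInt)
}
```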

Identifier Naming Conventions

Choose identifiers for positions, lengths, and limits wisely. Here are some conventions to follow:

...

length limit = a length that bounds the maximum length

size = same as length.

Line Endings

All files should use Unix line endings (i.e. \n).

Avoid Functionals - Do Not Over-Use the apply() method.

FP advocates like to make objects that take action when applied to another object. Sometimes this is a useful style, but more often, when an object is going to take some action, the method should be named with a verb that describes that action.

...

  • have their own explicit apply functions which have descriptive argument names. These argument types and names are then visible to the IDE.
  • have verb-named methods
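A small contrast, using hypothetical class names and a simple byte sum in place of a real checksum: the first class hides its action behind `apply`, so a call site reads only `c(bytes)`; the second names the action, which documents the call site and gives the IDE a distinct method to search for and set breakpoints on.

```scala
object ApplyVsVerbSketch {
  // Call sites read `c(bytes)`; nothing says what is being computed.
  class ChecksumViaApply {
    def apply(bytes: Array[Byte]): Int =
      bytes.foldLeft(0)((acc, b) => acc + (b & 0xff))
  }

  // Call sites read `c.computeChecksum(bytes)`; the verb names the action.
  class ChecksumComputer {
    def computeChecksum(bytes: Array[Byte]): Int =
      bytes.foldLeft(0)((acc, b) => acc + (b & 0xff))
  }
}
```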

Careful with the Catches

You should always limit the scope of try/catch blocks to the smallest region of code that needs to be in the scope of the try.

...

This ensures you are not accidentally suppressing things like Assert.invariantFailed() or Assert.notYetImplemented().
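A minimal sketch of a narrowly scoped catch, with `parseCount` as a hypothetical helper and Scala's `assert` standing in for Daffodil's `Assert.invariantFailed()`. Only the call expected to fail on bad input is inside the `try`, and only the exception it is documented to throw is caught, so the invariant failure still propagates.

```scala
object NarrowCatchSketch {
  // Hypothetical helper: parse a non-negative count; None if not a number.
  def parseCount(text: String): Option[Int] = {
    val trimmed = text.trim
    // The try covers just the call that can fail on bad input, and the
    // catch names just the exception that call is expected to throw.
    val parsed =
      try Some(trimmed.toInt)
      catch { case _: NumberFormatException => None }
    // The invariant check stays outside the try. A wide
    // `catch { case _: Throwable => None }` around the whole body would
    // silently turn this AssertionError into None.
    parsed.foreach(n => assert(n >= 0, s"negative count: $n"))
    parsed
  }
}
```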

IDE Support and Coding for Debug

Use a coding style supported by the IDE. E.g., notationally, Scala supports both these styles as equivalent:

...

(Update: Emacs Ensime mode is a very good Scala IDE, and Ensime is also usable with other text editors. It does not have this "." notation restriction: it will happily suggest completions regardless of your notational preference. However, until this comes to the Scala Eclipse IDE, I still suggest using "." notation.)
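As a generic illustration (not the elided example above), the two calls below are equivalent to the compiler; the practical difference is that typing `words.` gives an IDE a place to offer completions. `words` is an arbitrary example value.

```scala
object NotationSketch {
  val words = List("alpha", "beta", "gamma")

  // Dot notation: typing `words.` is where IDE completion kicks in.
  val lengthsDot: List[Int] = words.map(_.length)

  // Infix notation: the same call, but no "." to trigger completion.
  val lengthsInfix: List[Int] = words map (_.length)
}
```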

Coding for Debug - Spread Out the Code

Use a coding style motivated by the availability of breakpoint debugging in the IDE. A coding style called "coding for debug" is important here. Breakpoint debuggers are line-oriented, and so it is much easier to navigate code that is spread out so that there is one function/procedure/method call per line. Hence, expressions like:

...

The discipline this coding style supports is very much Test-Driven Development: writing unit tests and, when they fail, walking through them using the IDE's "Debug As JUnit Test" feature and watching the variables change, because the variables give observability into what is going on.
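A minimal sketch of the difference, with `Record` and both functions hypothetical. Both compute the same result, but in the spread-out version each intermediate value sits on its own line under a name the debugger's variables view can show while stepping.

```scala
object SpreadOutSketch {
  final case class Record(name: String, fields: Seq[String])

  // Dense: several calls on one line; a line breakpoint covers the whole
  // expression and none of the intermediate results has a name to inspect.
  def summaryDense(r: Record): String =
    r.name.trim.toUpperCase + ":" + r.fields.filter(_.nonEmpty).mkString(",")

  // Spread out: one call per line, each result in a named val.
  def summarySpread(r: Record): String = {
    val trimmedName = r.name.trim
    val upperName = trimmedName.toUpperCase
    val nonEmptyFields = r.fields.filter(_.nonEmpty)
    val joinedFields = nonEmptyFields.mkString(",")
    val summary = upperName + ":" + joinedFields
    summary
  }
}
```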

Uniform Return Type Principle

Suppose you want to write:

...

So, when you call myBetterFunc, passing it a Vector[Node], you will get back a Vector[Node]. This is a general principle of Scala library design called the "uniform return type principle" that makes libraries easier to use, and avoids many error-prone downcasts.
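One way to get this behavior, sketched under the assumption of Scala 2.13 or later (where `scala.collection.IterableOps` exists; older versions used `CanBuildFrom` instead), is to make the function generic in the collection type itself. `Node`, `keepNamedSeq`, and `keepNamed` are hypothetical names and may not match the elided example.

```scala
import scala.collection.IterableOps

object UniformReturnSketch {
  // Hypothetical node type standing in for the real one.
  final case class Node(name: String)

  // Declared against Seq: a Vector goes in, but only a Seq comes back,
  // so callers must downcast to recover the type they passed in.
  def keepNamedSeq(nodes: Seq[Node]): Seq[Node] =
    nodes.filter(_.name.nonEmpty)

  // Generic in the collection type CC: Vector in, Vector out; List in, List out.
  def keepNamed[CC[X] <: IterableOps[X, CC, CC[X]]](nodes: CC[Node]): CC[Node] =
    nodes.filter(_.name.nonEmpty)
}
```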

Use 'def' for abstract members

When you create an interface in a base class for a derived class to implement, you always want to use 'def'.

...

evaluates a + b when lv is first called/used, and saves the value, so that it is only computed once.
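A short sketch of why the base should declare `def`: each implementation can then choose `val`, `lazy val`, or `def`, whereas an abstract `val` in the base could not be implemented by a `def`. The `Shape` hierarchy is a hypothetical example.

```scala
object AbstractDefSketch {
  // The base declares `def`; each subclass picks the form that fits.
  trait Shape { def area: Double }

  // Computed once, at construction time.
  final class Square(side: Double) extends Shape {
    val area: Double = side * side
  }

  // Computed on first use, then cached.
  final class Circle(radius: Double) extends Shape {
    lazy val area: Double = math.Pi * radius * radius
  }

  // Recomputed on every call, so it tracks the mutable fields.
  final class Rect(var width: Double, var height: Double) extends Shape {
    def area: Double = width * height
  }
}
```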

Use 'lazy val' and 'def' to Avoid Object Initialization Headaches

In many situations when an object is being created and initialized, if anything goes wrong the error is hard to figure out because, well, the object isn't an object yet.

...

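```scala
class MyClass {

  type FooType = String // placeholder for the real member type

  private var isInitialized = false // a var, since init() sets it

  @inline private def checkInitialized(): Unit = {
    if (!isInitialized) initErr()
  }

  private var foo_ : FooType = null

  // Braces are required: without them, foo would be just checkInitialized().
  @inline def foo: FooType = { checkInitialized(); foo_ }

  def init(): Unit = {
    foo_ = computeFoo()
    isInitialized = true
  }

  // Stands in for the ...complex calculation...
  private def computeFoo(): FooType = "computed"

  def initErr(): Nothing =
    throw new IllegalStateException("not initialized")
}
```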
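The manual initialization pattern above works, but a `lazy val` often gives a similar benefit with less machinery: nothing is evaluated during construction, so a bad input cannot break object creation, and the failure appears with a clear stack trace at the first access that needs the value. `Settings`, `port`, and `hostOrDefault` are hypothetical names.

```scala
object LazyInitSketch {
  // Hypothetical settings holder: construction only stores the raw map.
  final class Settings(raw: Map[String, String]) {
    // Not evaluated during construction; a missing or malformed entry
    // throws here, on first access, not while the object is being built.
    lazy val port: Int = raw("port").toInt

    // A def re-reads the map on every call.
    def hostOrDefault: String = raw.getOrElse("host", "localhost")
  }
}
```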

Use Typed Equality

Subtle bugs can arise when comparing a == b if a and b turn out to have different types. The == operator is "natural" equality, which simply returns false when a and b are of unrelated types (for example, an Int and a String).

...

(To be determined - what is the performance of these relative to ordinary a == b.)
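One way to get typed equality, sketched here with a hypothetical `===` operator (not necessarily what Daffodil uses): the right-hand type is a separate type parameter, and the implicit evidence `B =:= A` exists only when both types are the same, so comparing unrelated types fails at compile time instead of returning false at runtime.

```scala
object TypedEqualitySketch {
  // Hypothetical operator. A is fixed by the left operand; the call only
  // compiles when the right operand's type B is exactly A.
  // Example: 1 === "1" and 1 === 1L are compile errors.
  implicit class TypedEq[A](private val self: A) extends AnyVal {
    def ===[B](other: B)(implicit ev: B =:= A): Boolean = self == other
  }
}
```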

Attach the Source Code

Our build system will obtain the source code for libraries when sbt is able to retrieve it. If a library is not sbt-managed, the library itself goes in the lib sub-directory, and its source code and documentation go into libsrc.

Having the source code to walk into from the debugger helps immensely with debugging, and makes up for some of the deficiencies of the Scala IDE support versus the more mature Java IDE support. E.g., Scala mode today doesn't pop up Javadoc strings, but if you can quickly jump over to the corresponding piece of source code, you can read the javadoc/scaladoc there.

Specifics on Libraries

Some libraries we use (or don't use) deserve specific commentary.

Library Licenses

We are committed to the Univ. of Illinois/NCSA open-source licensing terms for the Daffodil code. This restricts the licenses of libraries we use to those compatible with this license.

Generally speaking, this means we cannot use libraries licensed under the GPL (v2 or v3), but there are variations of these licenses (e.g., "classpath exception", and LGPL) which may be acceptable. These need to be examined on a case-by-case basis.

Problematic Libraries

There are some "supposedly" standard libraries that we're not using, basically because we tried and they didn't work out. Details on these efforts are below. Some day in the future this may be worth revisiting, but only if either the libraries have improved or we have someone with maintenance-level experience with them join the Daffodil project, that is, someone who knows how to make them work properly.

Apache XML Schema

This library has been tried and is inadequate to our needs currently (2012-02-24). It lacks support for non-native attributes, the support for appinfo and annotations in general is difficult to use (if it works at all), and it has no escape mechanism by which one can bypass it, get back to the XML objects themselves, and overcome its limitations.

XSOM - XML Schema Object Model

This library has been tried, and we may still use it to assemble lists of schema files for us, so that it handles namespace resolution and include/import. But we have found it unusable for abstract access to the DFDL Schema objects. Specifically, it does not have a first-class notion of a Schema Document. DFDL depends heavily on the notion of a Schema Document, because these are the units where lexically-scoped annotations are used. XSOM provides no way to even ask for the annotations on a schema document, so one cannot implement DFDL's lexical scoping feature using XSOM.