This page collects Scala coding hints for performance-critical code, such as the frequently executed parts of the Daffodil runtime module.

These ideas basically amount to writing "Java-like" Scala code.

Do not use this style outside performance-critical areas. It makes the code less readable, less compact, and much harder to get right.

Hopefully, future improvements in JVMs and the Scala compiler will make some of these techniques less necessary.

Avoid Unnecessary Allocation

Many Scala constructs allocate objects on the heap. Each allocation carries real overhead: reserving space for the object (which includes a header in addition to its members), initializing the memory, calling the constructor, etc.

Measurements have often shown allocation to be a large cost, so this page describes several techniques for avoiding excess allocation.

Avoid Passing Functions - While Loops for Iteration - or Macros

Scala's map, flatMap, fold, reduce, foreach, etc. all take a function argument. Because of JVM limitations, these function objects still end up allocated on the heap, even though they are only used downward and do not outlive the call.

In general, this means writing plain old while loops instead of Scala's much more compact idioms.
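Here is a minimal sketch of the same array sum written both ways. WhileLoopSketch, sumFold, and sumWhile are illustrative names, not Daffodil code. The foldLeft version passes a function object. The while-loop version creates none and keeps the Int values unboxed.

```scala
object WhileLoopSketch {
  // Compact idiom: `_ + _` becomes a function object passed to foldLeft.
  // Depending on the Scala version, the generic accumulator may also be boxed.
  def sumFold(arr: Array[Int]): Int = arr.foldLeft(0)(_ + _)

  // "Java-like" version: no function object is created and the Ints stay unboxed.
  def sumWhile(arr: Array[Int]): Int = {
    var sum = 0
    var i = 0
    while (i < arr.length) {
      sum += arr(i)
      i += 1
    }
    sum
  }

  def main(args: Array[String]): Unit = {
    val data = Array(3, 1, 4, 1, 5)
    println(sumFold(data))  // prints 14
    println(sumWhile(data)) // prints 14
  }
}
```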

In some cases, macros can provide something about as compact as a Scala map/fold idiom without creating a function object at all. See LoggerMacros.scala for examples.

Avoid Passing Functions - By Name Arguments - Use Macros

Code like this

Code Block
def foo(a: Int, b: => Int): Int = if (a > 0) a else b // b is evaluated only when needed

Every time foo is called, a small closure object (a Function0) is allocated to pass the by-name argument b.

A macro can often be used instead to avoid the by-name argument entirely (see AssertMacros.scala for examples).
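A Scala 2 macro must be compiled separately from the code that calls it, so it cannot be shown in one self-contained file. As a substitute, the sketch below writes out by hand the kind of code a macro could expand a call into. The test moves to the call site and the argument is passed strictly, so no closure is needed. ByNameSketch, debugByName, debugStrict, report, and debugEnabled are hypothetical names. The sketch does not claim to reproduce AssertMacros.scala.

```scala
object ByNameSketch {
  // Hypothetical logging switch and counter, for illustration only.
  var debugEnabled: Boolean = false
  var messagesLogged: Int = 0

  private def emit(msg: String): Unit = {
    messagesLogged += 1
    println(msg)
  }

  // By-name parameter: each call wraps `msg` in a Function0 object,
  // even when debugging is off and the message is never evaluated.
  def debugByName(msg: => String): Unit =
    if (debugEnabled) emit(msg)

  // Strict parameter: no closure, so the caller must perform the test itself.
  def debugStrict(msg: String): Unit = emit(msg)

  def report(x: Int): Unit = {
    debugByName("x is " + x) // allocates a Function0 capturing x, even if unused

    // Hand-written equivalent of what a macro could expand a debug call into:
    // the test is inlined and the string is built only when logging is on.
    if (debugEnabled) debugStrict("x is " + x)
  }

  def main(args: Array[String]): Unit = {
    report(42)              // debugging off: nothing printed
    debugEnabled = true
    report(42)              // prints "x is 42" twice
    println(messagesLogged) // prints 2
  }
}
```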

Avoid Return Objects and Tuples

These are often used to pass more complex information back to the caller, but are then immediately discarded.

The alternative is to pass in a mutable object that the called method fills in. (See OnStack/LocalStack below for where that mutable object might come from.)

For the very common case of an optional result, where you would otherwise return Option[T], return a Maybe[T] for objects, and use MaybeInt, MaybeLong, etc. for numbers. See below about avoiding the Option type.

Other common return types with the same cost are small tuples of values and the Either[L, R] and Try[T] types.
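Here is a minimal sketch of the mutable-result alternative, using a hypothetical MinMax holder class. Instead of returning a new (Int, Int) tuple on every call, the method writes into an object supplied by the caller. The caller can allocate that object once and reuse it.

```scala
object ResultObjectSketch {
  // Hypothetical mutable holder, filled in by the callee instead of returning a tuple.
  final class MinMax {
    var min: Int = 0
    var max: Int = 0
  }

  // Replaces a signature like `def minMax(arr: Array[Int]): (Int, Int)`,
  // which would allocate a new Tuple2 on every call.
  // Assumes arr is non-empty.
  def minMaxInto(arr: Array[Int], out: MinMax): Unit = {
    var lo = arr(0)
    var hi = arr(0)
    var i = 1
    while (i < arr.length) {
      val v = arr(i)
      if (v < lo) lo = v
      if (v > hi) hi = v
      i += 1
    }
    out.min = lo
    out.max = hi
  }

  def main(args: Array[String]): Unit = {
    val result = new MinMax // allocated once, reused for every call below
    val inputs = Array(Array(3, 9, 2), Array(7, -1, 5))
    var k = 0
    while (k < inputs.length) {
      minMaxInto(inputs(k), result)
      println(s"${result.min} ${result.max}") // prints "2 9", then "-1 7"
      k += 1
    }
  }
}
```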

Avoid Option Type - Use Maybe Family of Types

Scala's Option type (Some, None) requires a heap-allocated object to represent the Some case. Furthermore, consider an Option holding a number:

Code Block
val foo: Option[Int] = Some(5)

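That's two objects, because the 5 has to be boxed (as a java.lang.Integer) before it can be stored in the generic "collection" type Some. The JVM caches boxes for small values such as 5. For values outside that cache (-128 to 127 by default), each boxing allocates a new object.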

We have an AnyVal-derived family of Maybe types, with specialized variants for unboxed types such as Int:

Code Block
val foo: MaybeInt = MaybeInt(5)
val bar: MaybeInt = MaybeInt.Nope

For objects, use the basic Maybe[T]:

Code Block
val foo: Maybe[String] = One("foobar")

However, see below about MStack and generic collections.
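Here is a minimal sketch of Maybe-style value classes that shows why these types need not allocate. SketchMaybeInt and SketchMaybe are hypothetical and do not reproduce Daffodil's actual MaybeInt or Maybe. The Int variant stores its value in a Long and reserves a sentinel (Long.MaxValue) that no Int can equal for Nope. The object variant uses null for Nope, the same convention described for MStack.OfMaybe below.

```scala
object MaybeSketch {
  // Hypothetical Int variant: a Long holds either an Int value or the sentinel
  // NopeBits, which no Int can equal. As a value class it is normally unboxed.
  final class SketchMaybeInt(val bits: Long) extends AnyVal {
    def isDefined: Boolean = bits != SketchMaybeInt.NopeBits
    def isEmpty: Boolean = !isDefined
    def get: Int =
      if (isDefined) bits.toInt
      else throw new NoSuchElementException("Nope.get")
    def getOrElse(default: Int): Int = if (isDefined) bits.toInt else default
  }

  object SketchMaybeInt {
    final val NopeBits = Long.MaxValue
    def apply(i: Int): SketchMaybeInt = new SketchMaybeInt(i.toLong)
    val Nope: SketchMaybeInt = new SketchMaybeInt(NopeBits)
  }

  // Hypothetical object variant: null represents Nope, any other reference is One.
  final class SketchMaybe[+T <: AnyRef](val value: T) extends AnyVal {
    def isDefined: Boolean = value ne null
    def get: T =
      if (isDefined) value
      else throw new NoSuchElementException("Nope.get")
  }

  object SketchMaybe {
    def apply[T <: AnyRef](v: T): SketchMaybe[T] = new SketchMaybe(v) // null becomes Nope
    def Nope[T <: AnyRef]: SketchMaybe[T] = new SketchMaybe[T](null.asInstanceOf[T])
  }

  def main(args: Array[String]): Unit = {
    val foo = SketchMaybeInt(5)
    val bar = SketchMaybeInt.Nope
    println(foo.getOrElse(-1)) // prints 5
    println(bar.getOrElse(-1)) // prints -1
    val s: SketchMaybe[String] = SketchMaybe("foobar")
    println(s.get)                              // prints foobar
    println(SketchMaybe.Nope[String].isDefined) // prints false
  }
}
```

Like any AnyVal class, these wrappers are still allocated when stored in an array, a generic collection, or a value of generic type. The generic-collection advice below therefore still applies.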

Avoid Generic Collections of Unboxed Types

Code Block
import scala.collection.mutable.ArrayBuffer

val boxedInts = new ArrayBuffer[Int](len) // stores java.lang.Integer objects: appending an Int boxes it, reading one unboxes it
var unboxedInts = new Array[Int](len)     // primitive int[]: reads and writes never box; a var so it can be replaced by a larger array on resize

Use MStack, avoid mutable.Stack

We need lots of stacks. Scala's general-purpose stacks are generic collections, which box primitive elements, so we created our own non-boxing flavors:

  • MStack.Of[T] - generic
  • MStack.OfInt - stack of Int - non-boxing
  • MStack.OfMaybe[T] - doesn't create a box for the Maybe object. Uses null for Nope, and a regular object reference for One.
    • However, MStackOfMaybe[Int] will box and unbox the Int.
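Here is a minimal sketch of a non-boxing Int stack in the spirit of MStack.OfInt. IntStack and its methods are hypothetical, not Daffodil's MStack API. Elements live in a primitive Array[Int] held in a var, so push and pop never box. A new array is allocated only when the stack has to grow.

```scala
object IntStackSketch {
  // Hypothetical non-boxing stack of Int, in the spirit of MStack.OfInt.
  final class IntStack(initialCapacity: Int = 16) {
    // var: replaced by a larger array when the stack grows
    private var elems = new Array[Int](math.max(1, initialCapacity))
    private var top = 0 // number of elements currently on the stack

    def isEmpty: Boolean = top == 0
    def length: Int = top

    def push(x: Int): Unit = {
      if (top == elems.length)
        elems = java.util.Arrays.copyOf(elems, elems.length * 2) // the only allocation
      elems(top) = x // stored as a primitive int, no box
      top += 1
    }

    def pop(): Int = {
      if (top == 0) throw new NoSuchElementException("pop on empty stack")
      top -= 1
      elems(top)
    }

    def peek: Int = {
      if (top == 0) throw new NoSuchElementException("peek on empty stack")
      elems(top - 1)
    }
  }

  def main(args: Array[String]): Unit = {
    val s = new IntStack(2)
    s.push(10); s.push(20); s.push(30) // the third push grows the array
    println(s.pop())  // prints 30
    println(s.peek)   // prints 20
    println(s.length) // prints 2
  }
}
```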

Allocate on "the stack" Using OnStack and LocalStack

TBD: See the definitions of these. They make it convenient to reuse objects in recursive code, which is common in Daffodil. A small pool of reusable objects is maintained per thread. Accessing one either creates a new object or reinitializes an existing one. Objects are put back when their scope exits, and Scala 2.11 macros are used so that no closure objects are allocated either.
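The definitions are still TBD here, so the following is only a sketch of the idea. A hypothetical per-thread ScratchStack, held in a java.lang.ThreadLocal, hands out reusable Scratch objects. It reinitializes a recycled object or creates a new one when none is free. The try/finally that returns the object on scope exit is written out by hand. According to the text above, OnStack/LocalStack use macros instead, so callers get block syntax without allocating a closure.

```scala
object LocalStackSketch {
  // Hypothetical reusable object; any mutable class with a reset method would do.
  final class Scratch {
    val sb = new java.lang.StringBuilder
    def reset(): Unit = sb.setLength(0)
  }

  // Hypothetical per-thread stack of free Scratch objects (not Daffodil's OnStack/LocalStack).
  final class ScratchStack {
    private var free = new Array[Scratch](8) // objects not currently in use
    private var nFree = 0
    var created = 0 // how many Scratch objects were ever allocated

    def acquire(): Scratch = {
      val s =
        if (nFree > 0) { nFree -= 1; val r = free(nFree); free(nFree) = null; r }
        else { created += 1; new Scratch }
      s.reset() // a recycled object must not carry over old state
      s
    }

    def release(s: Scratch): Unit = {
      if (nFree == free.length) {
        val bigger = new Array[Scratch](free.length * 2)
        System.arraycopy(free, 0, bigger, 0, nFree)
        free = bigger
      }
      free(nFree) = s
      nFree += 1
    }
  }

  // One ScratchStack per thread, so no synchronization is needed.
  private val perThread: ThreadLocal[ScratchStack] = new ThreadLocal[ScratchStack] {
    override protected def initialValue(): ScratchStack = new ScratchStack
  }
  def stack: ScratchStack = perThread.get()

  // Recursive code: each level borrows a Scratch and gives it back when its
  // scope exits. The try/finally is written by hand; a macro could generate
  // it so callers write a block without allocating a closure.
  def render(depth: Int): String = {
    val st = stack
    val s = st.acquire()
    try {
      s.sb.append('(')
      if (depth > 0) s.sb.append(render(depth - 1))
      s.sb.append(')')
      s.sb.toString
    } finally st.release(s)
  }

  def main(args: Array[String]): Unit = {
    println(render(2))     // prints ((()))
    println(stack.created) // prints 3
    println(render(2))     // prints ((())) again
    println(stack.created) // still 3: the objects were reused
  }
}
```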

Use Reusable Pools of Stored Objects

When reusable objects do not follow a stack discipline, you can still reuse them by pooling them.

TBD: See Pool.scala for a common pooling idiom
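Here is a minimal sketch of a pooling idiom that assumes nothing about the real Pool.scala. A hypothetical abstract Pool[T] keeps returned objects in a java.util.ArrayDeque and resets them before reuse. It also tracks how many objects are outstanding so that leaks can be detected. Unlike the stack case, objects may be returned in any order.

```scala
object PoolSketch {
  // Hypothetical generic pool; the real Pool.scala may differ.
  abstract class Pool[T <: AnyRef] {
    protected def allocate(): T     // creates a brand-new instance
    protected def reset(t: T): Unit // clears state before an instance is reused

    private val free = new java.util.ArrayDeque[T]()
    private var outstanding = 0 // handed out but not yet returned

    def getFromPool(): T = {
      outstanding += 1
      val t = free.pollFirst() // null when no free object is available
      if (t eq null) allocate() else { reset(t); t }
    }

    def returnToPool(t: T): Unit = {
      outstanding -= 1
      free.addFirst(t)
    }

    // Should be 0 once every borrower is done; anything else indicates a leak.
    def numOutstanding: Int = outstanding
  }

  final class Buf { var n = 0 }

  object BufPool extends Pool[Buf] {
    protected def allocate(): Buf = new Buf
    protected def reset(b: Buf): Unit = b.n = 0
  }

  def main(args: Array[String]): Unit = {
    val a = BufPool.getFromPool()
    val b = BufPool.getFromPool()
    a.n = 1
    b.n = 2
    BufPool.returnToPool(a)         // returned before b: not stack order
    val c = BufPool.getFromPool()   // reuses a, after reset
    println(c eq a)                 // prints true
    println(c.n)                    // prints 0
    BufPool.returnToPool(b)
    BufPool.returnToPool(c)
    println(BufPool.numOutstanding) // prints 0
  }
}
```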

 

...

This page has moved to https://cwiki.apache.org/confluence/display/DAFFODIL/Coding+for+Performance