Comment: Removed stale infoset discussion. Clarify pstate is passed, not returned.

...

The alternative is to use mutable state that each parser updates in place. Parallelism then requires the ability to create, or obtain exclusive access to, a separate state for each concurrent parse activity. This state is passed downward to all parser code (and unparser code) and is updated in place by that code; no PState object is returned.
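A minimal Java sketch of this calling convention follows. The names PState, Parser, SkipBits and Sequence, and the fields shown, are hypothetical and chosen only for illustration; they are not taken from the Daffodil code base. The point is only that parse returns nothing and the caller reads the outcome from the state it passed in.

```java
// Hypothetical names throughout; this is not Daffodil's actual API.
final class PState {
    long bitPos;     // current position in the input, updated in place
    boolean failed;  // set by a parser instead of returning a failure value
}

interface Parser {
    // Returns nothing: the outcome is recorded in the state that was passed in.
    void parse(PState state);
}

// Illustrative leaf parser that just consumes a fixed number of bits.
final class SkipBits implements Parser {
    private final int nBits;
    SkipBits(int nBits) { this.nBits = nBits; }
    @Override public void parse(PState state) { state.bitPos += nBits; }
}

// Illustrative combinator that hands the same state object to each child.
final class Sequence implements Parser {
    private final Parser[] children;
    Sequence(Parser... children) { this.children = children; }
    @Override public void parse(PState state) {
        for (Parser child : children) {
            child.parse(state);        // the same state object goes down
            if (state.failed) return;  // stop at the first failure
        }
    }
}
```

A concurrent parse activity would be given its own PState instance rather than sharing this one.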

The runtime generally uses many Lists, mostly as stacks of variable bindings, backtrack locations, etc. These can be replaced by mutable arrays, with current indexes kept in the state to indicate which slots are in use. For parallelism, allocating a new state then also involves allocating these mutable arrays.
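As a rough illustration, an array-backed stack with an explicit index might look like the Java sketch below. The class name LongStack and the doubling growth policy are assumptions made for the example, not something these notes specify.

```java
import java.util.Arrays;

// Hypothetical array-backed stack of long values (e.g. saved positions).
final class LongStack {
    private long[] slots;
    private int inUse = 0;  // number of slots currently in use

    LongStack(int initialCapacity) {
        slots = new long[Math.max(1, initialCapacity)];
    }

    void push(long value) {
        // Grow by doubling when full (an assumed policy).
        if (inUse == slots.length) slots = Arrays.copyOf(slots, inUse * 2);
        slots[inUse++] = value;
    }

    long pop() {
        if (inUse == 0) throw new IllegalStateException("empty stack");
        return slots[--inUse];
    }

    int size() { return inUse; }

    // Makes the stack reusable without releasing the backing array.
    void clear() { inUse = 0; }
}
```

Unlike an immutable List used as a stack, push and pop here allocate nothing once the array has reached its working size.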

These state objects can be kept in a pool and reused. The pool will need to be protected for thread safety. 
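One way such a pool could be sketched in Java is shown below, using java.util.concurrent.ConcurrentLinkedQueue for thread safety. ParseState, StatePool, acquire and release are hypothetical names, and resetting the state on release is an assumption of this sketch.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical stand-in for the mutable parse state.
final class ParseState {
    long bitPos;
    void reset() { bitPos = 0; }
}

// Hypothetical pool; the lock-free queue makes acquire/release thread safe.
final class StatePool {
    private final ConcurrentLinkedQueue<ParseState> free =
        new ConcurrentLinkedQueue<>();

    // Reuses a pooled state if one is available, otherwise allocates one.
    ParseState acquire() {
        ParseState s = free.poll();
        return (s != null) ? s : new ParseState();
    }

    // Clears the state and returns it to the pool for later reuse.
    void release(ParseState s) {
        s.reset();
        free.offer(s);
    }
}
```

A caller must not touch a state after releasing it, since another thread may already have acquired it.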

Alternation and Variables

...

Each backtrack point saves pos, again on a stack. A successful advance pops the stack and then reassigns the new top to hold the new position value.
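The Java sketch below shows one possible reading of this scheme, in which the top of the stack always holds the current position and a backtrack point pushes a working copy above the saved one. PosStack and its method names are hypothetical, and this reading is an assumption rather than a description of the actual design.

```java
import java.util.Arrays;

// Hypothetical position stack; pos[top] is always the current position.
final class PosStack {
    private long[] pos = new long[16];
    private int top = 0;  // pos[0] starts at 0, the beginning of the data

    long current() { return pos[top]; }

    void advance(long nBits) { pos[top] += nBits; }

    // Entering a backtrack point: keep the saved pos below a working copy.
    void mark() {
        if (top + 1 == pos.length) pos = Arrays.copyOf(pos, pos.length * 2);
        pos[top + 1] = pos[top];
        top++;
    }

    // Failure: drop the working copy; the saved pos is current again.
    void fail() {
        if (top == 0) throw new IllegalStateException("no backtrack point");
        top--;
    }

    // Success: pop, then reassign the new top to the position reached.
    void succeed() {
        if (top == 0) throw new IllegalStateException("no backtrack point");
        long reached = pos[top];
        top--;
        pos[top] = reached;
    }
}
```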

Infoset

A JDOM tree is inefficient as a representation no matter what we do. It holds even bit data as strings.

Introduce a variable corresponding to each XPath expression. Use DFDL setVariable at the place in the schema that the path references, and replace the XPath expression at its point of use with a reference to that variable.

This eliminates any use of an XPath processor except to evaluate flat expressions like ($x + $y). It also eliminates the need for JDOM, which is required by the XPath implementation that Daffodil tries to reuse.

That enables the infoset to be built up in the inverted direction: children point to parents, rather than parents pointing to children or nodes being doubly linked as in DOM/JDOM-style trees.

Or... children of the current infoset node can be added to an array. If the node is successfully parsed, then the array is handed off for inclusion in the infoset. Otherwise we recursively free the array and its contained children back to an infoset node array pool, which enables reuse of these node arrays.
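A small Java sketch of the child-array alternative follows. InfosetNode, ChildArrayPool and their members are hypothetical names, and ArrayList stands in for whatever array type would actually be used. The pool shown is not thread safe; the sketch assumes each parse state owns its own pool or that access is protected elsewhere.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;

// Hypothetical infoset node; children stays null for a leaf.
final class InfosetNode {
    final String name;
    ArrayList<InfosetNode> children;
    InfosetNode(String name) { this.name = name; }
}

// Hypothetical pool of child arrays (not thread safe in this sketch).
final class ChildArrayPool {
    private final ArrayDeque<ArrayList<InfosetNode>> free = new ArrayDeque<>();

    // Reuses a pooled array if one is available, otherwise allocates one.
    ArrayList<InfosetNode> acquire() {
        ArrayList<InfosetNode> a = free.poll();
        return (a != null) ? a : new ArrayList<>();
    }

    // On parse failure: return this array, and recursively the arrays of
    // the children it contains, to the pool.
    void release(ArrayList<InfosetNode> array) {
        for (InfosetNode child : array) {
            if (child.children != null) {
                release(child.children);
                child.children = null;
            }
        }
        array.clear();
        free.push(array);
    }

    int available() { return free.size(); }
}
```

On success nothing is released: the array is simply assigned to the parent node's children field and becomes part of the infoset.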

Diagnostics

A great deal of runtime overhead is required if Daffodil is to provide good diagnostic messages when things go wrong.

...