There are a few places in our system where we don't properly identify the schema component in question.

Currently we use an ad-hoc, path-like notation that I threw together.

This needs to be made robust.

Short Schema Component Instance Designators (SSCID)

This is loosely based on the W3C Schema Component Designator spec:

http://www.w3.org/TR/xmlschema-ref/

However, this must be adapted to our needs, as it is a bit too verbose to use in diagnostic messages, doesn't have schema component instance paths, doesn't have a notion of schema document, etc.

To clarify:

  • Component: a schema component is one of the things a schema author writes in a schema.
  • Component Instance: a schema component instance is a non-shared use of a shareable schema component, that is, the component as it appears in one particular usage context.
    • For example: a global type definition must be referenced from an element to be used. The type as it appears in the context of that element is called an 'instance' of that schema component.
  • Occurrence: in data, and the infoset, the data corresponding to a schema element declaration is called an 'occurrence' of the element. (Not to be confused with 'instance')


Subset of SCD

We use only:

  • relative schema component designators
  • a minimal set of axes which DFDL needs
    • Note, however, that we may need attributes. E.g., to refer to the maxOccurs attribute of a specific element declaration we would write e1/@maxOccurs
  • only the abbreviated syntactic forms
  • our own abbreviated versions of sequence, choice, and group reference path steps
  • quasi-elements for access to DFDL annotations
  • a convention for referring to a specific schema document (via URI)

We abbreviate 'model::sequence' and 'model::choice' as 'S' and 'C', since the full forms are too verbose for our purposes. Because these could be ambiguous with elements named 'S' and 'C', we use the official W3C verbose notation wherever there is any ambiguity.

Similarly, we abbreviate a group reference to group g1 as 'G::g1'.

So long as what we create is easily mapped onto an official W3C SCD, what we use can be more abbreviated. Our APIs will want to return either official W3C SCDs or our abbreviated variant.
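To illustrate the "easily mapped" requirement, here is a minimal sketch of expanding abbreviated steps into verbose W3C-style steps. The 'S' and 'C' expansions come from the text above. Everything else is an assumption made for illustration: the object and method names are hypothetical, mapping 'G::' to the W3C 'group::' axis is a guess at the intended correspondence, and '~' and a bare name are expanded to 'type::' and 'schemaElement::' per my reading of the W3C abbreviated syntax. Attribute steps such as @maxOccurs are deliberately left out, since our use of them differs from the W3C attribute-declaration axis.

```scala
object SscidExpansion {
  // Expand one abbreviated step into a verbose W3C-style step.
  // Only "S" and "C" are taken from the design text; the other
  // mappings are assumptions for illustration.
  def expandStep(step: String): String = {
    // Split off a trailing positional predicate such as "[2]", if present.
    val (name, predicate) = step.indexOf('[') match {
      case -1 => (step, "")
      case i  => (step.substring(0, i), step.substring(i))
    }
    val expanded = name match {
      case "S"                      => "model::sequence"
      case "C"                      => "model::choice"
      case n if n.startsWith("G::") => "group::" + n.stripPrefix("G::")
      case n if n.startsWith("~")   => "type::" + n.drop(1)
      case n                        => "schemaElement::" + n
    }
    expanded + predicate
  }

  // Expand every step of a '/'-separated designator.
  def expand(designator: String): String =
    designator.split('/').map(expandStep).mkString("/")
}
```

Note that this sketch always treats a bare 'S' or 'C' as a model group step, so a schema with elements actually named 'S' or 'C' would need the verbose form on input, as described above.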

Schema Component Instances in Context

Our paths are schema component instance designators. These are longer paths that can reach across a reference within the schema. E.g., an element may have a named type, and we need to refer to things inside the instance of that type that belongs to that element. Similarly, we need to reach across group references and element references.

This is done by simply continuing the path. E.g., suppose element e1 has named type t1, which is a complex type with a group reference to a named group g1, containing a sequence, which contains two child sequences; the first contains an element e2 and the second contains an element e3.

A Daffodil Short Schema Component Instance Designator (SSCID) corresponding to this inner e3 would be:

e1/~t1/G::g1/S/S[2]/e3

This has no direct correspondence in a W3C SCD, because XML Schema is context-free; there is no need for paths that give the enclosing context. In Daffodil, however, the context matters greatly, so our SSCIDs allow creation of these longer paths.
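As a rough sketch of "simply continuing the path", the steps can be modeled as a small set of case classes that are joined with '/'. All of the class and method names here are hypothetical and are not the actual Daffodil types; the example renders the designator of the second inner sequence from the scenario above.

```scala
object SscidRender {
  // One step of a designator; `text` is its abbreviated rendering.
  sealed trait Step { def text: String }
  final case class ElementStep(name: String) extends Step {
    def text: String = name
  }
  final case class TypeStep(name: String) extends Step {
    def text: String = "~" + name
  }
  final case class GroupRefStep(name: String) extends Step {
    def text: String = "G::" + name
  }
  // `position` is None when no positional predicate is needed.
  final case class SequenceStep(position: Option[Int] = None) extends Step {
    def text: String = "S" + position.map(p => s"[$p]").getOrElse("")
  }

  // Continuing the path across references is just appending more steps.
  def render(steps: Seq[Step]): String = steps.map(_.text).mkString("/")

  // e1 -> its type t1 -> group reference g1 -> outer sequence -> 2nd child sequence
  val secondInnerSequence: String = render(Seq(
    ElementStep("e1"),
    TypeStep("t1"),
    GroupRefStep("g1"),
    SequenceStep(),
    SequenceStep(Some(2))
  ))
}
```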

Implementation

The SchemaComponent class has an abstract method:

def sSCIDStep: SSCIDStep

The type SSCIDStep represents a single step of an SSCID (a 'short SCD step').

A final method on SchemaComponent will assemble the complete relative Short Schema Component Instance Designator (SSCID) from the components. The result is relative to the root/document element:

def sSCID: SSCID

Note that a schema component cannot create its SSCID step without knowing what its index is within its parent. E.g., the 2nd sequence child of another sequence needs to create a step with a [2] at the end.
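The dependence on the parent can be sketched as follows. The Node class and stepFor function are hypothetical stand-ins, not the real SchemaComponent API. The sketch also assumes one particular convention that the design does not spell out: the positional predicate is added whenever more than one sibling shares the same step label, and is omitted otherwise.

```scala
object SscidIndexing {
  // Hypothetical minimal tree node: `label` is the step text without a predicate.
  final case class Node(label: String, children: List[Node] = Nil)

  // Step text for the child at `childIndex` (0-based) within `parent`.
  def stepFor(parent: Node, childIndex: Int): String = {
    val child = parent.children(childIndex)
    val sameLabel = parent.children.count(_.label == child.label)
    if (sameLabel <= 1) child.label // unique among siblings: no predicate
    else {
      // 1-based position among the siblings that share this label
      val position =
        parent.children.take(childIndex + 1).count(_.label == child.label)
      s"${child.label}[$position]"
    }
  }
}
```

A real implementation would also have to guard against a bad index, given the no-exceptions requirement on this code.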

Since these will be used in diagnostic messages, the code that creates them must be minimal. Nothing can be allowed to go wrong in it: it cannot throw any sort of exception, nor depend on, say, OOLAG LVs. The methods which create these will catch Throwable and call Assert.abort() if anything is thrown.
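A minimal sketch of that guard is below. The abort function and Abort class are local stand-ins written for this example; they are not Daffodil's Assert.abort, whose signature is not shown on this page. The guarded helper name is likewise hypothetical.

```scala
object SafeDesignator {
  // Stand-in for an abort facility (an assumption, not Daffodil's Assert.abort).
  final class Abort(msg: String, cause: Throwable) extends Error(msg, cause)
  def abort(msg: String, cause: Throwable): Nothing = throw new Abort(msg, cause)

  // Run `body`; if anything at all is thrown, convert it into an abort
  // instead of letting it escape into diagnostic-message construction.
  def guarded[T](what: String)(body: => T): T =
    try body
    catch { case t: Throwable => abort(s"Failure while computing $what", t) }
}
```

With something like this, a step computation would be written as guarded("SSCID step") { ... }, so a bug in designator code surfaces as an abort rather than as a misleading secondary diagnostic.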

SSCID for DFDL Annotations

Not sure this is needed, but if we want to specify an SSCID for a specific DFDL annotation, then we use quasi-elements dfdl:format, dfdl:sequence, dfdl:simpleType, etc. That is, there is no representation of the annotation or appinfo constructs needed for long-format annotations.

The one problem is that XML Schema and W3C SCD provide no means to refer to schema documents; hence, one cannot refer to individual top-level annotations. This is the same bug we see in XSOM and other schema object models.

We solve this by allowing a URI for the schema document, followed by a URI fragment which contains the SSCID for the dfdl:format annotation.
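For illustration, combining a schema document URI with an SSCID fragment could look like the sketch below, using the standard java.net.URI class. The annotationRef name and the example file path are assumptions made for this example.

```scala
import java.net.URI

object AnnotationRef {
  // Build "<schema document URI>#<SSCID>". The three-argument URI
  // constructor quotes any characters that are illegal in a fragment.
  def annotationRef(schemaDocument: URI, sscid: String): URI =
    new URI(schemaDocument.getScheme, schemaDocument.getSchemeSpecificPart, sscid)

  // Hypothetical schema document location, used only as an example.
  val example: URI =
    annotationRef(new URI("file:/schemas/example.dfdl.xsd"), "dfdl:format")
}
```

The SSCID can be recovered from such a reference with getFragment, and the schema document from the remainder of the URI.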

Data Element Occurrence IDs

An occurrence of an element is identified by its path in the infoset, the SSCID for its component instance, and a unique integer called the trip count. The trip count increments each time the SSCID is used, so that backtracking to the same path and SSCID creates unique occurrence IDs.

The trip count is represented by "(n)" where n is an integer.

When that is too verbose, parts can be omitted. For example, the path can be omitted if clear from context.
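The trip count mechanism can be sketched as a per-SSCID counter. All names here are hypothetical, and two details are assumptions the text does not fix: the count starts at 1, and the " @ " separator between the infoset path and the SSCID is invented purely for display.

```scala
import scala.collection.mutable

object OccurrenceIds {
  final case class OccurrenceId(infosetPath: Option[String], sscid: String, tripCount: Int) {
    // The path is omitted when not supplied; the trip count is rendered as "(n)".
    def render: String =
      infosetPath.map(_ + " @ ").getOrElse("") + s"$sscid($tripCount)"
  }

  final class TripCounter {
    private val counts = mutable.Map.empty[String, Int]

    // Each use of an SSCID gets the next trip count, so revisiting the
    // same path and SSCID after backtracking still yields a fresh ID.
    def next(infosetPath: Option[String], sscid: String): OccurrenceId = {
      val n = counts.getOrElse(sscid, 0) + 1
      counts(sscid) = n
      OccurrenceId(infosetPath, sscid, n)
    }
  }
}
```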
