How would I construct a DFDL element to read in a stream of binary data until a terminating sequence of bytes occurs? For example, a stream terminated with 0xFFDA, 0xFFC0 or 0xFFD9.
I was able to partially achieve my aim (i.e. terminate on a single terminator) with the following:
<xs:element name="foo" type="xs:hexBinary" dfdl:length="delimited" dfdl:outputNewLine="%CR;%LF;" dfdl:terminator="%#rff;%#rda;" />
I have no idea why I had to set outputNewLine, but Daffodil would throw an error if not set.
I suspect you meant dfdl:lengthKind="delimited"? dfdl:length is only used when dfdl:lengthKind="explicit", in which case the value of dfdl:length must be either a number or a DFDL expression.
The issue with dfdl:lengthKind="delimited" is that the 0xFFXX delimiters are consumed along with the data. When parsing JPEG (which I assume is what you're discussing), you will often need those 0xFFXX markers to determine how to parse the data that follows. Because dfdl:lengthKind="delimited" consumes the delimiters, you cannot do that.
What you really want is something like dfdl:lengthKind="pattern", and you specify a regular expression that will consume everything up to those special 0xFFXX markers.
When working with others who are working on creating a DFDL schema for JPEG, we recommended something like the following:
<xs:element name="foo" type="xs:hexBinary" dfdl:lengthKind="pattern" dfdl:lengthPattern=".*?(?=\xFF[\x01-\xFE])" dfdl:encoding="ISO-8859-1" />
This specifies a pattern to consume all data as xs:hexBinary up until one of the 0xFFXX markers, but will not consume the 0xFFXX marker itself. You can then follow this with an element to consume the 2-byte marker followed by a choice that discriminates on the value of that marker to determine how to continue processing.
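To see what this pattern does outside Daffodil, here is a minimal Python sketch. It assumes that Python's `re` module handles the lazy match, the lookahead and the `\xFF` escapes the same way as the regex engine Daffodil uses. It decodes the bytes as ISO-8859-1 so that every byte maps to exactly one character, mirroring dfdl:encoding="ISO-8859-1". The helper name `split_at_marker` and the sample bytes are made up for illustration.

```python
import re

# Same pattern as in dfdl:lengthPattern: lazily match anything up to, but not
# including, 0xFF followed by a byte in the range 0x01..0xFE.
PATTERN = re.compile(r".*?(?=\xFF[\x01-\xFE])")


def split_at_marker(data: bytes):
    """Return (payload, marker, rest), or None if no marker is found.

    Hypothetical helper for illustration; not part of Daffodil.
    """
    text = data.decode("iso-8859-1")  # one byte <-> one character
    m = PATTERN.match(text)
    if m is None:
        return None
    end = m.end()
    payload = data[:end]         # what the xs:hexBinary element would hold
    marker = data[end:end + 2]   # the 2-byte marker, left unconsumed by the pattern
    rest = data[end + 2:]
    return payload, marker, rest


# 0xFF 0x00 is not treated as a marker because 0x00 is outside [\x01-\xFE].
sample = bytes([0x12, 0x34, 0xFF, 0x00, 0x56, 0xFF, 0xDA, 0x99])
payload, marker, rest = split_at_marker(sample)
print(payload.hex(), marker.hex(), rest.hex())  # 1234ff0056 ffda 99
```

The marker stays in the input after the match, which is why a following 2-byte element can read it and a choice can branch on its value.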
Also, note that we literally just added support for dfdl:lengthKind="pattern" with xs:hexBinary types today (Nov 23, 2016), so you'll need the latest 2.0.0-SNAPSHOT of Daffodil for this to work.
Minor update to the above. The dot (.) in the dfdl:lengthPattern property does not match newlines. So if the hexBinary data contains newlines, it will fail to match. The correct pattern should be:
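The newline behaviour is easy to reproduce. The sketch below uses Python's `re` module as a stand-in for the regex engine inside Daffodil, which is an assumption. In both Python and Java regexes, `.` excludes the newline character unless a dot-all flag is set. Java additionally treats carriage return and a few other characters as line terminators by default. The `(?s)` prefix is one common way to enable dot-all mode. It is shown as an illustration and is not necessarily the exact pattern this comment intended. The helper name `match_len` is hypothetical.

```python
import re

PLAIN = re.compile(r".*?(?=\xFF[\x01-\xFE])")
DOTALL = re.compile(r"(?s).*?(?=\xFF[\x01-\xFE])")  # (?s): '.' also matches newlines


def match_len(pattern, data: bytes):
    """Return how many bytes the pattern consumes from the start, or None."""
    m = pattern.match(data.decode("iso-8859-1"))
    return None if m is None else m.end()


# Payload 0x41 0x0A 0x42 (contains a newline byte), followed by marker 0xFFDA.
data = bytes([0x41, 0x0A, 0x42, 0xFF, 0xDA])

print(match_len(PLAIN, data))   # None: '.' cannot step over the 0x0A byte
print(match_len(DOTALL, data))  # 3: all three payload bytes are matched
```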
I successfully used Daffodil-2.0.0-SNAPSHOT to extract multiple JPEG images from a NITF file with the caveat that I could only extract SOI, FRAME (as a blob) and EOI info.
A JPEG FRAME contains multiple fragments delimited by marker codes. The existence and location of the fragments is not fixed.
I tried creating a FRAME element with a length calculated using the lengthPattern property, and then embedding multiple fragment elements. Each fragment element uses a lengthPattern property to detect the end of the fragment and lengthKind = "endOfParent" to determine when to stop reading fragments for a frame.
Daffodil reported an error that element FRAME and children must have text representation in order for pattern-based length and scanability. I tried setting the encoding to ISO-8859-1, but I could not fix this issue - maybe a rookie error?
dfdl:lengthKind="pattern" is only supported on complex types when the text is "scannable". By scannable, we mean that dfdl:representation="text" and that the dfdl:encoding of all children is the same and known at schema compile time (i.e. not an expression). Since your elements have dfdl:representation="binary", dfdl:lengthKind="pattern" is not supported, and you will get a compile time error.
Also note that dfdl:lengthKind="endOfParent" is not yet implemented in Daffodil. The fact that Daffodil does not report this as an error is a bug. Currently, if you specify dfdl:lengthKind="endOfParent" on a complex type, it looks like Daffodil just treats it as if it is dfdl:lengthKind="implicit". DFDL-1664 has been created to track this issue.
I suspect the solution to your problem is to move the dfdl:lengthKind="pattern" and dfdl:lengthPattern properties to a simple type with type xs:hexBinary.
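As a rough illustration of that flat layout, the Python sketch below walks a frame as a repeating pair of a 2-byte marker and a simple payload whose length comes from the pattern. It only models the idea and is not a DFDL schema. It assumes the frame starts with a marker, that Python's `re` behaves like Daffodil's regex engine for this pattern, and that dot-all mode (`(?s)`) is wanted so newline bytes in the payload do not stop the match. The helper name `split_segments` and the sample bytes are hypothetical.

```python
import re

# Payload pattern applied to each simple element: everything up to the next marker.
NEXT_MARKER = re.compile(r"(?s).*?(?=\xFF[\x01-\xFE])")


def split_segments(frame: bytes):
    """Split frame bytes into (marker_hex, payload_hex) pairs."""
    text = frame.decode("iso-8859-1")  # one byte <-> one character
    segments = []
    pos = 0
    while pos + 2 <= len(text):
        marker = frame[pos:pos + 2]        # fixed-length 2-byte marker element
        pos += 2
        m = NEXT_MARKER.match(text, pos)   # pattern-length payload element
        end = m.end() if m else len(text)  # no further marker: take the remainder
        segments.append((marker.hex(), frame[pos:end].hex()))
        pos = end
    return segments


frame = bytes.fromhex("ffc00102" "ffda030a04" "ffd9")
print(split_segments(frame))
# [('ffc0', '0102'), ('ffda', '030a04'), ('ffd9', '')]
```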