...
However, the AO and AV tests from Tresys assume the opposite: that the lengthPattern is another form of separator/delimiter and doesn't say anything about the data format for the given field.
MikeB said: The correct definition is 'value to be matched', not a regex way to express delimiters.
In the current implementation, the separator might also become part of the pattern since it's possible to define a positive look-ahead which specifies the end of the field. For example, if the separator is a comma (,
) and the pattern is "[A-Z][a-z]*(?=,|$)"
, (meaning the next field must consist of a capital/majuscule letter followed by zero or more lower-case/minuscule letters which then terminates at the first comma or end of string.
...
I have made the necessary local changes to reverse the logic to make it a search by regex delimiter and still find the AO000, AO001, AO002, AO003, AO004, AV000, AV001 and AV002 tests are still failing due to an inability of the parser to find the initiator property.
MikeB said: Best to go to the current draft of the DFDL spec. The AA-BG tests were part of the original daffodil code base, and that was developed based on a very early draft of the DFDL standard, which has since changed a great deal as it has converged toward an agreed standard. In particular, way back somebody suggested "why not let delimiters be regular expressions?" For a while this was entertained, and that's when the original daffodil code base was created along with tests AA-BG. But this was later viewed as too much generality (hence complexity, especially when debugging "why doesn't my DFDL schema work???"), for not enough benefit.
The lengthKind="pattern" was added to handle the cases which truly need a complex match.