You are viewing an old version of this page. View the current version.

Compare with Current View Page History

« Previous Version 3 Next »

The current implementation of lengthKind="pattern" assumes the lengthPatternexpression stores a value to be match.  That is to say if one wanted a hexadecimal string, one would set lengthPattern="[0-9A-Fa-f]+" in the current implementation.  This would still require a separator attribute though because it doesn't code for the delimiter that would normally separate fields.

However, the AO and AV tests from Tresys assume the opposite: that the lengthPattern is another form of separator/delimiter and doesn't say anything about the data format for the given field.

In the current implementation, the separator might also become part of the pattern since it's possible to define a positive look-ahead which specifies the end of the field.  For example, if the separator is a comma (,) and the pattern is "[A-Z][a-z]*(?=,|$)", (meaning the next field must consist of a capital/majuscule letter followed by zero or more lower-case/minuscule letters which then terminates at the first comma or end of string.

The testLengthKindPattern test case in sub-projects/core-srcTest/daffodil/api/TestAPI1.scala has another example of how the current implementation might be used:

lengthPattern=".*?[^\\](?=,|$)"

The above expression says match any character, zero-or-more though not more than necessary, followed by any character that isn't a backslash (/).  It then asserts that whatever follows this (the first instance of this possible match thanks to the "though not more than necessary, non-greedy" clause) is either a comma or an end of string.

If the code were changed to mean that lengthPattern is just another word for a regex separator, this would have to be changed to:

lengthPattern="(?<!\\),"

Which uses a negative look-behind to ensure that the given comma delimiter is not immediately preceded by a backslash.  The looks behinds though have to be fixed length, unlike the look-aheads due to limitations in more regex parsers, so for instance (?<=a{1,3}) is not a valid positive look-behind.

So the opened question is, which is better: should lenghtPattern subsume separator or should it specify the pattern the field may match?

Note: changing the code to support pattern is delimiter would not be that much work so time to implement should not be a consideration.

I have made the necessary local changes to reverse the logic to make it a search by regex delimiter and still find the AO000, AO001, AO002, AO003, AO004, AV000, AV001 and AV002 tests are still failing due to an inability of the parser to find the initiator property.

  • No labels