DFDL provides user-visible regular expressions whose feature set includes the intersection of the features of Java 7 regular expressions and the regular expression features implemented by the ICU libraries.

However, this page is specifically about internal use of regular expressions by Daffodil in its implementation of delimiter matching.

Features needed

  • Supports matching against binary data (not just character data)
  • Implements the POSIX leftmost-longest match algorithm
  • Uses a non-backtracking algorithm to avoid exponential worst-case running time
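
To illustrate why the leftmost-longest requirement matters for delimiters, here is a minimal Java sketch. It contrasts java.util.regex, which returns the first alternative that succeeds, with a simple longest-match search over literal delimiters. The class name and the helper longestLiteralMatch are hypothetical, written only for this illustration; they are not part of Daffodil.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftmostLongestDemo {
    // java.util.regex is a backtracking engine: for an alternation it
    // returns the first alternative that succeeds, not the longest one.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    // Hypothetical helper: emulates POSIX leftmost-longest semantics for
    // a set of literal delimiters. The leftmost occurrence wins; ties at
    // the same position go to the longer delimiter.
    static String longestLiteralMatch(String[] delimiters, String input) {
        int bestStart = -1;
        String best = null;
        for (String d : delimiters) {
            int i = input.indexOf(d);
            if (i < 0) continue;
            boolean earlier = best == null || i < bestStart;
            boolean longerAtSameStart = i == bestStart && d.length() > best.length();
            if (earlier || longerAtSameStart) {
                bestStart = i;
                best = d;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Leftmost-first: the alternative "a" is tried first and wins.
        System.out.println(firstMatch("a|ab", "xab"));
        // Leftmost-longest: "ab" wins because it is longer at the same start.
        System.out.println(longestLiteralMatch(new String[] {"a", "ab"}, "xab"));
    }
}
```

Under leftmost-first semantics the result depends on the order in which the alternatives are written, which is a poor fit for delimiter sets where one delimiter is a prefix of another.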

Survey of Java regular expression libraries

...

this page has moved to https://cwiki.apache.org/

...

Of these, the automaton library looks the most interesting from a DFA perspective.

As far as we can tell, there is no POSIX leftmost-longest regex library for Java or Scala.

Notes for implementing our own regular expression engine

...

  • Regular Expression Matching Can Be Simple And Fast (http://swtch.com/~rsc/regexp/regexp1.html) - Introduces several implementation techniques

  • Regular Expression Matching: the Virtual Machine Approach (http://swtch.com/~rsc/regexp/regexp2.html) - Discusses several implementation details, including a section on POSIX leftmost-longest matching. Includes links to test suites for POSIX matching rules.

  • Regular Expression Matching in the Wild (http://swtch.com/~rsc/regexp/regexp3.html) - Tour of the RE2 library (written in C++) - useful for translating the implementation to Java or Scala (which apparently no one has done yet)
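
As a rough sketch of the kind of engine these articles describe, the following Java code simulates a small compiled program over bytes by tracking a set of active states, so it never backtracks. It is loosely modeled on the virtual machine approach; the instruction set (BYTE, SPLIT, JMP, MATCH), the class name, and the hand-assembled program are assumptions made for illustration, not Daffodil code. It reports only the length of the longest match at a given start position and does not track submatches.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ByteNfaSketch {
    enum Op { BYTE, SPLIT, JMP, MATCH }

    static final class Inst {
        final Op op;
        final int x; // BYTE: unsigned byte value; SPLIT/JMP: first target
        final int y; // SPLIT: second target
        Inst(Op op, int x, int y) { this.op = op; this.x = x; this.y = y; }
    }

    // Adds pc, plus everything reachable from it through SPLIT and JMP,
    // to the state set. Each state is added at most once per step.
    static void addThread(Inst[] prog, boolean[] set, int pc) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(pc);
        while (!stack.isEmpty()) {
            int p = stack.pop();
            if (set[p]) continue;
            set[p] = true;
            Inst in = prog[p];
            if (in.op == Op.JMP) {
                stack.push(in.x);
            } else if (in.op == Op.SPLIT) {
                stack.push(in.y);
                stack.push(in.x);
            }
        }
    }

    // Returns the length of the longest match beginning at start, or -1.
    // Assumes a well-formed program in which every BYTE is followed by
    // another instruction. Work per input byte is bounded by the program
    // size, so the running time is linear in the input length.
    static int longestMatchAt(Inst[] prog, byte[] input, int start) {
        boolean[] current = new boolean[prog.length];
        addThread(prog, current, 0);
        int longest = -1;
        for (int pos = start; ; pos++) {
            boolean[] next = new boolean[prog.length];
            boolean alive = false;
            for (int pc = 0; pc < prog.length; pc++) {
                if (!current[pc]) continue;
                Inst in = prog[pc];
                if (in.op == Op.MATCH) {
                    longest = pos - start; // later positions overwrite shorter matches
                } else if (in.op == Op.BYTE && pos < input.length
                        && (input[pos] & 0xFF) == in.x) {
                    addThread(prog, next, pc + 1);
                    alive = true;
                }
            }
            if (!alive) return longest;
            current = next;
        }
    }

    // Hand-assembled program for the binary delimiter set: 0xFF | 0xFF 0xFE
    static Inst[] sampleProgram() {
        return new Inst[] {
            new Inst(Op.SPLIT, 1, 3),
            new Inst(Op.BYTE, 0xFF, 0),
            new Inst(Op.MATCH, 0, 0),
            new Inst(Op.BYTE, 0xFF, 0),
            new Inst(Op.BYTE, 0xFE, 0),
            new Inst(Op.MATCH, 0, 0),
        };
    }

    public static void main(String[] args) {
        byte[] data = { (byte) 0xFF, (byte) 0xFE, 0x01 };
        System.out.println(longestMatchAt(sampleProgram(), data, 0));
    }
}
```

Because the input is a byte array rather than a String, the same loop works on data that is not valid text in any character encoding. A fuller engine would add a compiler from regex syntax to this instruction form and an unanchored search; both are omitted here.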

...

Miscellaneous notes:

...

confluence/display/DAFFODIL/Regex+and+Delimiter+Matching