Features needed

  • Supports matching against binary data (not just character data)
  • Implements POSIX leftmost-longest match semantics
  • Uses a non-backtracking algorithm to avoid exponential worst-case running time
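To illustrate why the POSIX rule matters, here is a small sketch using the JDK's own java.util.regex. That engine backtracks and takes the first alternative that succeeds (leftmost-first), so the order of alternatives changes the result; under the POSIX leftmost-longest rule both patterns below would match "ab". The class and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LeftmostFirstDemo {
    // Returns the text of the first match java.util.regex finds, or null if none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Alternatives are tried in order, so the shorter "a" wins here.
        System.out.println(firstMatch("a|ab", "ab")); // prints "a"
        // Reordering the alternatives changes the answer.
        System.out.println(firstMatch("ab|a", "ab")); // prints "ab"
    }
}
```

A POSIX-conforming engine would report "ab" in both cases, which is the behavior delimiter matching needs when one delimiter is a prefix of another.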

Survey of Java regular expression libraries

...

This page has moved to https://cwiki.apache.org/

...

Of these, the automaton library looks the most interesting from a DFA perspective.

As far as we can tell, there is no POSIX leftmost-longest regex library for Java or Scala.

Notes for implementing our own regular expression engine

...

  • Regular Expression Matching Can Be Simple And Fast (http://swtch.com/~rsc/regexp/regexp1.html) - Introduces several implementation techniques

  • Regular Expression Matching: the Virtual Machine Approach (http://swtch.com/~rsc/regexp/regexp2.html) - Discusses several implementation details, including a section on POSIX leftmost-longest matching. Links to several test suites for POSIX matching rules.

  • Regular Expression Matching in the Wild (http://swtch.com/~rsc/regexp/regexp3.html) - Tour of the RE2 library (written in C++). Useful as a reference for translating the implementation to Java or Scala, which apparently no one has done yet.
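As a rough sketch of the direction these articles suggest, the following simulates a tiny hand-built NFA program over a byte array, tracking a set of states instead of backtracking, and reports the leftmost-longest overall match. Everything here is an assumption made for illustration: the instruction set (BYTE, SPLIT, JMP, MATCH) is only loosely modeled on the virtual machine idea, the class and method names are hypothetical, there is no regex parser, and submatch tracking is left out.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class ByteNfa {
    enum Op { BYTE, SPLIT, JMP, MATCH }

    // One instruction of the hypothetical NFA program.
    static final class Inst {
        final Op op; final int arg; final int x; final int y;
        Inst(Op op, int arg, int x, int y) { this.op = op; this.arg = arg; this.x = x; this.y = y; }
    }
    static Inst b(int value) { return new Inst(Op.BYTE, value & 0xFF, 0, 0); }
    static Inst split(int x, int y) { return new Inst(Op.SPLIT, 0, x, y); }
    static Inst jmp(int x) { return new Inst(Op.JMP, 0, x, 0); }
    static Inst match() { return new Inst(Op.MATCH, 0, 0, 0); }

    // Adds pc to the state set, following SPLIT and JMP without consuming input.
    static void add(Inst[] prog, boolean[] set, int pc) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(pc);
        while (!stack.isEmpty()) {
            int p = stack.pop();
            if (set[p]) continue;
            set[p] = true;
            Inst i = prog[p];
            if (i.op == Op.SPLIT) { stack.push(i.x); stack.push(i.y); }
            else if (i.op == Op.JMP) { stack.push(i.x); }
        }
    }

    // Returns the end offset of the longest match beginning at start, or -1.
    static int longestFrom(Inst[] prog, byte[] data, int start) {
        boolean[] cur = new boolean[prog.length];
        add(prog, cur, 0);
        int best = -1;
        for (int pos = start; ; pos++) {
            boolean[] next = new boolean[prog.length];
            boolean alive = false;
            for (int pc = 0; pc < prog.length; pc++) {
                if (!cur[pc]) continue;
                Inst i = prog[pc];
                if (i.op == Op.MATCH) {
                    best = pos; // keep going: a longer match may still exist
                } else if (i.op == Op.BYTE && pos < data.length && (data[pos] & 0xFF) == i.arg) {
                    add(prog, next, pc + 1);
                    alive = true;
                }
            }
            if (!alive) return best;
            cur = next;
        }
    }

    // Leftmost-longest search: returns {start, end} (end exclusive) or null.
    static int[] find(Inst[] prog, byte[] data) {
        for (int start = 0; start <= data.length; start++) {
            int end = longestFrom(prog, data, start);
            if (end >= 0) return new int[] { start, end };
        }
        return null;
    }

    // Program for the byte pattern: 0xFF | 0xFF 0x00
    static Inst[] alternationProgram() {
        return new Inst[] { split(1, 3), b(0xFF), jmp(5), b(0xFF), b(0x00), match() };
    }

    public static void main(String[] args) {
        int[] m = find(alternationProgram(), new byte[] { 0x01, (byte) 0xFF, 0x00, 0x02 });
        System.out.println(m[0] + ".." + m[1]); // prints "1..3"
    }
}
```

Because each input byte is processed once per active state, the work per start position is bounded by input length times program size. Restarting at every start position keeps the sketch short but makes the search quadratic in the worst case; a real engine would fold the search into a single pass.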

...

Miscellaneous notes:

...

confluence/display/DAFFODIL/Regex+and+Delimiter+Matching