Features needed
- Supports matching against binary data (not just character data)
- Implements the POSIX leftmost-longest matching rule
- Uses a non-backtracking algorithm to avoid exponential worst-case running time
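
To illustrate why the POSIX rule is listed as a separate requirement, here is a minimal sketch of how the standard java.util.regex engine treats alternation. It picks the first alternative that succeeds (leftmost-first) rather than the longest one, and it operates on CharSequence rather than on raw bytes. The class and method names (LeftmostFirstDemo, firstMatch) are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only Pattern and Matcher are real JDK APIs.
final class LeftmostFirstDemo {
    /** Returns the first match java.util.regex finds, or null if there is none. */
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        // Ordered alternation: the first alternative that succeeds wins,
        // so the shorter "a" is reported even though "ab" also matches.
        System.out.println(firstMatch("a|ab", "ab")); // prints "a"

        // Swapping the alternatives changes the result, which cannot
        // happen under the POSIX leftmost-longest rule.
        System.out.println(firstMatch("ab|a", "ab")); // prints "ab"
    }
}
```

Under POSIX rules both patterns would report "ab" for this input.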
Survey of Java regular expression libraries
...
this page has moved to https://cwiki.apache.org/
...
Of these, the automaton library looks the most interesting from a DFA perspective.
As far as we can tell, no Java or Scala regex library implements POSIX leftmost-longest matching.
Notes for implementing our own regular expression engine
...
Regular Expression Matching Can Be Simple And Fast (http://swtch.com/~rsc/regexp/regexp1.html) - Introduces several implementation techniques
Regular Expression Matching: the Virtual Machine Approach (http://swtch.com/~rsc/regexp/regexp2.html) - Discusses several implementation details, including a section on POSIX leftmost-longest matching. Includes links to test suites for the POSIX matching rules.
Regular Expression Matching in the Wild (http://swtch.com/~rsc/regexp/regexp3.html) - Tour of the RE2 library (written in C++) - useful for translating the implementation to Java or Scala (which apparently no one has done yet)
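
As a rough illustration of the state-set simulation that the articles above describe, here is a minimal sketch of a non-backtracking matcher that runs over byte[] and reports the leftmost-longest overall match. It is not taken from RE2 or from any existing library. The instruction set (BYTE, SPLIT, JMP, MATCH) and all names (ByteNfa, Inst, bite, longestFrom, find) are assumptions made for this example. Programs are assembled by hand because a regex parser is out of scope here, each program must end with a MATCH instruction, and submatch (capture group) rules are not handled.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

// Hypothetical sketch; every name in this class is made up for illustration.
final class ByteNfa {
    enum Op { BYTE, SPLIT, JMP, MATCH }

    /** One instruction: b is the byte to consume, x and y are jump targets. */
    static final class Inst {
        final Op op; final int b, x, y;
        Inst(Op op, int b, int x, int y) { this.op = op; this.b = b; this.x = x; this.y = y; }
    }
    static Inst bite(int b)         { return new Inst(Op.BYTE, b & 0xFF, 0, 0); }
    static Inst split(int x, int y) { return new Inst(Op.SPLIT, 0, x, y); }
    static Inst jmp(int x)          { return new Inst(Op.JMP, 0, x, 0); }
    static Inst match()             { return new Inst(Op.MATCH, 0, 0, 0); }

    /** Adds pc, plus every state reachable from it without consuming input. */
    static void addThread(Inst[] prog, boolean[] set, int pc) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(pc);
        while (!stack.isEmpty()) {
            int p = stack.pop();
            if (set[p]) continue;          // each state is visited at most once
            set[p] = true;
            Inst i = prog[p];
            if (i.op == Op.JMP) stack.push(i.x);
            else if (i.op == Op.SPLIT) { stack.push(i.x); stack.push(i.y); }
        }
    }

    /** Returns the exclusive end offset of the longest match starting at start, or -1. */
    static int longestFrom(Inst[] prog, byte[] in, int start) {
        boolean[] cur = new boolean[prog.length];
        addThread(prog, cur, 0);
        int end = -1;
        for (int pos = start; ; pos++) {
            boolean[] next = new boolean[prog.length];
            boolean advanced = false;
            for (int pc = 0; pc < prog.length; pc++) {
                if (!cur[pc]) continue;
                Inst i = prog[pc];
                if (i.op == Op.MATCH) {
                    end = pos;             // later positions overwrite earlier ones
                } else if (i.op == Op.BYTE && pos < in.length && (in[pos] & 0xFF) == i.b) {
                    addThread(prog, next, pc + 1);
                    advanced = true;
                }
            }
            if (!advanced) return end;     // no live states remain
            cur = next;
        }
    }

    /** Returns {start, end} of the leftmost-longest match, or null if there is none. */
    static int[] find(Inst[] prog, byte[] in) {
        for (int s = 0; s <= in.length; s++) {
            int e = longestFrom(prog, in, s);
            if (e >= 0) return new int[] { s, e };
        }
        return null;
    }

    public static void main(String[] args) {
        // Hand-assembled program for the regex a|ab.
        Inst[] alt = { split(1, 3), bite('a'), jmp(5), bite('a'), bite('b'), match() };
        byte[] text = "xab".getBytes(StandardCharsets.US_ASCII);
        System.out.println(Arrays.toString(find(alt, text))); // prints [1, 3]

        // Arbitrary byte values work too: match 0xFF followed by 0x00.
        Inst[] bin = { bite(0xFF), bite(0x00), match() };
        byte[] data = { 0x01, (byte) 0xFF, 0x00 };
        System.out.println(Arrays.toString(find(bin, data))); // prints [1, 3]
    }
}
```

Because each state is added to a set at most once per input position, the work per start offset is bounded by program size times input length, so there is no exponential blow-up. Restarting the simulation at every start offset keeps the sketch short. A production engine would more likely track the start offset per thread and do a single pass.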
...
Miscellaneous notes:
...