Daffodil / DFDL-1710

Apache Tika integration

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: deferred
    • Component/s: API, Integrations
    • Labels: None

      Description

      Daffodil's parser could be wrapped behind the Apache Tika parser API, so that any DFDL-described format can be mined for text content the same way Tika handles other formats.

      This would probably need to be schema-aware: Tika events should be reported only for text content, not for numeric content.
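
      As a rough illustration of the schema-aware filtering, the sketch below forwards only string-typed values to a SAX-style content handler, which is the kind of sink a Tika parser writes to. It is written in Python for brevity and does not use either library: `SimpleElement`, `TEXT_TYPES`, `report_text`, and `extract_text` are hypothetical names, and treating `xs:string` as the only text type is an assumption. A real integration would implement Tika's `Parser` interface on the JVM and take element types from the compiled DFDL schema.

```python
from dataclasses import dataclass
from typing import Iterable, List
from xml.sax.handler import ContentHandler


@dataclass
class SimpleElement:
    """Hypothetical stand-in for one simple-typed element of a parsed infoset."""
    name: str
    xsd_type: str  # type declared in the schema, e.g. "xs:string" or "xs:int"
    value: str     # lexical value produced by the parse


# Assumption: only xs:string counts as minable text content.
TEXT_TYPES = {"xs:string"}


def report_text(elements: Iterable[SimpleElement], handler: ContentHandler) -> None:
    """Send character events for text-typed elements only; skip everything else."""
    handler.startDocument()
    for el in elements:
        if el.xsd_type in TEXT_TYPES:
            handler.characters(el.value)
            handler.characters("\n")  # keep separate values on separate lines
    handler.endDocument()


class TextCollector(ContentHandler):
    """Minimal handler that accumulates the character events it receives."""

    def __init__(self) -> None:
        super().__init__()
        self.chunks: List[str] = []

    def characters(self, content: str) -> None:
        self.chunks.append(content)

    def text(self) -> str:
        return "".join(self.chunks)


def extract_text(elements: Iterable[SimpleElement]) -> str:
    """Run the filter against a collecting handler and return the gathered text."""
    collector = TextCollector()
    report_text(elements, collector)
    return collector.text()
```

      For example, given a record with a string `title`, an integer `pageCount`, and a string `author`, `extract_text` would return the title and author on separate lines and omit the page count.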

              People

              Assignee:
              Unassigned
              Reporter:
              mbeckerle.dfdl Mike Beckerle