  1. Daffodil
  2. DFDL-1808

JPEG schema accepts too many non-JPEG data files

    Details

    • Type: New Feature
    • Status: Open
    • Priority: Normal
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: 2.1.0
    • Component/s: DFDL Schemas
    • Labels:
      None

      Description

      The JPEG DFDL schema is much too permissive: arbitrary blobs of binary data are often accepted. To date, the schema only checks that the file is some sequence of JPEG segments. Because one segment type is effectively an opaque data blob, many non-JPEG files parse successfully.

      To overcome this, additional constraint checking is needed. It can be expressed with dfdl:assert statements in the DFDL schema. Two assertions already exist: one enforces that the first segment is an SOI (start of image) segment, and the other that the last segment is an EOI (end of image) segment. However, a blob of bytes between SOI and EOI is still accepted even when it is clearly not a JPEG image.
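To make the kind of missing constraint concrete, here is a minimal Python sketch of a structural check that goes beyond "starts with SOI, ends with EOI". It is not the DFDL schema and not a proposed set of dfdl:assert expressions. The function name `looks_like_jpeg` and the particular rule chosen (a frame header segment must appear before the first start-of-scan segment) are assumptions made for illustration only. The marker values come from the JPEG standard.

```python
import struct

# Marker codes (the byte that follows 0xFF), per the JPEG standard.
SOI, EOI, SOS = 0xD8, 0xD9, 0xDA
# Frame headers: SOF0..SOF15, excluding DHT (0xC4), JPG (0xC8), DAC (0xCC).
SOF_MARKERS = set(range(0xC0, 0xD0)) - {0xC4, 0xC8, 0xCC}
# Markers that carry no length field: SOI, EOI, RST0..RST7, TEM.
STANDALONE = {SOI, EOI, 0x01} | set(range(0xD0, 0xD8))


def looks_like_jpeg(data: bytes) -> bool:
    """Hypothetical well-formedness check (illustrative, not exhaustive).

    Requires SOI first, EOI last, and a chain of length-prefixed
    segments that contains a frame header (SOF) before the first SOS.
    The entropy-coded scan data after SOS is not inspected.
    """
    if len(data) < 4:
        return False
    if not data.startswith(b"\xff\xd8") or not data.endswith(b"\xff\xd9"):
        return False

    pos = 2  # first byte after SOI
    seen_frame_header = False
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False  # every segment must begin with the 0xFF prefix
        marker = data[pos + 1]
        if marker in STANDALONE:
            return False  # not expected among the header segments
        (length,) = struct.unpack_from(">H", data, pos + 2)
        # The length field counts its own two bytes; the segment must
        # also end before the trailing EOI marker.
        if length < 2 or pos + 2 + length > len(data) - 2:
            return False
        if marker in SOF_MARKERS:
            seen_frame_header = True
        if marker == SOS:
            return seen_frame_header
        pos += 2 + length
    return False  # reached the end without finding a scan
```

Under this sketch, a file consisting of SOI, a single application segment holding arbitrary bytes, and EOI is rejected, even though it is "a collection of JPEG segments". A real rule set would need more conditions than this one.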

      In some cases the constraint rules will require more expressive power than dfdl:assert provides, namely true XPath query capability.

      The Schematron rule language could be used for those cases. See also DFDL-1807 (Schematron), in case it proves to be needed.

      Note that this is not "validation" of the data. It uses what we normally think of as a validation language, but applies it to checking whether the data is well-formed.

              People

              Assignee:
              Unassigned
              Reporter:
              Mike Beckerle (mbeckerle.dfdl)
