Daffodil / DFDL-1700

class path collisions in DFDL XSD schemaLocation

      Recently we tried to take the jars for a number of DFDL schemas and use them all together in one application.

      Two entirely different schemas both have a schemaLocation on an import that references the same file name.

      E.g., schemaLocation="xsd/defaultFormat.dfdl.xsd"

      These have entirely different contents for the two formats, so mixing them up breaks things.

      The problem is that when these are used together, all the jars go on the classpath, and this file is retrieved from the classpath, so whichever jar comes first "wins".
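
      As a sketch of the collision, the two imports might look like the following. The jar names, the namespace URIs, and every file name other than defaultFormat.dfdl.xsd are hypothetical, invented only for illustration:

```xml
<!-- Hypothetical: formatA.jar holds xsd/formatA.dfdl.xsd, which contains: -->
<xs:import namespace="urn:example:formatA:defaults"
           schemaLocation="xsd/defaultFormat.dfdl.xsd"/>

<!-- Hypothetical: formatB.jar holds xsd/formatB.dfdl.xsd, which contains: -->
<xs:import namespace="urn:example:formatB:defaults"
           schemaLocation="xsd/defaultFormat.dfdl.xsd"/>
```

      Both imports name the same classpath resource, so if the location is looked up on the classpath, only one of the two defaultFormat.dfdl.xsd files is ever found.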

      There are two issues here:

      1. Best practice is to use relative paths from the current file location, not classpath relative. So if this schema's files are all in the same xsd subdir of src/main/resources, then this should be schemaLocation="defaultFormat.dfdl.xsd", that is, without the "xsd/" prefix.
      2. That has to work, but from a quick perusal of the resolver code it isn't clear that importing-file-relative resolution is preferred over classpath-relative resolution. If it isn't, that's a pretty big bug. (If it is happening correctly, then this ticket should be changed from "bug" to "improvement".) Either way, we need tests that verify that include/import don't mix up files across the classpath unnecessarily.
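
      For item 1, a minimal sketch of the self-relative form, assuming a hypothetical schema file src/main/resources/xsd/myFormat.dfdl.xsd that sits next to its own defaultFormat.dfdl.xsd. The namespace URIs are placeholders:

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:myFormat">

  <!-- Classpath-relative form to avoid: the result depends on which jar's
       xsd/defaultFormat.dfdl.xsd comes first on the classpath.
  <xs:import namespace="urn:example:myFormat:defaults"
             schemaLocation="xsd/defaultFormat.dfdl.xsd"/>
  -->

  <!-- Self-relative form: names the sibling file in the same directory
       as this schema file. -->
  <xs:import namespace="urn:example:myFormat:defaults"
             schemaLocation="defaultFormat.dfdl.xsd"/>

</xs:schema>
```

      This only helps if the resolver tries the importing file's location before the classpath, which is exactly what item 2 asks to be verified.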

      All our DFDL schemas are probably using this "xsd/filename.dfdl.xsd" convention for schemaLocation currently, so they should all be scanned for this problem and changed to use self-relative paths instead.

      Global things that many schemas include/import also should NOT use simple classpath relative names, since name collisions can occur.

      Example:

      <xs:include schemaLocation="xsd/built-in-formats.dfdl.xsd"/>


      This effectively takes over the name built-in-formats.dfdl.xsd globally. The right thing is for that file to live under src/main/resources/edu.illinois.ncsa.daffodil/xsd/built-in-formats.dfdl.xsd, and to be referenced like this:

      <xs:include schemaLocation="edu.illinois.ncsa.daffodil/xsd/built-in-formats.dfdl.xsd"/>


      This is effectively the same way of solving namespace collisions that Java suggests as practice for Java packages.

      This makes it clear you want Daffodil's built-in-formats.dfdl.xsd, not some other one e.g., that might be part of your schema's files.

      So best practice is:

      • use a self-relative schemaLocation path if the file being imported/included is part of your schema
      • create package-style directory names under src/main/resources so that references to schema files that are classpath relative are unambiguous
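
      Put together, the two rules might look like this in a schema that has a file of its own and also uses Daffodil's built-in formats. The including file, its directory, and its namespace are hypothetical; the edu.illinois.ncsa.daffodil path is the layout proposed in this ticket, not necessarily where the file lives today:

```xml
<!-- Hypothetical file:
     src/main/resources/com.example.myformat/xsd/myFormat.dfdl.xsd -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:myFormat">

  <!-- Part of this schema: self-relative path to a sibling file
       (myFormatTypes.dfdl.xsd is a made-up name). -->
  <xs:include schemaLocation="myFormatTypes.dfdl.xsd"/>

  <!-- Shared file from another jar: still classpath-relative, but the
       package-style directory makes the name unambiguous. -->
  <xs:include schemaLocation="edu.illinois.ncsa.daffodil/xsd/built-in-formats.dfdl.xsd"/>

</xs:schema>
```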

      You could also use an XML Catalog to straighten this out, but it is better to make the basic classpath inclusion system just work.
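
      For comparison, an OASIS XML Catalog keyed on namespace might look like the following. This is only a sketch: the namespace URIs and target paths are hypothetical, and whether and how a given resolver consults such entries is an assumption, not something this ticket establishes.

```xml
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <!-- Map each (hypothetical) namespace to its own file, instead of
       relying on the shared name xsd/defaultFormat.dfdl.xsd. -->
  <uri name="urn:example:formatA:defaults"
       uri="com.example.formatA/xsd/defaultFormat.dfdl.xsd"/>
  <uri name="urn:example:formatB:defaults"
       uri="com.example.formatB/xsd/defaultFormat.dfdl.xsd"/>
</catalog>
```

      Every application that combines the jars would then have to maintain such a catalog, which is the overhead the package-style layout avoids.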

              dthompson David Thompson
              mbeckerle.dfdl Mike Beckerle