Daffodil / DFDL-1183

Resolving URIs and Schema Locations when using Xerces to load/validate - file-relative schema locations issue


    • Type: Bug
    • Resolution: Fixed
    • Priority: Normal
    • 2.0.0
    • None
    • Front End
    • None

      We use Xerces to load XML (such as TDML files), and we use the Xerces XML Schema support to explicitly validate DFDL schemas.

      For this to work, Xerces must be able to process the namespace prefixes, associate them with their namespaces, and then associate those namespaces with the XML schemas that define them. For ordinary XML files (like TDML) this is done using xsi:schemaLocation; for XML/DFDL schemas it is done using xsi:schemaLocation and also the schemaLocation attribute of xs:import and xs:include.

      A namespace can be associated with a schema using an XML Catalog - but that's not relevant to this issue.

      The other way is to use the xsi:schemaLocation, or the schemaLocation attribute of an xs:import/xs:include, in conjunction with the class path to find the corresponding file.

      This works fine so long as the schema location associated with the namespace is a relative path that makes sense from some directory/jar on the class path. So if ..../bin/resources is on the class path, then schemaLocation="xsd/foo.xsd" causes a search for .../bin/resources/xsd/foo.xsd, and if that's where the file is, it will be found.
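
The class path lookup described above can be sketched with the standard `ClassLoader.getResource` call. This is only an illustration, not Daffodil's actual loading code: the class name `ClasspathLookup`, the method `find`, and the temporary directory standing in for a class path root are all assumptions made for the example.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;

final class ClasspathLookup {

    // Treat schemaLocation as a path relative to each class path root.
    // Returns null when no root contains such a file.
    static URL find(ClassLoader loader, String schemaLocation) {
        return loader.getResource(schemaLocation);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical layout: a temporary directory plays the role of a class path root.
        Path root = Files.createTempDirectory("resources");
        Files.createDirectories(root.resolve("xsd"));
        Files.write(root.resolve("xsd").resolve("foo.xsd"), "<schema/>".getBytes());

        URL[] roots = { root.toUri().toURL() };
        try (URLClassLoader loader = new URLClassLoader(roots, null)) {
            // Found: the path makes sense from the class path root.
            System.out.println(find(loader, "xsd/foo.xsd"));
            // Not found (null): foo.xsd is not directly under the root.
            System.out.println(find(loader, "foo.xsd"));
        }
    }
}
```

The second lookup shows the limitation this issue is about: a bare file name only resolves when the file sits directly in a class path root.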

      The problem arises when we want to add the ability to use file-relative schema locations. This seems reasonable: one XML/DFDL schema file should be able to include another one in the same directory (a sibling/peer file) just by using a schemaLocation with no path part, e.g.,

       <xs:include schemaLocation="myPeer.xsd"/> 

      That works if the file containing the include statement is in a directory (in the file system, or in a jar) that is directly on the class path. The problem comes when the enclosing file is in some subdirectory of a class path directory.

      Example: suppose schema A.xsd lives in ..../bin/resources/xsd/A.xsd. It includes B.xsd with schemaLocation="xsd/B.xsd".

      It is now ambiguous whether this refers to .../bin/resources/xsd/B.xsd (relative to the class path) or .../bin/resources/xsd/xsd/B.xsd (relative to the enclosing file).
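
The two readings can be reproduced with `java.net.URI.resolve`. This is a small sketch under stated assumptions: the root `file:/proj/bin/resources/` is a made-up stand-in for the elided .../bin/resources path, and the class and method names are hypothetical.

```java
import java.net.URI;

final class SchemaLocationCandidates {

    // File-relative reading: resolve against the URI of the including file.
    static URI fileRelative(URI enclosingFile, String schemaLocation) {
        return enclosingFile.resolve(schemaLocation);
    }

    // Class-path-relative reading: resolve against a class path root
    // (the root URI must end in "/" for resolve to treat it as a directory).
    static URI classPathRelative(URI classPathRoot, String schemaLocation) {
        return classPathRoot.resolve(schemaLocation);
    }

    public static void main(String[] args) {
        URI root = URI.create("file:/proj/bin/resources/"); // hypothetical class path root
        URI a = root.resolve("xsd/A.xsd");                  // the including schema

        // prints file:/proj/bin/resources/xsd/xsd/B.xsd
        System.out.println(fileRelative(a, "xsd/B.xsd"));
        // prints file:/proj/bin/resources/xsd/B.xsd
        System.out.println(classPathRelative(root, "xsd/B.xsd"));
    }
}
```

Both results are well-formed URIs, so nothing in the schemaLocation string itself says which one the schema author meant.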

      Unfortunately, when Xerces calls the resolver to find the URL for B.xsd, it passes the systemID "xsd/B.xsd", the namespace (not relevant in this case), and the BaseURI, which in this case is ..../bin/resources/xsd/xsd/B.xsd.

      Why we get this seemingly incorrect BaseURI is a mystery, and further investigation is needed.

      Code in Misc.scala (in daffodil-lib) for resolving resources explicitly deals with baseURIs that have a doubled-up directory at the end of the path, just before the actual file name, e.g., the xsd/xsd in the baseURI above.

      This is a hack, but it probably does what people want.
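
To make the idea concrete, here is one way a doubled-up directory could be detected and collapsed. This is not the code in Misc.scala, which this issue does not show. It is a minimal sketch, and the class name `DoubledDirectory`, the method `collapse`, and the example URIs are all hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class DoubledDirectory {

    // If the two directory segments just before the file name are identical
    // (e.g. .../xsd/xsd/B.xsd), return the URI with one of them removed.
    // Otherwise return the URI unchanged. Any query or fragment is dropped.
    static URI collapse(URI baseURI) {
        String path = baseURI.getRawPath();
        if (path == null || !path.startsWith("/")) {
            // Opaque URIs (such as jar:...) and relative paths are not handled here.
            return baseURI;
        }
        List<String> segments = new ArrayList<>(Arrays.asList(path.split("/")));
        int n = segments.size();
        boolean doubled = n >= 3
                && !segments.get(n - 2).isEmpty()
                && segments.get(n - 2).equals(segments.get(n - 3));
        if (!doubled) {
            return baseURI;
        }
        segments.remove(n - 2); // remove by index: drops one copy of the repeated directory
        return baseURI.resolve(String.join("/", segments));
    }

    public static void main(String[] args) {
        // prints file:/proj/bin/resources/xsd/B.xsd
        System.out.println(collapse(URI.create("file:/proj/bin/resources/xsd/xsd/B.xsd")));
        // unchanged: prints file:/proj/bin/resources/xsd/B.xsd
        System.out.println(collapse(URI.create("file:/proj/bin/resources/xsd/B.xsd")));
    }
}
```

A rule like this would also rewrite a legitimate path in which a directory really does contain a subdirectory of the same name, which is one reason to regard the approach as a hack.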

      The missing information is this: what exactly is the URI of the enclosing file? This is not available to the resolver call. If we had the enclosing file's URI, we could subtract it from the baseURI and determine that the class path part is not simply the baseURI with the file name removed, but is one directory higher than that.

              mbeckerle.dfdl Mike Beckerle