Daffodil / DFDL-1103

API should allow specifying the temp directory to be used.


    • Type: Task
    • Resolution: Fixed
    • Priority: Major
    • 1.0.0
    • s15
    • API
    • None

      Per the thread below, if we're using temp files (which we currently are), then the API should provide a way to specify the temp directory for them.

      Changing the code base to avoid use of temp files entirely is a separate issue. The concern here is having control over the tempdir to be used.
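      One way to give callers that control, sketched in Java to match the API snippet in the thread: a hypothetical `TempDirConfig` class (not part of Daffodil) takes a caller-chosen directory, checks up front that it exists and is writable so a locked-down deployment fails early with a clear message, and creates every temp file there instead of under the JVM default `java.io.tmpdir`.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical holder for a caller-specified temp directory; not a Daffodil class.
final class TempDirConfig {
    private final Path tempDir;

    // Default: the JVM's standard temp location (the java.io.tmpdir system property).
    TempDirConfig() {
        this(Paths.get(System.getProperty("java.io.tmpdir")));
    }

    // Caller-specified location, validated eagerly so a missing or read-only
    // directory is reported before any schema compilation starts.
    TempDirConfig(Path tempDir) {
        if (!Files.isDirectory(tempDir)) {
            throw new IllegalArgumentException("Not a directory: " + tempDir);
        }
        if (!Files.isWritable(tempDir)) {
            throw new IllegalArgumentException("Temp directory is not writable: " + tempDir);
        }
        this.tempDir = tempDir;
    }

    Path getTempDir() {
        return tempDir;
    }

    // Every temp file is created inside the configured directory.
    Path createTempFile(String prefix, String suffix) throws IOException {
        return Files.createTempFile(tempDir, prefix, suffix);
    }
}
```

      Checking writability eagerly trades a small upfront check for an error that names the offending directory, rather than an `IOException` surfacing deep inside compilation. `Files.isWritable` is only a snapshot, so creating a file can still fail later if permissions change.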


      The API should be able to operate equally well with data AND schemas residing entirely in memory, entirely in files, or some combination.

      Use of temp files should be minimized so as not to impact performance, but being able to specify a temp location would probably be good.


      As someone who is on the outside of your API looking in, I don't get to decide how your API does things. I just get to use it. It's a black box. Your approach does "seem wasteful," but if you have real reasons then you have real reasons. I'm ok with that.

      Question: Assume that your software will be deployed on a highly constrained system, one where writing to the file system is tightly controlled. A user who passes everything in memory might not realize or expect that you will be writing to the file system, so he may place your code in a location where you won't have write privs. Your code might not work in such an environment. Would it be a good idea to allow the API to specify a temp file location? This allows the user to designate somewhere on the system where read and write privs are less restrictive.

      Can you clarify whether or not the concern in my question is valid?


      For convenience we could accept schemas as memory objects and pretty print them to our own temp files of course. That's the quick and dirty fix.

      So, a question: is there a reason this would not suffice, other than "seems wasteful"?

      I ask because we get a lot of leverage (i.e., have to write much less code) from the fact that DFDL schemas come from files.

      We actually take the files provided and construct a "bootstrap" XML object, which is a DFDL schema containing only xs:import statements, so it imports the schemas provided on the command line (or to the API). This single bootstrap schema document then starts the whole induction across the imports/includes of all the files that make up the overall schema. Those files come from the file system, or from within jar files. There are a few benefits of this uniformity. The most obvious is that there is exactly one code path for how a schema is obtained by Daffodil: by an import/include of a schema document, period. So there is one way that we deal with schemaLocation hints, one way to deal with the XML catalog resolver, etc. Another thing we get from this is that all the DFDL schemas are loaded into memory using a specific loader which augments every XML element with the file-line-column number information that we use in diagnostic messages. There are actually quite a few more benefits; I won't bore you with any further enumeration.
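      The bootstrap step can be pictured with a rough Java sketch. It is not Daffodil's actual implementation: the class `BootstrapSchema` and its `build` method are hypothetical, and the sketch requires every supplied schema to declare a target namespace, because XML Schema does not allow an `xs:import` without a `namespace` attribute inside a schema document that itself has no `targetNamespace`.

```java
import java.io.File;
import java.util.Map;

// Hypothetical sketch: build a "bootstrap" schema document whose only content is
// xs:import statements, one per user-supplied schema file.
final class BootstrapSchema {

    private static final String XSD_NS = "http://www.w3.org/2001/XMLSchema";

    // Minimal escaping for values placed inside double-quoted XML attributes.
    // '&' must be replaced first so the other entities are not double-escaped.
    static String escapeAttr(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace("\"", "&quot;");
    }

    // Keys are schema files, values their target namespaces. Pass a LinkedHashMap
    // to keep the imports in the caller's order.
    static String build(Map<File, String> schemas) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        // The bootstrap document itself has no targetNamespace.
        sb.append("<xs:schema xmlns:xs=\"").append(XSD_NS).append("\">\n");
        for (Map.Entry<File, String> e : schemas.entrySet()) {
            String ns = e.getValue();
            if (ns == null) {
                throw new IllegalArgumentException("No target namespace for " + e.getKey()
                        + "; a namespace-less xs:import is not allowed here");
            }
            // A file: URI keeps schemaLocation resolvable regardless of the working directory.
            sb.append("  <xs:import namespace=\"").append(escapeAttr(ns))
              .append("\" schemaLocation=\"")
              .append(escapeAttr(e.getKey().toURI().toString()))
              .append("\"/>\n");
        }
        sb.append("</xs:schema>\n");
        return sb.toString();
    }
}
```

      Everything after this point (schemaLocation resolution, catalog lookup, file-line-column annotation) then runs on ordinary import/include processing, which is the single code path described above.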

      So our use of files is definitely not an API design decision to take files rather than something more general like streams or strings or JDOM trees. There's some real code economy at work here.

      So would a "quick-and-dirty" wrapper, one that puts the schemas into temp files for you (and cleans them up), be sufficient?
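      A minimal sketch of such a wrapper, assuming a hypothetical `TempSchemaFiles` class (not part of the Daffodil API): it writes each in-memory schema string to a temp file in a caller-chosen directory, hands back a `File[]` in input order for the existing `compile(File[])` call, and deletes the files on `close()`. It assumes each schema string is either undeclared or declared as UTF-8, since that is the encoding used when writing.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper: in-memory schema text -> temp files, removed on close().
final class TempSchemaFiles implements AutoCloseable {
    private final List<Path> paths = new ArrayList<>();

    TempSchemaFiles(Path dir, List<String> schemaTexts) throws IOException {
        try {
            for (String text : schemaTexts) {
                Path p = Files.createTempFile(dir, "schema", ".dfdl.xsd");
                paths.add(p); // record before writing so a failed write is still cleaned up
                Files.write(p, text.getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            close(); // don't leak files created before the failure
            throw e;
        }
    }

    // Files in the same order as the input strings.
    File[] files() {
        File[] result = new File[paths.size()];
        for (int i = 0; i < paths.size(); i++) {
            result[i] = paths.get(i).toFile();
        }
        return result;
    }

    @Override
    public void close() {
        for (Path p : paths) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException ignored) {
                // Best effort: a file that cannot be deleted is left behind
                // rather than masking the caller's own exception.
            }
        }
        paths.clear();
    }
}
```

      Usage would look something like `try (TempSchemaFiles t = new TempSchemaFiles(dir, schemas)) { ProcessorFactory pf = compiler.compile(t.files()); ... }`. Deleting the files as soon as the block exits assumes nothing rereads them after `compile` returns; if diagnostics or lazy loading go back to the files, cleanup would have to be deferred. Because the file names are random, this also only works when the in-memory schemas do not refer to one another by relative `schemaLocation`.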


      As far as I can tell, when preparsing, the Java API for Daffodil requires an array of File objects, like so:

      Compiler compiler = edu.illinois.ncsa.daffodil.japi.Daffodil.compiler();
      compiler.setDistinguishedRootNode(rootElement, namespace);
      ProcessorFactory processorFactory = compiler.compile(schemaFiles); // schemaFiles is a File[]

      Unfortunately, not every API/system processes data by reading and writing to the local drive. Can you please add a convenience method that allows compile() to take a String or an array of Strings as input, where the Strings are the contents of the DFDL schema (not a path to the schema)? This would be a big help.

      Thanks,

      --J

              Elizabeth Finnegan (efinnegan)
              Mike Beckerle (mbeckerle.dfdl)
              Votes: 0
              Watchers: 3

                  Original Estimate: Not Specified
                  Remaining Estimate: Not Specified
                  Time Spent: 34m