Type: Improvement
Resolution: Unresolved
Priority: Minor
Fix Version/s: deferred
Affects Version/s: 2.1.0
Component/s: Back End, Performance
Labels: None

Profiling has shown that, on unparse, remapping from PUA to XML is fairly expensive. Removing this remap (for a schema that doesn't need to remap anything) improves performance by about 30%.

Fortunately, there are probably a lot of cases where we know we don't need to remap. For example, xs:hexBinary, integer types, and date/time types should never require mapping to/from PUA, since they are always represented in the infoset with ASCII characters. Really, the only type that might need it is xs:string.
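As a rough sketch of that type-based shortcut, the check below decides from the primitive type name alone whether a value could ever need remapping. The class name `RemapPolicy`, the method `mayNeedRemap`, and the use of plain type-name strings are all hypothetical and chosen for illustration; they are not taken from the Daffodil code base.

```java
import java.util.Set;

final class RemapPolicy {
    // Assumption: only xs:string values can carry XML-illegal characters.
    // Every other simple type (xs:hexBinary, integers, dates/times, ...)
    // is represented in the infoset with ASCII characters only.
    private static final Set<String> MAY_NEED_REMAP = Set.of("xs:string");

    /** Returns true if values of this type might need PUA remapping. */
    static boolean mayNeedRemap(String primitiveTypeName) {
        return MAY_NEED_REMAP.contains(primitiveTypeName);
    }
}
```

A caller would consult `RemapPolicy.mayNeedRemap(...)` once per element declaration (for example at schema compile time) rather than once per value, so the per-value cost for non-string types drops to zero.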

So potentially a few ideas for performance improvements:

- Never remap types that we know will never contain XML-illegal characters.
- For types that could contain XML-illegal characters, first check whether the string actually contains any before remapping it. In most cases we won't need to remap, so this saves the cost of allocating and filling a string builder. Strings that do contain XML-illegal characters may be a little slower, since they are scanned and then remapped, but that is not the common case.
- When we do find a string containing XML-illegal characters, put an attribute in the infoset indicating that the data was mapped to PUA. Then, when we unparse, we only have to remap strings to XML if that attribute is set. This may also be helpful for users, since it signals that they need to remap the string before using it. Note, however, that we might want a tunable that says to always remap xs:string values when unparsing, even if the attribute doesn't exist, since the infoset may not have come from a Daffodil parse, or may have been sanitized and had its attributes removed.
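The second and third ideas could be sketched roughly as follows. This is a simplified illustration, not Daffodil's implementation: it assumes that an XML-illegal control character c (U+0000 to U+001F) is stored as U+E000 + c, and it ignores any other ranges a real mapping would handle. The names `PuaRemap`, `remapPuaToOriginal`, `unparseValue`, and the `markedAsMapped` and `alwaysRemap` flags (standing in for the proposed infoset attribute and tunable) are hypothetical.

```java
final class PuaRemap {
    // Assumed mapping for illustration only: a control character c in
    // U+0000..U+001F is stored in the infoset as U+E000 + c.
    private static final char PUA_BASE = 0xE000;
    private static final char PUA_LAST = 0xE01F;

    private static boolean isMapped(char c) {
        return c >= PUA_BASE && c <= PUA_LAST;
    }

    /** Index of the first PUA-mapped character, or -1 if there is none. */
    static int firstMappedIndex(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (isMapped(s.charAt(i))) return i;
        }
        return -1;
    }

    /** Maps PUA characters back to the originals, scanning before allocating. */
    static String remapPuaToOriginal(String s) {
        int first = firstMappedIndex(s);
        if (first < 0) return s; // common case: no StringBuilder, same instance
        StringBuilder sb = new StringBuilder(s.length());
        sb.append(s, 0, first); // copy the untouched prefix in one call
        for (int i = first; i < s.length(); i++) {
            char c = s.charAt(i);
            sb.append(isMapped(c) ? (char) (c - PUA_BASE) : c);
        }
        return sb.toString();
    }

    /**
     * Remaps only when the infoset marked the value as mapped, or when the
     * always-remap tunable is on (e.g. the infoset did not come from a parse).
     */
    static String unparseValue(String value, boolean markedAsMapped, boolean alwaysRemap) {
        return (markedAsMapped || alwaysRemap) ? remapPuaToOriginal(value) : value;
    }
}
```

With this shape, a string without mapped characters costs one linear scan and no allocation, and a value whose attribute is absent skips even the scan unless the tunable forces it.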

Assignee: Unassigned
Reporter: Steve Lawrence
Votes: 0
Watchers: 2
Created: 19/Dec/16 12:20 PM
Updated: 15/Sep/17 12:55 PM
