...

We use the XML <![CDATA[ ... ]]> bracketing in several different ways. Some of them are problematic, and as we spot such usage in our TDML tests, we should remove it and replace it with more robust usage.

We load TDML files differently from other XML files or DFDL schemas because of the way these CDATA regions are treated. This is very undesirable: it introduces the possibility of different diagnostic behavior on validation errors, it doesn't get line numbers right in diagnostic messages, etc.

Ways we use CDATA that are legit....

The problem is that CDATA regions do not mean "preserve exactly what is here". Rather, they are just a different way to avoid having to escape the & and < characters. XML's general treatment of whitespace as fungible still applies.
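The following is a minimal sketch of this point, using Python's standard xml.etree.ElementTree parser rather than the TDML loader. The element name doc and the sample strings are made up for illustration. It suggests that CDATA and entity escaping yield the same text, and that a CRLF inside a CDATA region is still normalized to a single LF by the parser.

```python
import xml.etree.ElementTree as ET

# CDATA lets & and < appear unescaped; the parsed text is the same
# as with the entity-escaped form.
cdata_doc = b"<doc><![CDATA[a & b < c]]></doc>"
escaped_doc = b"<doc>a &amp; b &lt; c</doc>"
cdata_text = ET.fromstring(cdata_doc).text
escaped_text = ET.fromstring(escaped_doc).text

# XML end-of-line handling still applies inside the CDATA region:
# the CRLF in the input comes out of the parser as a single LF.
crlf_doc = b"<doc><![CDATA[line1\r\nline2]]></doc>"
crlf_text = ET.fromstring(crlf_doc).text

print(repr(cdata_text), repr(escaped_text), repr(crlf_text))
```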

OK: To preserve textual formatting within TDML, for clarity.

E.g.,

<tdml:documentPart type="byte"><![CDATA[
00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f              
10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
20 21    23 24 25    27 28 29 2a 2b 2c 2d 2e 2f
30 31 32 33 34 35 36 37 38 39 3a 3b    3d    3f
40 41 42 43 44 45 46 47 48 49 4a 4b 4c 4d 4e 4f
50 51 52 53 54 55 56 57 58 59 5a 5b 5c 5d 5e 5f
60 61 62 63 64 65 66 67 68 69 6a 6b 6c 6d 6e 6f
70 71 72 73 74 75 76 77 78 79 7a 7b 7c 7d 7e 7f
80 81 82 83 84 85 86 87 88 89 8a 8b 8c 8d 8e 8f
90 91 92 93 94 95 96 97 98 99 9a 9b 9c 9d 9e 9f
a0 a1 a2 a3 a4 a5 a6 a7 a8 a9 aa ab ac ad ae af
b0 b1 b2 b3 b4 b5 b6 b7 b8 b9 ba bb bc bd be bf
c0 c1 c2 c3 c4 c5 c6 c7 c8 c9 ca cb cc cd ce cf
d0 d1 d2 d3 d4 d5 d6 d7 d8 d9 da db dc dd de df
e0 e1 e2 e3 e4 e5 e6 e7 e8 e9 ea eb ec ed ee ef
f0 f1 f2 f3 f4 f5 f6 f7 f8 f9 fa fb fc fd fe ff  
]]></tdml:documentPart>

Without the formatting, the above matrix of hex would be hard to understand; in particular, it would be hard to see where the holes in it are. Logically, though, the whitespace is irrelevant. In effect, we use CDATA here so that tooling such as IDEs and XML editors will not mess with the formatting of the content.
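As a rough sketch of why the whitespace is logically irrelevant, the snippet below reads a small hex document part and discards all whitespace before decoding. The namespace URI, the shortened matrix, and the helper hex_part_to_bytes are hypothetical, and this is not how the TDML runner itself is implemented.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace, only here so the tdml: prefix resolves.
TDML_NS = "http://example.com/hypothetical-tdml"

snippet = f"""<tdml:documentPart xmlns:tdml="{TDML_NS}" type="byte"><![CDATA[
00 01 02 03
10 11    13
]]></tdml:documentPart>"""


def hex_part_to_bytes(text):
    """Hypothetical helper: drop all whitespace, then decode hex digit pairs."""
    return bytes.fromhex("".join(text.split()))


part = ET.fromstring(snippet)
data = hex_part_to_bytes(part.text)
print(data.hex(" "))
```

The hole in the second row simply means that byte is absent from the data; the spacing that makes the hole visible to a human reader has no effect on the decoded bytes.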

OK: As a clearer way to escape things than using &amp; &gt; &lt; &apos;.

E.g.,

<foo>abc<![CDATA[&&&]]>def&#xE000;ghi</foo>
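A small sketch, again using Python's xml.etree.ElementTree as a stand-in parser (an assumption, not the loader we actually use), suggests that the CDATA form and the entity-escaped form of this element produce the same character content.

```python
import xml.etree.ElementTree as ET

# The element from the example above, written with CDATA...
with_cdata = ET.fromstring("<foo>abc<![CDATA[&&&]]>def&#xE000;ghi</foo>")
# ...and the same content written with entity escapes instead.
with_entities = ET.fromstring("<foo>abc&amp;&amp;&amp;def&#xE000;ghi</foo>")

# ElementTree reports the element content as one string in both cases.
print(repr(with_cdata.text))
print(with_cdata.text == with_entities.text)
```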

...

This will have only one Text node in it. We handle this with special-purpose comparison routines that are used whenever XML is compared. These routines also do things such as ignoring the namespace prefixes on element names. (This is probably undesirable in the longer term.)
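To illustrate the kind of comparison meant here, the following is a hypothetical sketch and not our actual comparison routine. The function same_xml, the element names, and the urn:example namespaces are invented. ElementTree expands prefixed names to {namespace}local form, so prefixes are ignored automatically; trimming the text before comparing is an extra assumption of this sketch, and tail text is not compared at all.

```python
import xml.etree.ElementTree as ET


def same_xml(a, b):
    """Compare elements by expanded name, attributes, trimmed text, and children."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    if len(a) != len(b):
        return False
    return all(same_xml(x, y) for x, y in zip(a, b))


# Same namespace, different prefixes, CDATA versus entity escaping.
left = ET.fromstring(
    '<p:foo xmlns:p="urn:example"><p:bar>abc<![CDATA[&]]></p:bar></p:foo>')
right = ET.fromstring(
    '<q:foo xmlns:q="urn:example"><q:bar>abc&amp;</q:bar></q:foo>')
# Same prefix as right, but bound to a different namespace.
different = ET.fromstring(
    '<q:foo xmlns:q="urn:other"><q:bar>abc&amp;</q:bar></q:foo>')

print(same_xml(left, right), same_xml(left, different))
```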

OK: To avoid insertion of whitespace that would make things incorrect.

For example, here we need the document to contain exactly and only two characters:

...

In the above case, since we really do care about whitespace being inserted here, we use CDATA.
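As a hedged illustration of the risk, the sketch below uses a made-up two-character payload "ab" in a made-up element doc. It compares the CDATA form with what the same element might look like after an editor or formatter has re-indented the content.

```python
import xml.etree.ElementTree as ET

# Hypothetical payload: exactly the two characters "ab", bracketed by CDATA.
exact = ET.fromstring("<doc><![CDATA[ab]]></doc>")

# The same element after a formatter has put the content on its own line.
reformatted = ET.fromstring("<doc>\n  ab\n</doc>")

print(len(exact.text), len(reformatted.text))
```

The CDATA brackets do not change how the parser treats whitespace; they only make it less likely that tooling will insert whitespace in the first place.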

...

NOT

...

OK: To preserve specific line endings

Using CDATA does NOT necessarily preserve line endings. So suppose you had a test containing this:

...