| [/ |
| / Copyright (c) 2008 Marcin Kalicinski (kalita <at> poczta dot onet dot pl) |
| / Copyright (c) 2009 Sebastian Redl (sebastian dot redl <at> getdesigned dot at) |
| / |
| / Distributed under the Boost Software License, Version 1.0. (See accompanying |
| / file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
| /] |
| [section XML Parser] |
| [def __xml__ [@http://en.wikipedia.org/wiki/XML XML format]] |
| [def __xml_parser.hpp__ [headerref boost/property_tree/xml_parser.hpp xml_parser.hpp]] |
| [def __RapidXML__ [@http://rapidxml.sourceforge.net/ RapidXML]] |
| [def __boost__ [@http://www.boost.org Boost]] |
| The __xml__ is an industry standard for storing information in textual |
| form. Unfortunately, there is no XML parser in __boost__ as of the |
| time of this writing. The library therefore contains the fast and tiny |
| __RapidXML__ parser (currently in version 1.13) to provide XML parsing support. |
| RapidXML does not fully support the XML standard; it is not capable of parsing |
| DTDs and therefore cannot do full entity substitution. |
| |
| By default, the parser will preserve most whitespace, but remove element content |
| that consists only of whitespace. Encoded whitespaces (e.g.  ) does not |
| count as whitespace in this regard. You can pass the trim_whitespace flag if you |
| want all leading and trailing whitespace trimmed and all continuous whitespace |
| collapsed into a single space. |
| |
| Please note that RapidXML does not understand the encoding specification. If |
| you pass it a character buffer, it assumes the data is already correctly |
| encoded; if you pass it a filename, it will read the file using the character |
| conversion of the locale you give it (or the global locale if you give it none). |
| This means that, in order to parse a UTF-8-encoded XML file into a wptree, you |
| have to supply an alternate locale, either directly or by replacing the global |
| one. |
| |
| XML / property tree conversion schema (__read_xml__ and __write_xml__): |
| |
| * Each XML element corresponds to a property tree node. The child elements |
| correspond to the children of the node. |
| * The attributes of an XML element are stored in the subkey [^<xmlattr>]. There |
| is one child node per attribute in the attribute node. Existence of the |
| [^<xmlattr>] node is not guaranteed or necessary when there are no attributes. |
| * XML comments are stored in nodes named [^<xmlcomment>], unless comment |
| ignoring is enabled via the flags. |
| * Text content is stored in one of two ways, depending on the flags. The default |
| way concatenates all text nodes and stores them in a single node called |
| [^<xmltext>]. This way, the entire content can be conveniently read, but the |
| relative ordering of text and child elements is lost. The other way stores |
| each text content as a separate node, all called [^<xmltext>]. |
| |
| The XML storage encoding does not round-trip perfectly. A read-write cycle loses |
| trimmed whitespace, low-level formatting information, and the distinction |
| between normal data and CDATA nodes. Comments are only preserved when enabled. |
| A write-read cycle loses trimmed whitespace; that is, if the origin tree has |
| string data that starts or ends with whitespace, that whitespace is lost. |
| [endsect] [/xml_parser] |