
 <html>

 <head>
 <meta http-equiv="Content-Language" content="en-us">
 <meta name="GENERATOR" content="Microsoft FrontPage 5.0">
 <meta name="ProgId" content="FrontPage.Editor.Document">
 <meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
 <title>1.34 (Internationalization) Changes</title>
 </head>

 <body bgcolor="#FFFFFF">

 <h1>1.34 (Internationalization) Changes</h1>
 <h2>Introduction</h2>
 <p>This release is a major upgrade for the Filesystem Library, in preparation
 for submission to the C++ Standards Committee. Features of this release
 include:</p>
 <ul>
   <li><a href="#Internationalization">Internationalization</a>, provided by
   class templates <i>basic_path</i>, <i>basic_filesystem_error</i>, <i>
   basic_directory_iterator</i>, and <i>basic_directory_entry</i>.<br>
 &nbsp;</li>
   <li><a href="#Simplification">Simplification</a> of the path interface,
   including elimination of distinction between native and generic formats,
   and separation of name checking functionality from general path functionality.
   Also simplification of <i>basic_filesystem_error</i>.<br>
 &nbsp;</li>
   <li><a href="#Rationalization">Rationalization</a> of predicate function
   design, including the addition of several new functions.<br>
 &nbsp;</li>
   <li>Clearer specification by reference to [<a href="design.htm#POSIX-01">POSIX-01</a>],
   the ISO/IEEE Single Unix Standard, with provisions for Windows and other
   operating systems.<br>
 &nbsp;</li>
   <li><a href="#Preservation">Preservation</a> of existing user code whenever
   possible.<br>
 &nbsp;</li>
   <li><a href="#More_efficient">More efficient operations</a> when iterating over directories.<br>
 &nbsp;</li>
   <li>A <a href="reference.html#recursive_directory_iterator">recursive
   directory iterator</a> is now provided. </li>
 </ul>
 <p><a href="#Rationale">Rationale</a> for some of the changes is also provided.</p>
 <h2><a name="Internationalization">Internationalization</a></h2>
 <p>Class templates <i>basic_path</i>, <i>basic_filesystem_error</i>, and <i>
 basic_directory_iterator</i> provide the basic mechanisms for
 internationalization, in ways very similar to the C++ Standard Library's <i>
 basic_string</i> and similar class templates. The following typedefs are also
 provided:</p>
 <blockquote>
   <pre>typedef basic_path&lt;std::string, ...&gt; path;
 typedef basic_path&lt;std::wstring, ...&gt; wpath;

 typedef basic_filesystem_error&lt;path&gt; filesystem_error;
 typedef basic_filesystem_error&lt;wpath&gt; wfilesystem_error;

 typedef basic_directory_iterator&lt;path&gt; directory_iterator;
 typedef basic_directory_iterator&lt;wpath&gt; wdirectory_iterator;</pre>
 </blockquote>
 <p>The string type used by Boost.Filesystem <i>basic_path</i> (std::string,
 std::wstring, or whatever) is called the <i>internal</i> string type. The string
 type used by the operating system for paths (often char*, sometimes wchar_t*) is
 called the <i>external</i> string type. Conversion between internal and external
 types is performed by path traits classes. The specific conversions for <i>path</i>
 and <i>wpath</i> are implementation-defined, with normative encouragement to use
 the operating system's preferred file system encoding. For many modern POSIX-based
 file systems the <i>wpath</i> external encoding is <a href="design.htm#Kuhn">
 UTF-8</a>, while for modern Windows file systems such as NTFS it is
 <a href="http://en.wikipedia.org/wiki/UTF-16">UTF-16</a>.</p>
 <p>The <a href="reference.html#Operations-functions">operational functions</a> in
 <a href="../../../../boost/filesystem/operations.hpp">operations.hpp</a> are provided with overloads for
 <i>path</i>, <i>wpath</i>, and user-defined <i>basic_path</i> types. A
 <a href="reference.html#Requirements-on-implementations">&quot;do-the-right-thing&quot; rule</a>
 applies to implementations, ensuring that the correct overload will be chosen.</p>
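 <p>As a rough illustration of the internal/external split described above, the
 following self-contained sketch models a path class template parameterized on a
 string type and a traits class. All names in it (<code>narrow_traits_sketch</code>,
 <code>wide_ascii_traits_sketch</code>, <code>basic_path_sketch</code>) are
 hypothetical and are not the library's actual classes. The wide-to-narrow
 conversion handles only ASCII and merely stands in for whatever
 implementation-defined encoding a real traits class would apply.</p>

```cpp
#include <cassert>    // used by usage_example() below
#include <stdexcept>
#include <string>

// Hypothetical traits: internal and external types are both std::string,
// so the conversion is the identity.
struct narrow_traits_sketch {
    typedef std::string internal_string_type;
    typedef std::string external_string_type;
    static external_string_type to_external(const internal_string_type& s) {
        return s;
    }
};

// Hypothetical traits: internal type is std::wstring, external is std::string.
// Only ASCII is converted; a real traits class would apply a proper encoding.
struct wide_ascii_traits_sketch {
    typedef std::wstring internal_string_type;
    typedef std::string external_string_type;
    static external_string_type to_external(const internal_string_type& s) {
        external_string_type out;
        for (std::wstring::size_type i = 0; i < s.size(); ++i) {
            unsigned long c = static_cast<unsigned long>(s[i]);
            if (c > 127)
                throw std::range_error("non-ASCII character in sketch conversion");
            out += static_cast<char>(c);
        }
        return out;
    }
};

// Hypothetical path template: stores the internal string and asks the
// traits class for the external form only when it is needed.
template <class String, class Traits>
class basic_path_sketch {
public:
    typedef String string_type;
    typedef typename Traits::external_string_type external_string_type;

    explicit basic_path_sketch(const string_type& s) : str_(s) {}

    // The internal representation, as supplied by the user.
    const string_type& string() const { return str_; }

    // The representation that would be handed to the operating system.
    external_string_type external_string() const {
        return Traits::to_external(str_);
    }

private:
    string_type str_;
};

typedef basic_path_sketch<std::string, narrow_traits_sketch> path_sketch;
typedef basic_path_sketch<std::wstring, wide_ascii_traits_sketch> wpath_sketch;

inline void usage_example() {
    assert(path_sketch("foo/bar").external_string() == "foo/bar");
    assert(wpath_sketch(L"foo/bar").external_string() == "foo/bar");
}
```

 <p>The point of the sketch is only that user code picks the internal string
 type through the typedef it uses, while the traits class alone decides what
 reaches the operating system.</p>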
 <h2><a name="Simplification">Simplification</a> of path interface</h2>
 <p>Prior versions of the library required users of class <i>path</i> to identify
 the format (native or generic) and name error-checking policy, either via a
 second constructor argument or via a default mechanism. That approach caused
 complaints, particularly from users not needing the name checking features. The
 interface has now been simplified:</p>
 <ul>
   <li>The distinction between native and generic formats has been eliminated.
   See <a href="#distinction">rationale</a>. Two-argument forms of path
   constructors are now deprecated, with the second argument having no effect.
   These constructors are only provided to ease the transition of existing code.<br>
 &nbsp;</li>
   <li>Path name checking functionality has been moved out of class path and into
   separate free-functions. This still provides name checking for those who need
   it, but with much less impact on those who don't need it.</li>
 </ul>
 <p>Additionally,
 <a href="reference.html#Class-template-basic_filesystem_error">basic_filesystem_error</a> has been put
 on a diet and generally simplified.</p>
 <p>Error codes have been moved to a separate library,
 <a href="../../../system/doc/index.html">Boost.System</a>.</p>
 <p><code>&quot;//:&quot;</code> has been introduced as a path escape prefix to identify
 native paths. Rationale: this simplifies the basic_path constructor interfaces and
 makes use easier on platforms needing explicit native format identification.</p>
 <h2><a name="Rationalization">Rationalization</a> of predicate functions</h2>
 <p>In discussions and bug reports on the Boost developers mailing list, it
 became obvious that Boost.Filesystem's exists(), symbolic_link_exists(), and
 is_directory() predicate functions were poorly specified. There were suggestions
 to add an is_accessible() function, but Peter Dimov argued that this amounted to
 papering over the lack of a clear specification and would likely lead to future
 problems.</p>
 <p>Peter suggested that an interesting way to analyze the problem was to ask
 what the expectations were for true and false values of the various predicates.
 See the <a href="#table">table</a> below.</p>
 <h3><a name="status">status</a>()</h3>
 <p>As part of the predicate discussions, particularly with Rob Stewart, it
 became obvious that sometimes applications need access to raw status information
 without any possibility of an exception being thrown. The
 <a href="reference.html#Status-functions">status()</a> function was added to meet this
 need. It also proved clearer to specify the semantics of predicate functions in
 terms of status().</p>
 <h3><a name="is_file">is_file</a>()</h3>
 <p>About the same time, Jeff Garland suggested that an
 <a href="reference.html#Predicate-functions">is_file()</a> predicate would
 complement <a href="reference.html#Predicate-functions">is_directory()</a>. In working on the analysis below, it became obvious
 that the expectations for is_file() were different from the expectations for !is_directory(),
 so is_file() was added. </p>
 <h3><a name="is_other">is_other</a>()</h3>
 <p>On some operating systems, it is possible to have a directory entry which is
 not for either a directory or a file. The
 <a href="reference.html#Predicate-functions">is_other()</a>
 function identifies such cases.</p>
 <h3>Should predicates throw on errors?</h3>
 <p>Some conditions reported by operating systems as errors (see
 <a href="#Footnote">footnote</a>) clearly simply indicate that the predicate is
 false, rather than indicating serious failure. But other errors represent
 serious hardware or network problems, or permissions problems.</p>
 <p>Some people, particularly Rob Stewart, argue that in a function like
 <a href="reference.html#Predicate-functions">is_directory()</a>, any error should simply cause the function to return false. If
 there is actually an underlying problem, it will be detected in due course when
 a directory_iterator or fstream operation is attempted.</p>
 <p>That view was rejected because of the following considerations:</p>
 <ul>
   <li>As a general principle, the earlier errors can be reported, the better.
   The rationale is that it is often much cheaper to fix errors sooner rather
   than later. I've also had a lot of negative experiences where failure to
   detect errors early caused a lot of pain and unhappy customers. Some of these
   were directly caused by ignoring error returns from file system operations.<br>
   &nbsp;</li>
   <li>Analysis of existing programs indicated that as many as 30% of the uses of
   a predicate were not followed by directory_iterator or fstream operations on
   the path in question. Instead, the applications performed reporting or
   fall-back operations that would not fail, and thus were either misleading or
   completely wrong if the <i>false</i> return value was in fact caused by
   hardware or network failure, or permissions problems.</li>
 </ul>
 <p>However, the discussion did identify that there are valid cases where
 non-throwing behavior is a requirement, and a programmer may prefer to deal with
 file or directory attributes and errors at a very low, bit-mask, level. Function <a href="#status">status()</a>
 was proposed to meet those needs.</p>
 <h3><a name="Expectations">Expectations</a> <a name="table">table</a></h3>
 <p>In the table below, <i>p</i> is a non-empty path.</p>
 <p>Unless otherwise specified, all functions throw on hardware or general
 failure errors, permission or access errors, symbolic link loop errors, and
 invalid path errors. If an O/S fails to distinguish between error types,
 predicate operations return false on such ambiguous errors.</p>
 <p><i><b>Expectations</b></i> identify operations that are expected to succeed
 or fail, assuming no hardware, permission, or access right errors, and no race
 conditions.</p>
 <table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
   <tr>
     <td width="22%" align="center"><b><i>Expression</i></b></td>
     <td width="48%" align="center"><b><i>Expectations</i></b></td>
     <td width="108%" align="center"><b><i>Semantics</i></b></td>
   </tr>
   <tr>
     <td width="22%">is_directory(p)</td>
     <td width="48%">Returns true if p is found and is a directory, else false.<br>
     If true, then directory_iterator(p) would succeed.<br>
     If false, then directory_iterator(p) would fail.</td>
     <td width="108%">Throws: if <a href="#status">status()</a> &amp; error_flag<br>
     Returns: status() &amp; directory_flag</td>
   </tr>
   <tr>
     <td width="22%">is_file(p)</td>
     <td width="48%">Returns true if p is found and is not a directory, else
     false.<br>
     If true, then ifstream(p) would succeed.<br>
     False, however, does not imply ifstream(p) would fail (because some
     operating systems allow directories to be opened as files, but stat() does
     not set the &quot;regular file&quot; flag for them).</td>
     <td width="108%">Throws: if status() &amp; error_flag<br>
     Returns: status() &amp; file_flag</td>
   </tr>
   <tr>
     <td width="22%">exists(p) </td>
     <td width="48%">Returns is_directory(p) || is_file(p) || is_other(p)</td>
     <td width="108%">Throws: if status() &amp; error_flag<br>
     Returns: status() &amp; (directory_flag|file_flag|other_flag)</td>
   </tr>
   <tr>
     <td width="22%">is_symlink(p)</td>
     <td width="48%">Returns true if p is found by shallow (non-transitive)
     search, and is a symbolic link, else false.<br>
     If true, and p points to q, then for any filesystem function f except those
     specified as working shallowly on symlinks themselves, f(p) calls f(q), and
     returns any value returned by f(q).</td>
     <td width="108%">Throws: if <a href="#status">symlink_status</a>() &amp;
     error_flag<br>
     Returns: symlink_status() &amp; symlink_flag</td>
   </tr>
   <tr>
     <td width="22%">!exists(p) &amp;&amp; ((p.has_branch_path() &amp;&amp; exists(p.branch_path()))
     || (!p.has_branch_path() &amp;&amp; !p.has_root_path()))<br>
     <i>In other words, if the path does not exist, and (the branch does exist,
     or (there is no branch and no root)).</i></td>
     <td width="48%">If true, create_directory(p) would succeed.<br>
     If true, ofstream(p) would succeed.<br>
     &nbsp;</td>
     <td width="108%">&nbsp;</td>
   </tr>
   <tr>
     <td width="22%">directory_iterator it(p)</td>
     <td width="48%">If it != directory_iterator(), assert(exists(*it)||is_symlink(*it)).
     Note: exists(*it) may throw, and likewise status(*it) may return error_flag
     - there is no guarantee of accessibility.</td>
     <td width="108%">&nbsp;</td>
   </tr>
 </table>
 <h3><a name="Conclusion">Conclusion</a></h3>
 <p>Predicate operations is_directory(), is_file(), is_symlink(), and exists()
 with the indicated semantics form a self-consistent set that meets expectations.</p>
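 <p>The Semantics column of the table can be modeled with a small,
 self-contained sketch. The flag names follow the table; the enumerator values,
 the <code>status_flags</code> type, the <code>predicate_error</code> exception,
 and the convention of passing an already-obtained status value to each predicate
 are assumptions made for illustration, not the library's interface.</p>

```cpp
#include <cassert>    // used by usage_example() below
#include <stdexcept>

namespace sketch {

// Flag names are taken from the table; the values are arbitrary.
enum status_flag {
    error_flag     = 1,
    directory_flag = 2,
    file_flag      = 4,
    other_flag     = 8,
    symlink_flag   = 16
};
typedef unsigned status_flags;

// Hypothetical exception type standing in for the library's error reporting.
struct predicate_error : std::runtime_error {
    predicate_error() : std::runtime_error("status reported an error") {}
};

// Throws if the status carries error_flag, otherwise passes it through.
inline status_flags checked(status_flags s) {
    if (s & error_flag) throw predicate_error();
    return s;
}

inline bool is_directory(status_flags s) { return (checked(s) & directory_flag) != 0; }
inline bool is_file(status_flags s)      { return (checked(s) & file_flag) != 0; }
inline bool is_other(status_flags s)     { return (checked(s) & other_flag) != 0; }
inline bool exists(status_flags s) {
    return (checked(s) & (directory_flag | file_flag | other_flag)) != 0;
}
// In the table this predicate is specified against symlink_status().
inline bool is_symlink(status_flags s)   { return (checked(s) & symlink_flag) != 0; }

// Helper for demonstrations: reports whether a predicate throws for s.
inline bool throws_on(bool (*pred)(status_flags), status_flags s) {
    try { pred(s); } catch (const predicate_error&) { return true; }
    return false;
}

inline void usage_example() {
    assert(is_directory(directory_flag));
    assert(!is_file(directory_flag));
    assert(exists(other_flag));                   // neither file nor directory
    assert(!exists(0));                           // nothing found, no error
    assert(throws_on(is_directory, error_flag));  // errors are not hidden
}

} // namespace sketch
```

 <p>Under these assumptions a status with no flags set simply makes every
 predicate return false, while a status carrying error_flag makes every
 predicate throw, which is the distinction the preceding discussion argues for.</p>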
 <h2><a name="Preservation">Preservation</a> of existing user code</h2>
 <p>Although the change to a template-based approach required a complete overhaul
 of the implementation code, the interface as used by existing applications is mostly unchanged.
 Conversion problems which would
 otherwise affect user code have been reduced by providing deprecated
 functions to ease transition. The deprecated functions are:</p>
 <blockquote>
   <pre>// class basic_path - 2nd constructor argument ignored:
 basic_path( const string_type &amp; str, name_check );
 basic_path( const typename string_type::value_type * s, name_check );

 // class basic_path - old names provided for renamed functions:
 string_type native_file_string() const;
 string_type native_directory_string() const;

 // class basic_path - now defined such that these no longer have any real effect:
 static bool default_name_check_writable() { return false; }
 static void default_name_check( name_check ) {}
 static name_check default_name_check() { return 0; }

 // non-deducible operations functions assume class path
 inline path current_path();
 inline const path &amp; initial_path();

 // the new basic_directory_entry provides leaf()
 // to cover the common existing use case itr-&gt;leaf()
 typename Path::string_type leaf() const;</pre>
 </blockquote>
 <p>If you do not want the deprecated functions to be included, define the macro <code>BOOST_FILESYSTEM_NO_DEPRECATED</code>.</p>
 <p>The greatest impact on existing code is the change of directory iterator
 value type from <code>path</code> to <code>directory_entry</code>. To ease the
 most common directory iterator use case, <code>basic_directory_entry</code>
 provides an automatic conversion to <code>basic_path</code>, and this also
 serves to prevent breakage of a lot of existing code. See the
 <a href="#More_efficient">next section</a> for discussion of rationale.</p>
 <blockquote>
   <pre>// the new basic_directory_entry provides:
 operator const path_type &amp;() const;</pre>
   </blockquote>
 <h2><a name="More_efficient">More efficient</a> operations when iterating over
 directories</h2>
 <p>Several common real-world operating systems (BSD derivatives, Linux, Windows)
 provide status information during directory iteration. Caching of this status
 information results in three to six times faster operation for typical predicate
 operations. (For a directory containing 15,047 files, iteration took 1 second vs 6
 seconds on a freshly booted system, and 0.3 seconds vs 0.9 seconds after prior use of
 the directory.)</p>
 <p>The efficiency gains from caching such status information were considered too
 significant to ignore. Because the possibility of race-conditions differs
 depending on whether the cached information is used or an actual system call is
 performed, it was considered necessary to provide explicit functions utilizing
 the cached information, rather than implicitly using the cache behind the
 scenes.</p>
 <p>Three options were explored for exposing the cached status information, with
 full implementations of each. After initial implementation of option 1 exposed
 the problems noted below, option 2 was tested as a possible engineering
 tradeoff. Option 3
 was finally chosen as the cleanest design.</p>
 <table border="1" cellpadding="5" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="100%">
   <tr>
     <td width="8%" align="center"><b><i>Option</i></b></td>
     <td width="25%" align="center"><i><b>How cache accessed</b></i></td>
     <td width="94%" align="center"><i><b>Pros and Cons</b></i></td>
   </tr>
   <tr>
     <td width="8%" valign="top" align="center"><i><b>1</b></i></td>
     <td width="25%" valign="top">Predicate function overloads<br>
     (basic_directory_iterator value_type is path)</td>
     <td width="94%">
     <ul>
       <li>Very questionable design (friendship abuse, overload abuse, etc.)</li>
       <li>User cannot reuse cache</li>
       <li>Readability problem; easy to miss difference between f(*it) and f(it)</li>
       <li>Write-ability problem (error prone?)</li>
       <li>Most common iterator use is brief: *it</li>
       <li>Preserves existing code</li>
     </ul>
     </td>
   </tr>
   <tr>
     <td width="8%" valign="top" align="center"><b><i>2</i></b></td>
     <td width="25%" valign="top">Predicate member functions of basic_directory_<span style="background-color: #FFFF00">iterator</span><br>
     (basic_directory_iterator value_type is path)</td>
     <td width="94%">
     <ul>
       <li>Somewhat cleaner design (although adding functions to an iterator is unusual)</li>
       <li>User cannot reuse cache</li>
       <li>Readability and write-ability are OK: f(*it) and it.f() sufficiently
       different</li>
       <li>Most common iterator use is brief: *it</li>
       <li>Preserves existing code</li>
     </ul>
     </td>
   </tr>
   <tr>
     <td width="8%" valign="top" align="center"><b><i>3</i></b></td>
     <td width="25%" valign="top">Predicate member functions of basic_directory_<span style="background-color: #FFFF00">entry</span><br>
     (basic_directory_iterator value_type is basic_directory_entry)<br>
 &nbsp;</td>
     <td width="94%">
     <ul>
       <li>Cleanest design.</li>
       <li>User can reuse cache.</li>
       <li>Readability and write-ability are OK: f(*it) and it-&gt;f() sufficiently
       different.</li>
       <li>Most common iterator use is longer: it-&gt;path(), but by providing
       &quot;operator const basic_path &amp;&quot; it is still possible to write a bare *it.</li>
       <li>Breaks some existing code. The &quot;operator const basic_path &amp;&quot;
       conversion eliminates breakage of the most common use case, while
       providing a (deprecated) leaf() prevents breakage of the second most
       common use case.</li>
     </ul>
     </td>
   </tr>
   </table>
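 <p>The shape of option 3 can be illustrated with a self-contained sketch. The
 type <code>directory_entry_sketch</code>, its members, and the use of
 <code>std::string</code> in place of a path class are all hypothetical
 simplifications. The sketch only shows how an entry can carry cached status
 alongside the path while an implicit conversion keeps code written against a
 bare <code>*it</code> compiling.</p>

```cpp
#include <cassert>    // used by usage_example() below
#include <string>

namespace sketch {

typedef std::string path_type;  // stand-in for a real path class

class directory_entry_sketch {
public:
    directory_entry_sketch(const path_type& p, bool is_dir)
        : path_(p), cached_is_directory_(is_dir) {}

    const path_type& path() const { return path_; }

    // Answers from status captured during iteration; no system call is made.
    bool is_directory() const { return cached_is_directory_; }

    // Implicit conversion so that code expecting a path still accepts an entry.
    operator const path_type&() const { return path_; }

private:
    path_type path_;
    bool cached_is_directory_;
};

// Pre-existing style of user code: takes a path, knows nothing about entries.
inline path_type::size_type name_length(const path_type& p) { return p.size(); }

inline void usage_example() {
    directory_entry_sketch e("docs", true);
    assert(e.is_directory());         // explicit use of the cached status
    assert(name_length(e) == 4);      // implicit conversion to path_type
    directory_entry_sketch copy = e;  // the cache travels with the entry
    assert(copy.is_directory());
}

} // namespace sketch
```

 <p>Because the cached answer is reached through a member of the entry rather
 than through a free function taking a path, the caller can see at the call site
 whether cached or freshly queried information is being used.</p>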
 <h2><a name="Rationale">Rationale</a></h2>
 <h3>Elimination of the native versus generic <a name="distinction">distinction</a></h3>
 <p>Elimination of user confusion and general design simplification were the
 original motivation for elimination of the distinction between native and
 generic paths.</p>
 <p>During design work, a further technical argument was discovered. Consider the
 path <code>&quot;c:foo/bar&quot;</code>. On many POSIX systems, <code>&quot;c:foo&quot;</code> is a
 valid directory name, so we have a two element path and there is no issue of
 native versus generic format. On Windows systems, however, <code>&quot;c:&quot;</code> is a
 drive specification, so we have a three element path. All calls to the operating
 system will result in <code>&quot;c:&quot;</code> being considered a drive specification;
 there is no way that fact-of-life can be changed by claiming the format is
 generic. The native versus generic distinction is thus useless and misleading
 for POSIX, Windows, and probably most other operating systems.</p>
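 <p>The element counts in the <code>&quot;c:foo/bar&quot;</code> example can be
 reproduced with a small sketch. The function <code>split_elements</code> and its
 <code>windows_rules</code> parameter are hypothetical, and the drive rule used
 here (a single letter followed by a colon at the start of the string) is a
 deliberate simplification rather than the library's actual parsing algorithm.</p>

```cpp
#include <cassert>    // used by usage_example() below
#include <cctype>
#include <string>
#include <vector>

namespace sketch {

// Splits s into path elements under one of two simplified rule sets.
inline std::vector<std::string> split_elements(const std::string& s,
                                               bool windows_rules) {
    std::vector<std::string> elements;
    std::string::size_type pos = 0;

    // Simplified drive rule: one letter followed by ':' at the start.
    if (windows_rules && s.size() >= 2
        && std::isalpha(static_cast<unsigned char>(s[0])) && s[1] == ':') {
        elements.push_back(s.substr(0, 2));
        pos = 2;
    }

    // The remaining text is split on '/'; empty pieces are skipped.
    while (pos < s.size()) {
        std::string::size_type slash = s.find('/', pos);
        if (slash == std::string::npos) slash = s.size();
        if (slash > pos) elements.push_back(s.substr(pos, slash - pos));
        pos = slash + 1;
    }
    return elements;
}

inline void usage_example() {
    // POSIX-like rules: "c:foo" is an ordinary name, giving two elements.
    assert(split_elements("c:foo/bar", false).size() == 2);
    // Windows-like rules: "c:" is a drive specification, giving three.
    assert(split_elements("c:foo/bar", true).size() == 3);
}

} // namespace sketch
```

 <p>Note that the rule set is selected by the platform in this sketch, not by
 any property of the string, which mirrors the argument that labeling the path
 generic or native cannot change how it is interpreted.</p>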
 <p>If paths for a particular operating system did require a distinction be made,
 it could be done by requiring that native paths be prefixed with some unique
 implementation-defined identification. For example, <code>&quot;native-path:&quot;</code>.
 This would only be required for operating systems where (1) the distinction
 mattered, and (2) there was no lexical way to distinguish the two forms. For
 example, a native operating system that used the same syntax as the Filesystem
 Library's generic POSIX-like format, but processed the elements right-to-left
 instead of left-to-right.</p>
 <h3>Preservation of <a name="existing-code">existing code</a></h3>
 <p>Allowing existing user code to continue to work with the updated version of
 the library has obvious benefits in terms of preserving the effort users have
 applied to both learning the library and writing code which uses the library.</p>
 <p>There is an additional motivation: other than the name checking portion of
 class path, the existing interface has proven to be useful and robust, so
 there is no reason to fiddle with it.</p>
 <h3><a name="Single_path_design">Single path design</a></h3>
 <p>During preliminary internationalization discussion on the Boost developers
 list, a design was considered for a single path class which could hold either
 narrow or wide character based paths. That design was rejected because:</p>
 <ul>
   <li>The design was, for many applications, an over-generalization with runtime
   memory and speed costs which would have to be paid for even when not needed.<br>
 &nbsp;</li>
   <li>There was concern that the design would be confusing to users, given that
   the standard library already uses single-value-type strings, rather than
   strings which morph value types as needed.<br>
 &nbsp;</li>
   <li>There were technical issues with conversions when a narrow path was
   appended to a wide path, and vice versa. The concern was that double
   conversions could cause incorrect results, that conversions best left to the
   operating system would be performed, and that the technical complexity was too
   great in relation to perceived benefits. User-defined types would only make
   the problem worse.<br>
 &nbsp;</li>
 </ul>
 <h3>No versions of <a href="reference.html#Status-functions">status()</a> which throw exceptions on
 errors</h3>
 <p>The rationale for not including versions of status()
 which throw exceptions on errors is that (1) the primary purpose of this
 function is to perform queries at a very low-level, where exceptions are usually
 unwanted, and (2) exceptions on errors are already provided by the predicate
 functions. There would be little or no efficiency gain from providing a throwing
 version of status().</p>
 <h3>Symlink identifying version of <a href="reference.html#Status-functions">status()</a> function</h3>
 <p>A symlink identifying version of the status() function is distinguished by a
 second argument. Often separately named functions are more appropriate than
 overloading when behavior
 differs, which is the case here, while overloads are more appropriate when
 behavior is the same but argument types differ (Iain Hanson). Overloading was
 chosen in this particular case because of a subjective judgment that a single
 function name with an optional &quot;symlink&quot; second argument produced more
 understandable code. The original implementation of the function used the name &quot;symlink_status&quot;,
 but that just didn't read right in real code.</p>
 <h3>POSIX wpath_traits defaults to locale(&quot;&quot;), but allows imbuing of locale</h3>
 <p>Vladimir Prus pointed out that for Linux (and presumably other POSIX
 operating systems that need to convert wide character paths to narrow
 characters), the default conversion should not depend on the operating system
 alone, but on the std::locale(&quot;&quot;) default. For example, the usual encoding
 for Russian on Linux (and Russian web sites) is KOI8-R (RFC1489). The ability to safely specify a different locale
 is also provided, to meet unforeseen needs.</p>
 <hr>
 <p>Revised 18 March, 2008</p>
 <p>© Copyright Beman Dawes, 2005</p>
 <p>Distributed under the Boost Software License, Version 1.0.
 (See accompanying file <a href="../../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
 copy at <a href="http://www.boost.org/LICENSE_1_0.txt">www.boost.org/LICENSE_1_0.txt</a>)</p>

 </body>

 </html>