| <HTML> |
| <!-- |
| Copyright (c) Jeremy Siek 2000 |
| |
| Distributed under the Boost Software License, Version 1.0. |
| (See accompanying file LICENSE_1_0.txt or copy at |
| http://www.boost.org/LICENSE_1_0.txt) |
| --> |
| <Head> |
| <Title>Sparse Matrix Ordering Example</Title> |
| <BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b" |
| ALINK="#ff0000"> |
| <IMG SRC="../../../boost.png" |
| ALT="C++ Boost" width="277" height="86"> |
| |
| <BR Clear> |
| |
| <H1><A NAME="sec:sparse-matrix-ordering"></A> |
| Sparse Matrix Ordering |
| </H1> |
| |
| <P> |
| Graph theory was identified as a powerful tool for sparse matrix |
| computation when Seymour Parter used undirected graphs to model |
| symmetric Gaussian elimination more than 30 years ago [<A HREF="bibliography.html#parter61:_gauss">28</A>]. |
| Graphs can be used to model symmetric matrices, factorizations and |
| algorithms on non-symmetric matrices, such as fill paths in Gaussian |
| elimination, strongly connected components in matrix irreducibility, |
| bipartite matching, and alternating paths in linear dependence and |
| structural singularity. Not only do graphs make it easier to |
| understand and analyze sparse matrix algorithms, but they broaden the |
| area of manipulating sparse matrices using existing graph algorithms |
| and techniques [<A |
| HREF="bibliography.html#george93:graphtheory">13</A>]. In this section, we are |
| going to illustrate how to use BGL in sparse matrix computation such |
| as ordering algorithms. Before we go further into the sparse matrix |
| algorithms, let us take a step back and review a few things. |
| |
| <P> |
| |
| <H2>Graphs and Sparse Matrices</H2> |
| |
| <P> |
| A graph is fundamentally a way to represent a binary relation between |
| objects. The nonzero pattern of a sparse matrix also describes a |
| binary relation between unknowns. Therefore the nonzero pattern of a |
| sparse matrix of a linear system can be modeled with a graph |
| <I>G(V,E)</I>, whose <I>n</I> vertices in <I>V</I> represent the |
| <I>n</I> unknowns, and where there is an edge from vertex <I>i</I> to |
| vertex <I>j</I> when <i>A<sub>ij</sub></i> is nonzero. Thus, when a |
| matrix has a symmetric nonzero pattern, the corresponding graph is |
| undirected. |
| |
| <P> |
| |
| <H2>Sparse Matrix Ordering Algorithms</H2> |
| |
| <P> |
| The process for solving a sparse symmetric positive definite linear |
| system, <i>Ax=b</i>, can be divided into four stages as follows: |
| <DL> |
| <DT><STRONG>Ordering:</STRONG></DT> |
| <DD>Find a permutation <i>P</i> of matrix <i>A</i>, |
| </DD> |
| <DT><STRONG>Symbolic factorization:</STRONG></DT> |
| <DD>Set up a data structure for the Cholesky |
| factor <i>L</i> of <i>PAP<sup>T</sup></i>, |
| </DD> |
| <DT><STRONG>Numerical factorization:</STRONG></DT> |
| <DD>Decompose <i>PAP<sup>T</sup></i> into <i>LL<sup>T</sup></i>, |
| </DD> |
| <DT><STRONG>Triangular system solution:</STRONG></DT> |
| <DD>Solve <i>LL<sup>T</sup>Px=Pb</i> for <i>x</i>. |
| </DD> |
| </DL> |
| Because the choice of permutation <i>P</i> will directly determine the |
| number of fill-in elements (elements present in the non-zero structure |
| of <i>L</i> that are not present in the non-zero structure of |
| <i>A</i>), the ordering has a significant impact on the memory and |
| computational requirements for the latter stages. However, finding |
| the optimal ordering for <i>A</i> (in the sense of minimizing fill-in) |
| has been proven to be NP-complete [<a |
| href="bibliography.html#garey79:computers-and-intractability">30</a>] |
| requiring that heuristics be used for all but simple (or specially |
| structured) cases. |
| |
| <P> |
| A widely used but rather simple ordering algorithm is a variant of the |
| Cuthill-McKee orderings, the reverse Cuthill-McKee ordering algorithm. |
| This algorithm can be used as a preordering method to improve ordering |
| in more sophisticated methods such as minimum degree |
| algorithms [<a href="bibliography.html#LIU:MMD">21</a>]. |
| |
| <P> |
| |
| <H3><A NAME="SECTION001032100000000000000"> |
| Reverse Cuthill-McKee Ordering Algorithm</A> |
| </H3> |
| The original Cuthill-McKee ordering algorithm is primarily designed to |
| reduce the profile of a matrix [<A |
| HREF="bibliography.html#george81:__sparse_pos_def">14</A>]. |
| George discovered that the reverse ordering often turned out to be |
| superior to the original ordering in 1971. Here we describe the |
| Reverse Cuthill-McKee algorithm in terms of a graph: |
| |
| <OL> |
| <LI><I>Finding a starting vertex</I>: Determine a starting vertex |
| <I>r</I> and assign <i>r</i> to <I>x<sub>1</sub></I>. |
| </LI> |
| <LI><I>Main part</I>: For <I>i=1,...,N</I>, find all |
| the unnumbered neighbors of the vertex <I>x<sub>i</sub></I> and number them |
| in increasing order of degree. |
| </LI> |
| <LI><I>Reversing ordering</I>: The reverse Cuthill-McKee ordering |
| is given by <I>y<sub>1</sub>,...,y<sub>N</sub></I> where |
| <I>y<sub>i</sub></I> is <I>x<sub>N-i+1</sub></I> for <I>i = 1,...,N</I>. |
| </LI> |
| </OL> |
| In the first step a good starting vertex needs to be determined. The |
| study by George and Liu [<A |
| HREF="bibliography.html#george81:__sparse_pos_def">14</A>] showed that |
| a pair of vertices which are at maximum or near maximum ``distance'' |
| apart are good ones. They also proposed an algorithm to find such a |
| starting vertex in [<A |
| HREF="bibliography.html#george81:__sparse_pos_def">14</A>], which we |
| show in <A |
| HREF="#fig:ggcl_find_starting_vertex">Figure |
| 1</A> implemented using the BGL interface. |
| |
| <P> |
| <P></P> |
| <DIV ALIGN="CENTER"><A NAME="fig:ggcl_find_starting_vertex"></A> |
| <table border><tr><td> |
| <pre> |
| namespace boost { |
| template <class Graph, class Vertex, class Color, class Degree> |
| Vertex |
| pseudo_peripheral_pair(Graph& G, const Vertex& u, int& ecc, |
| Color color, Degree degree) |
| { |
| typename property_traits<Color>::value_type c = get(color, u); |
| |
| rcm_queue<Vertex, Degree> Q(degree); |
| |
| typename boost::graph_traits<Graph>::vertex_iterator ui, ui_end; |
| for (tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui) |
| put(color, *ui, white(c)); |
| breadth_first_search(G, u, Q, bfs_visitor<>(), color); |
| |
| ecc = Q.eccentricity(); |
| return Q.spouse(); |
| } |
| |
| template <class Graph, class Vertex, class Color, class Degree> |
| Vertex find_starting_node(Graph& G, Vertex r, Color c, Degree d) { |
| int eccen_r, eccen_x; |
| Vertex x = pseudo_peripheral_pair(G, r, eccen_r, c, d); |
| Vertex y = pseudo_peripheral_pair(G, x, eccen_x, c, d); |
| |
| while (eccen_x > eccen_r) { |
| r = x; |
| eccen_r = eccen_x; |
| x = y; |
| y = pseudo_peripheral_pair(G, x, eccen_x, c, d); |
| } |
| return x; |
| } |
| } // namespace boost |
| </pre> |
| </td></tr></table> |
| <CAPTION ALIGN="BOTTOM"> |
| <table><tr><td> |
| <STRONG>Figure 1:</STRONG> |
| The BGL implementation of <TT>find_starting_node</TT>. The key |
| part <TT>pseudo_peripheral_pair</TT> is BFS with a custom queue |
| type virtually. |
| </td></tr></table></CAPTION> |
| </DIV> |
| <P></P> |
| |
| The key part of step one is a custom queue type with BFS as shown in |
| <A |
| HREF="#fig:ggcl_find_starting_vertex">Figure |
| 1</A>. The main Cuthill-McKee algorithm shown in Figure <A |
| HREF="#fig:cuthill-mckee">Figure 2</A> |
| is a fenced priority queue with a generalized BFS. If |
| <TT>inverse_permutation</TT> is a normal iterator, the result stored |
| is the inverse permutation of original Cuthill-McKee ordering. On the |
| other hand, if <TT>inverse_permutation</TT> is a reverse iterator, the |
| result stored is the inverse permutation of reverse Cuthill-McKee |
| ordering. Our implementation is quite concise because many components |
| from BGL can be reused. |
| |
| |
| <P></P> |
| <DIV ALIGN="CENTER"><A NAME="fig:cuthill-mckee"></A><A NAME="6261"></A> |
| <table border><tr><td> |
| <pre> |
| template < class Graph, class Vertex, class OutputIterator, |
| class Color, class Degree > |
| inline void |
| cuthill_mckee_ordering(Graph& G, |
| Vertex s, |
| OutputIterator inverse_permutation, |
| Color color, Degree degree) |
| { |
| typedef typename property_traits<Degree>::value_type DS; |
| typename property_traits<Color>::value_type c = get(color, s); |
| typedef indirect_cmp<Degree, std::greater<DS> > Compare; |
| Compare comp(degree); |
| fenced_priority_queue<Vertex, Compare > Q(comp); |
| |
| typedef cuthill_mckee_visitor<OutputIterator> CMVisitor; |
| CMVisitor cm_visitor(inverse_permutation); |
| |
| typename boost::graph_traits<Graph>::vertex_iterator ui, ui_end; |
| for (tie(ui, ui_end) = vertices(G); ui != ui_end; ++ui) |
| put(color, *ui, white(c)); |
| breadth_first_search(G, s, Q, cm_visitor, color); |
| } |
| </pre> |
| </td></tr></table> |
| <CAPTION ALIGN="BOTTOM"> |
| <TABLE> |
| <TR><TD> <STRONG>Figure 2:</STRONG> The BGL implementation of |
| Cuthill-McKee algoithm. </TD></TR> </TABLE></CAPTION> </DIV><P></P> |
| <P> |
| |
| |
| <P> |
| |
| <H3>Minimum Degree Ordering Algorithm</H3> |
| |
| <P> |
| The pattern of another category of high-quality ordering algorithms in |
| wide use is based on a greedy approach such that the ordering is |
| chosen to minimize some quantity at each step of a simulated -step |
| symmetric Gaussian elimination process. The algorithms using such an |
| approach are typically distinguished by their greedy minimization |
| criteria [<a href="bibliography.html#ng-raghavan">34</a>]. |
| |
| <P> |
| In graph terms, the basic ordering process used by most greedy |
| algorithms is as follows: |
| |
| <OL> |
| <LI><EM>Start:</EM> Construct undirected graph <i>G<sup>0</sup></i> |
| corresponding to matrix <i>A</i> |
| |
| </LI> |
| <LI><EM>Iterate:</EM> For <i>k = 1, 2, ... </i> until |
| <i>G<sup>k</sup> = { }</i> do: |
| |
| <UL> |
| <LI>Choose a vertex <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i> |
| according to some criterion |
| </LI> |
| <LI>Eliminate <i>v<sup>k</sup></i> from <i>G<sup>k</sup></i> to form |
| <i>G<sup>k+1</sup></i> |
| |
| </LI> |
| </UL> |
| </LI> |
| </OL> |
| The resulting ordering is the sequence of vertices <i>{v<sup>0</sup>, |
| v<sup>1</sup>,...}</i> selected by the algorithm. |
| |
| <P> |
| One of the most important examples of such an algorithm is the |
| <I>Minimum Degree</I> algorithm. At each step the minimum degree |
| algorithm chooses the vertex with minimum degree in the corresponding |
| graph as <i>v<sup>k</sup></i>. A number of enhancements to the basic |
| minimum degree algorithm have been developed, such as the use of a |
| quotient graph representation, mass elimination, incomplete degree |
| update, multiple elimination, and external degree. See [<A |
| href="bibliography.html#George:evolution">35</a>] for a historical |
| survey of the minimum degree algorithm. |
| |
| <P> |
| The BGL implementation of the Minimum Degree algorithm closely follows |
| the algorithmic descriptions of the one in [<A |
| HREF="bibliography.html#LIU:MMD">21</A>]. The implementation presently |
| includes the enhancements for mass elimination, incomplete degree |
| update, multiple elimination, and external degree. |
| |
| <P> |
| In particular, we create a graph representation to improve the |
| performance of the algorithm. It is based on a templated ``vector of |
| vectors.'' The vector container used is an adaptor class built on top |
| the STL <TT>std::vector</TT> class. Particular characteristics of this |
| adaptor class include the following: |
| |
| <UL> |
| <LI>Erasing elements does not shrink the associated memory. |
| Adding new elements after erasing will not need to allocate |
| additional memory. |
| </LI> |
| <LI>Additional memory is allocated efficiently on demand when new |
| elements are added (doubling the capacity every time it is |
| increased). This property comes from STL vector. |
| </LI> |
| </UL> |
| |
| <P> |
| Note that this representation is similar to that used in Liu's |
| implementation, with some important differences due to dynamic memory |
| allocation. With the dynamic memory allocation we do not need to |
| over-write portions of the graph that have been eliminated, |
| allowing for a more efficient graph traversal. More importantly, |
| information about the elimination graph is preserved allowing for |
| trivial symbolic factorization. Since symbolic factorization can be |
| an expensive part of the entire solution process, improving its |
| performance can result in significant computational savings. |
| |
| <P> |
| The overhead of dynamic memory allocation could conceivably compromise |
| performance in some cases. However, in practice, memory allocation |
| overhead does not contribute significantly to run-time for our |
| implementation as shown in [] because it is not |
| done very often and the cost gets amortized. |
| |
| <P> |
| |
| <H2>Proper Self-Avoiding Walk</H2> |
| |
| <P> |
| The finite element method (FEM) is a flexible and attractive numerical |
| approach for solving partial differential equations since it is |
| straightforward to handle geometrically complicated domains. However, |
| unstructured meshes generated by FEM does not provide an obvious |
| labeling (numbering) of the unknowns while it is vital to have it for |
| matrix-vector notation of the underlying algebraic equations. Special |
| numbering techniques have been developed to optimize memory usage and |
| locality of such algorithms. One novel technique is the self-avoiding |
| walk []. |
| |
| <P> |
| If we think a mesh is a graph, a self-avoiding walk(SAW) over an |
| arbitrary unstructured two-dimensional mesh is an enumeration of all |
| the triangles of the mesh such that two successive triangles shares an |
| edge or a vertex. A proper SAW is a SAW where jumping twice over the |
| same vertex is forbidden for three consecutive triangles in the walk. |
| it can be used to improve parallel efficiency of several irregular |
| algorithms, in particular issues related to reducing runtime memory |
| access (improving locality) and interprocess communications. The |
| reference [] has proved the existence |
| of A PSAW for any arbitrary triangular mesh by extending an existing |
| PSAW over a small piece of mesh to a larger part. The proof |
| effectively provides a set of rules to construct a PSAW. |
| |
| <P> |
| The algorithm family of constructing a PSAW on a mesh is to start from |
| any random triangle of the mesh, choose new triangles sharing an edge |
| with the current sub-mesh and extend the existing partial PSAW over |
| the new triangles. |
| |
| <P> |
| Let us define a dual graph of a mesh. Let a triangle in the mesh be a |
| vertex and two triangles sharing an edge in the mesh means there is an |
| edge between two vertices in the dual graph. By using a dual graph of |
| a mesh, one way to implement the algorithm family of constructing a |
| PSAW is to reuse BGL algorithms such as BFS and DFS with a customized |
| visitor to provide operations during |
| traversal. The customized visitor has the function <TT>tree_edge()</TT> |
| which is to extend partial PSAW in case by case and function |
| <TT>start_vertex()</TT> which is to set up the PSAW for the starting vertex. |
| |
| <br> |
| <HR> |
| <TABLE> |
| <TR valign=top> |
| <TD nowrap>Copyright © 2000-2001</TD><TD> |
| <A HREF="http://www.boost.org/people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>) |
| </TD></TR></TABLE> |
| |
| </BODY> |
| </HTML> |