| @node Searching and Sorting, Pattern Matching, Message Translation, Top |
| @c %MENU% General searching and sorting functions |
| @chapter Searching and Sorting |
| |
| This chapter describes functions for searching and sorting arrays of |
| arbitrary objects. You pass the appropriate comparison function to be |
| applied as an argument, along with the size of the objects in the array |
| and the total number of elements. |
| |
| @menu |
| * Comparison Functions:: Defining how to compare two objects. |
| Since the sort and search facilities |
| are general, you have to specify the |
| ordering. |
| * Array Search Function:: The @code{bsearch} function. |
| * Array Sort Function:: The @code{qsort} function. |
| * Search/Sort Example:: An example program. |
| * Hash Search Function:: The @code{hsearch} function. |
| * Tree Search Function:: The @code{tsearch} function. |
| @end menu |
| |
| @node Comparison Functions |
| @section Defining the Comparison Function |
| @cindex Comparison Function |
| |
| In order to use the sorted array library functions, you have to describe |
| how to compare the elements of the array. |
| |
| To do this, you supply a comparison function to compare two elements of |
| the array. The library will call this function, passing as arguments |
| pointers to two array elements to be compared. Your comparison function |
| should return a value the way @code{strcmp} (@pxref{String/Array |
| Comparison}) does: negative if the first argument is ``less'' than the |
| second, zero if they are ``equal'', and positive if the first argument |
| is ``greater''. |
| |
| Here is an example of a comparison function which works with an array of |
| numbers of type @code{double}: |
| |
| @smallexample |
| int |
| compare_doubles (const void *a, const void *b) |
| @{ |
| const double *da = (const double *) a; |
| const double *db = (const double *) b; |
| |
| return (*da > *db) - (*da < *db); |
| @} |
| @end smallexample |
| |
| The header file @file{stdlib.h} defines a name for the data type of |
| comparison functions. This type is a GNU extension. |
| |
| @comment stdlib.h |
| @comment GNU |
| @tindex comparison_fn_t |
| @smallexample |
| int comparison_fn_t (const void *, const void *); |
| @end smallexample |
| |
| @node Array Search Function |
| @section Array Search Function |
| @cindex search function (for arrays) |
| @cindex binary search function (for arrays) |
| @cindex array search function |
| |
| Generally searching for a specific element in an array means that |
| potentially all elements must be checked. The GNU C library contains |
| functions to perform linear search. The prototypes for the following |
| two functions can be found in @file{search.h}. |
| |
| @comment search.h |
| @comment SVID |
| @deftypefun {void *} lfind (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar}) |
| The @code{lfind} function searches in the array with @code{*@var{nmemb}} |
| elements of @var{size} bytes pointed to by @var{base} for an element |
| which matches the one pointed to by @var{key}. The function pointed to |
| by @var{compar} is used decide whether two elements match. |
| |
| The return value is a pointer to the matching element in the array |
| starting at @var{base} if it is found. If no matching element is |
| available @code{NULL} is returned. |
| |
| The mean runtime of this function is @code{*@var{nmemb}}/2. This |
| function should only be used if elements often get added to or deleted from |
| the array in which case it might not be useful to sort the array before |
| searching. |
| @end deftypefun |
| |
| @comment search.h |
| @comment SVID |
| @deftypefun {void *} lsearch (const void *@var{key}, void *@var{base}, size_t *@var{nmemb}, size_t @var{size}, comparison_fn_t @var{compar}) |
| The @code{lsearch} function is similar to the @code{lfind} function. It |
| searches the given array for an element and returns it if found. The |
| difference is that if no matching element is found the @code{lsearch} |
| function adds the object pointed to by @var{key} (with a size of |
| @var{size} bytes) at the end of the array and it increments the value of |
| @code{*@var{nmemb}} to reflect this addition. |
| |
| This means for the caller that if it is not sure that the array contains |
| the element one is searching for the memory allocated for the array |
| starting at @var{base} must have room for at least @var{size} more |
| bytes. If one is sure the element is in the array it is better to use |
| @code{lfind} so having more room in the array is always necessary when |
| calling @code{lsearch}. |
| @end deftypefun |
| |
| To search a sorted array for an element matching the key, use the |
| @code{bsearch} function. The prototype for this function is in |
| the header file @file{stdlib.h}. |
| @pindex stdlib.h |
| |
| @comment stdlib.h |
| @comment ISO |
| @deftypefun {void *} bsearch (const void *@var{key}, const void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare}) |
| The @code{bsearch} function searches the sorted array @var{array} for an object |
| that is equivalent to @var{key}. The array contains @var{count} elements, |
| each of which is of size @var{size} bytes. |
| |
| The @var{compare} function is used to perform the comparison. This |
| function is called with two pointer arguments and should return an |
| integer less than, equal to, or greater than zero corresponding to |
| whether its first argument is considered less than, equal to, or greater |
| than its second argument. The elements of the @var{array} must already |
| be sorted in ascending order according to this comparison function. |
| |
| The return value is a pointer to the matching array element, or a null |
| pointer if no match is found. If the array contains more than one element |
| that matches, the one that is returned is unspecified. |
| |
| This function derives its name from the fact that it is implemented |
| using the binary search algorithm. |
| @end deftypefun |
| |
| @node Array Sort Function |
| @section Array Sort Function |
| @cindex sort function (for arrays) |
| @cindex quick sort function (for arrays) |
| @cindex array sort function |
| |
| To sort an array using an arbitrary comparison function, use the |
| @code{qsort} function. The prototype for this function is in |
| @file{stdlib.h}. |
| @pindex stdlib.h |
| |
| @comment stdlib.h |
| @comment ISO |
| @deftypefun void qsort (void *@var{array}, size_t @var{count}, size_t @var{size}, comparison_fn_t @var{compare}) |
| The @var{qsort} function sorts the array @var{array}. The array contains |
| @var{count} elements, each of which is of size @var{size}. |
| |
| The @var{compare} function is used to perform the comparison on the |
| array elements. This function is called with two pointer arguments and |
| should return an integer less than, equal to, or greater than zero |
| corresponding to whether its first argument is considered less than, |
| equal to, or greater than its second argument. |
| |
| @cindex stable sorting |
| @strong{Warning:} If two objects compare as equal, their order after |
| sorting is unpredictable. That is to say, the sorting is not stable. |
| This can make a difference when the comparison considers only part of |
| the elements. Two elements with the same sort key may differ in other |
| respects. |
| |
| If you want the effect of a stable sort, you can get this result by |
| writing the comparison function so that, lacking other reason |
| distinguish between two elements, it compares them by their addresses. |
| Note that doing this may make the sorting algorithm less efficient, so |
| do it only if necessary. |
| |
| Here is a simple example of sorting an array of doubles in numerical |
| order, using the comparison function defined above (@pxref{Comparison |
| Functions}): |
| |
| @smallexample |
| @{ |
| double *array; |
| int size; |
| @dots{} |
| qsort (array, size, sizeof (double), compare_doubles); |
| @} |
| @end smallexample |
| |
| The @code{qsort} function derives its name from the fact that it was |
| originally implemented using the ``quick sort'' algorithm. |
| |
| The implementation of @code{qsort} in this library might not be an |
| in-place sort and might thereby use an extra amount of memory to store |
| the array. |
| @end deftypefun |
| |
| @node Search/Sort Example |
| @section Searching and Sorting Example |
| |
| Here is an example showing the use of @code{qsort} and @code{bsearch} |
| with an array of structures. The objects in the array are sorted |
| by comparing their @code{name} fields with the @code{strcmp} function. |
| Then, we can look up individual objects based on their names. |
| |
| @comment This example is dedicated to the memory of Jim Henson. RIP. |
| @smallexample |
| @include search.c.texi |
| @end smallexample |
| |
| @cindex Kermit the frog |
| The output from this program looks like: |
| |
| @smallexample |
| Kermit, the frog |
| Piggy, the pig |
| Gonzo, the whatever |
| Fozzie, the bear |
| Sam, the eagle |
| Robin, the frog |
| Animal, the animal |
| Camilla, the chicken |
| Sweetums, the monster |
| Dr. Strangepork, the pig |
| Link Hogthrob, the pig |
| Zoot, the human |
| Dr. Bunsen Honeydew, the human |
| Beaker, the human |
| Swedish Chef, the human |
| |
| Animal, the animal |
| Beaker, the human |
| Camilla, the chicken |
| Dr. Bunsen Honeydew, the human |
| Dr. Strangepork, the pig |
| Fozzie, the bear |
| Gonzo, the whatever |
| Kermit, the frog |
| Link Hogthrob, the pig |
| Piggy, the pig |
| Robin, the frog |
| Sam, the eagle |
| Swedish Chef, the human |
| Sweetums, the monster |
| Zoot, the human |
| |
| Kermit, the frog |
| Gonzo, the whatever |
| Couldn't find Janice. |
| @end smallexample |
| |
| |
| @node Hash Search Function |
| @section The @code{hsearch} function. |
| |
| The functions mentioned so far in this chapter are for searching in a sorted |
| or unsorted array. There are other methods to organize information |
| which later should be searched. The costs of insert, delete and search |
| differ. One possible implementation is using hashing tables. |
| The following functions are declared in the header file @file{search.h}. |
| |
| @comment search.h |
| @comment SVID |
| @deftypefun int hcreate (size_t @var{nel}) |
| The @code{hcreate} function creates a hashing table which can contain at |
| least @var{nel} elements. There is no possibility to grow this table so |
| it is necessary to choose the value for @var{nel} wisely. The method |
| used to implement this function might make it necessary to make the |
| number of elements in the hashing table larger than the expected maximal |
| number of elements. Hashing tables usually work inefficiently if they are |
| filled 80% or more. The constant access time guaranteed by hashing can |
| only be achieved if few collisions exist. See Knuth's ``The Art of |
| Computer Programming, Part 3: Searching and Sorting'' for more |
| information. |
| |
| The weakest aspect of this function is that there can be at most one |
| hashing table used through the whole program. The table is allocated |
| in local memory out of control of the programmer. As an extension the |
| GNU C library provides an additional set of functions with an reentrant |
| interface which provide a similar interface but which allow to keep |
| arbitrarily many hashing tables. |
| |
| It is possible to use more than one hashing table in the program run if |
| the former table is first destroyed by a call to @code{hdestroy}. |
| |
| The function returns a non-zero value if successful. If it return zero |
| something went wrong. This could either mean there is already a hashing |
| table in use or the program runs out of memory. |
| @end deftypefun |
| |
| @comment search.h |
| @comment SVID |
| @deftypefun void hdestroy (void) |
| The @code{hdestroy} function can be used to free all the resources |
| allocated in a previous call of @code{hcreate}. After a call to this |
| function it is again possible to call @code{hcreate} and allocate a new |
| table with possibly different size. |
| |
| It is important to remember that the elements contained in the hashing |
| table at the time @code{hdestroy} is called are @emph{not} freed by this |
| function. It is the responsibility of the program code to free those |
| strings (if necessary at all). Freeing all the element memory is not |
| possible without extra, separately kept information since there is no |
| function to iterate through all available elements in the hashing table. |
| If it is really necessary to free a table and all elements the |
| programmer has to keep a list of all table elements and before calling |
| @code{hdestroy} s/he has to free all element's data using this list. |
| This is a very unpleasant mechanism and it also shows that this kind of |
| hashing tables is mainly meant for tables which are created once and |
| used until the end of the program run. |
| @end deftypefun |
| |
| Entries of the hashing table and keys for the search are defined using |
| this type: |
| |
| @deftp {Data type} {struct ENTRY} |
| Both elements of this structure are pointers to zero-terminated strings. |
| This is a limiting restriction of the functionality of the |
| @code{hsearch} functions. They can only be used for data sets which use |
| the NUL character always and solely to terminate the records. It is not |
| possible to handle general binary data. |
| |
| @table @code |
| @item char *key |
| Pointer to a zero-terminated string of characters describing the key for |
| the search or the element in the hashing table. |
| @item char *data |
| Pointer to a zero-terminated string of characters describing the data. |
| If the functions will be called only for searching an existing entry |
| this element might stay undefined since it is not used. |
| @end table |
| @end deftp |
| |
| @comment search.h |
| @comment SVID |
| @deftypefun {ENTRY *} hsearch (ENTRY @var{item}, ACTION @var{action}) |
| To search in a hashing table created using @code{hcreate} the |
| @code{hsearch} function must be used. This function can perform simple |
| search for an element (if @var{action} has the @code{FIND}) or it can |
| alternatively insert the key element into the hashing table. Entries |
| are never replaced. |
| |
| The key is denoted by a pointer to an object of type @code{ENTRY}. For |
| locating the corresponding position in the hashing table only the |
| @code{key} element of the structure is used. |
| |
| If an entry with matching key is found the @var{action} parameter is |
| irrelevant. The found entry is returned. If no matching entry is found |
| and the @var{action} parameter has the value @code{FIND} the function |
| returns a @code{NULL} pointer. If no entry is found and the |
| @var{action} parameter has the value @code{ENTER} a new entry is added |
| to the hashing table which is initialized with the parameter @var{item}. |
| A pointer to the newly added entry is returned. |
| @end deftypefun |
| |
| As mentioned before the hashing table used by the functions described so |
| far is global and there can be at any time at most one hashing table in |
| the program. A solution is to use the following functions which are a |
| GNU extension. All have in common that they operate on a hashing table |
| which is described by the content of an object of the type @code{struct |
| hsearch_data}. This type should be treated as opaque, none of its |
| members should be changed directly. |
| |
| @comment search.h |
| @comment GNU |
| @deftypefun int hcreate_r (size_t @var{nel}, struct hsearch_data *@var{htab}) |
| The @code{hcreate_r} function initializes the object pointed to by |
| @var{htab} to contain a hashing table with at least @var{nel} elements. |
| So this function is equivalent to the @code{hcreate} function except |
| that the initialized data structure is controlled by the user. |
| |
| This allows having more than one hashing table at one time. The memory |
| necessary for the @code{struct hsearch_data} object can be allocated |
| dynamically. It must be initialized with zero before calling this |
| function. |
| |
| The return value is non-zero if the operation was successful. If the |
| return value is zero, something went wrong, which probably means the |
| programs ran out of memory. |
| @end deftypefun |
| |
| @comment search.h |
| @comment GNU |
| @deftypefun void hdestroy_r (struct hsearch_data *@var{htab}) |
| The @code{hdestroy_r} function frees all resources allocated by the |
| @code{hcreate_r} function for this very same object @var{htab}. As for |
| @code{hdestroy} it is the programs responsibility to free the strings |
| for the elements of the table. |
| @end deftypefun |
| |
| @comment search.h |
| @comment GNU |
| @deftypefun int hsearch_r (ENTRY @var{item}, ACTION @var{action}, ENTRY **@var{retval}, struct hsearch_data *@var{htab}) |
| The @code{hsearch_r} function is equivalent to @code{hsearch}. The |
| meaning of the first two arguments is identical. But instead of |
| operating on a single global hashing table the function works on the |
| table described by the object pointed to by @var{htab} (which is |
| initialized by a call to @code{hcreate_r}). |
| |
| Another difference to @code{hcreate} is that the pointer to the found |
| entry in the table is not the return value of the functions. It is |
| returned by storing it in a pointer variables pointed to by the |
| @var{retval} parameter. The return value of the function is an integer |
| value indicating success if it is non-zero and failure if it is zero. |
| In the latter case the global variable @var{errno} signals the reason for |
| the failure. |
| |
| @table @code |
| @item ENOMEM |
| The table is filled and @code{hsearch_r} was called with an so far |
| unknown key and @var{action} set to @code{ENTER}. |
| @item ESRCH |
| The @var{action} parameter is @code{FIND} and no corresponding element |
| is found in the table. |
| @end table |
| @end deftypefun |
| |
| |
| @node Tree Search Function |
| @section The @code{tsearch} function. |
| |
| Another common form to organize data for efficient search is to use |
| trees. The @code{tsearch} function family provides a nice interface to |
| functions to organize possibly large amounts of data by providing a mean |
| access time proportional to the logarithm of the number of elements. |
| The GNU C library implementation even guarantees that this bound is |
| never exceeded even for input data which cause problems for simple |
| binary tree implementations. |
| |
| The functions described in the chapter are all described in the @w{System |
| V} and X/Open specifications and are therefore quite portable. |
| |
| In contrast to the @code{hsearch} functions the @code{tsearch} functions |
| can be used with arbitrary data and not only zero-terminated strings. |
| |
| The @code{tsearch} functions have the advantage that no function to |
| initialize data structures is necessary. A simple pointer of type |
| @code{void *} initialized to @code{NULL} is a valid tree and can be |
| extended or searched. The prototypes for these functions can be found |
| in the header file @file{search.h}. |
| |
| @comment search.h |
| @comment SVID |
| @deftypefun {void *} tsearch (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar}) |
| The @code{tsearch} function searches in the tree pointed to by |
| @code{*@var{rootp}} for an element matching @var{key}. The function |
| pointed to by @var{compar} is used to determine whether two elements |
| match. @xref{Comparison Functions}, for a specification of the functions |
| which can be used for the @var{compar} parameter. |
| |
| If the tree does not contain a matching entry the @var{key} value will |
| be added to the tree. @code{tsearch} does not make a copy of the object |
| pointed to by @var{key} (how could it since the size is unknown). |
| Instead it adds a reference to this object which means the object must |
| be available as long as the tree data structure is used. |
| |
| The tree is represented by a pointer to a pointer since it is sometimes |
| necessary to change the root node of the tree. So it must not be |
| assumed that the variable pointed to by @var{rootp} has the same value |
| after the call. This also shows that it is not safe to call the |
| @code{tsearch} function more than once at the same time using the same |
| tree. It is no problem to run it more than once at a time on different |
| trees. |
| |
| The return value is a pointer to the matching element in the tree. If a |
| new element was created the pointer points to the new data (which is in |
| fact @var{key}). If an entry had to be created and the program ran out |
| of space @code{NULL} is returned. |
| @end deftypefun |
| |
| @comment search.h |
| @comment SVID |
| @deftypefun {void *} tfind (const void *@var{key}, void *const *@var{rootp}, comparison_fn_t @var{compar}) |
| The @code{tfind} function is similar to the @code{tsearch} function. It |
| locates an element matching the one pointed to by @var{key} and returns |
| a pointer to this element. But if no matching element is available no |
| new element is entered (note that the @var{rootp} parameter points to a |
| constant pointer). Instead the function returns @code{NULL}. |
| @end deftypefun |
| |
| Another advantage of the @code{tsearch} function in contrast to the |
| @code{hsearch} functions is that there is an easy way to remove |
| elements. |
| |
| @comment search.h |
| @comment SVID |
| @deftypefun {void *} tdelete (const void *@var{key}, void **@var{rootp}, comparison_fn_t @var{compar}) |
| To remove a specific element matching @var{key} from the tree |
| @code{tdelete} can be used. It locates the matching element using the |
| same method as @code{tfind}. The corresponding element is then removed |
| and a pointer to the parent of the deleted node is returned by the |
| function. If there is no matching entry in the tree nothing can be |
| deleted and the function returns @code{NULL}. If the root of the tree |
| is deleted @code{tdelete} returns some unspecified value not equal to |
| @code{NULL}. |
| @end deftypefun |
| |
| @comment search.h |
| @comment GNU |
| @deftypefun void tdestroy (void *@var{vroot}, __free_fn_t @var{freefct}) |
| If the complete search tree has to be removed one can use |
| @code{tdestroy}. It frees all resources allocated by the @code{tsearch} |
| function to generate the tree pointed to by @var{vroot}. |
| |
| For the data in each tree node the function @var{freefct} is called. |
| The pointer to the data is passed as the argument to the function. If |
| no such work is necessary @var{freefct} must point to a function doing |
| nothing. It is called in any case. |
| |
| This function is a GNU extension and not covered by the @w{System V} or |
| X/Open specifications. |
| @end deftypefun |
| |
| In addition to the function to create and destroy the tree data |
| structure, there is another function which allows you to apply a |
| function to all elements of the tree. The function must have this type: |
| |
| @smallexample |
| void __action_fn_t (const void *nodep, VISIT value, int level); |
| @end smallexample |
| |
| The @var{nodep} is the data value of the current node (once given as the |
| @var{key} argument to @code{tsearch}). @var{level} is a numeric value |
| which corresponds to the depth of the current node in the tree. The |
| root node has the depth @math{0} and its children have a depth of |
| @math{1} and so on. The @code{VISIT} type is an enumeration type. |
| |
| @deftp {Data Type} VISIT |
| The @code{VISIT} value indicates the status of the current node in the |
| tree and how the function is called. The status of a node is either |
| `leaf' or `internal node'. For each leaf node the function is called |
| exactly once, for each internal node it is called three times: before |
| the first child is processed, after the first child is processed and |
| after both children are processed. This makes it possible to handle all |
| three methods of tree traversal (or even a combination of them). |
| |
| @table @code |
| @item preorder |
| The current node is an internal node and the function is called before |
| the first child was processed. |
| @item postorder |
| The current node is an internal node and the function is called after |
| the first child was processed. |
| @item endorder |
| The current node is an internal node and the function is called after |
| the second child was processed. |
| @item leaf |
| The current node is a leaf. |
| @end table |
| @end deftp |
| |
| @comment search.h |
| @comment SVID |
| @deftypefun void twalk (const void *@var{root}, __action_fn_t @var{action}) |
| For each node in the tree with a node pointed to by @var{root}, the |
| @code{twalk} function calls the function provided by the parameter |
| @var{action}. For leaf nodes the function is called exactly once with |
| @var{value} set to @code{leaf}. For internal nodes the function is |
| called three times, setting the @var{value} parameter or @var{action} to |
| the appropriate value. The @var{level} argument for the @var{action} |
| function is computed while descending the tree with increasing the value |
| by one for the descend to a child, starting with the value @math{0} for |
| the root node. |
| |
| Since the functions used for the @var{action} parameter to @code{twalk} |
| must not modify the tree data, it is safe to run @code{twalk} in more |
| than one thread at the same time, working on the same tree. It is also |
| safe to call @code{tfind} in parallel. Functions which modify the tree |
| must not be used, otherwise the behavior is undefined. |
| @end deftypefun |