Go to the first, previous, next, last section, table of contents.

Searching and Sorting

This chapter describes functions for searching and sorting arrays of arbitrary objects. You pass the appropriate comparison function to be applied as an argument, along with the size of the objects in the array and the total number of elements.

Defining the Comparison Function

In order to use the sorted array library functions, you have to describe how to compare the elements of the array.

To do this, you supply a comparison function to compare two elements of the array. The library will call this function, passing as arguments pointers to two array elements to be compared. Your comparison function should return a value the way strcmp (see section String/Array Comparison) does: negative if the first argument is "less" than the second, zero if they are "equal", and positive if the first argument is "greater".

Here is an example of a comparison function which works with an array of numbers of type double:

int
compare_doubles (const void *a, const void *b)
{
  const double *da = (const double *) a;
  const double *db = (const double *) b;

  return (*da > *db) - (*da < *db);
}

The header file `stdlib.h' defines a name for the data type of comparison functions. This type is a GNU extension.

int comparison_fn_t (const void *, const void *);

Array Search Function

Generally searching for a specific element in an array means that potentially all elements must be checked. The GNU C library contains functions to perform linear search. The prototypes for the following two functions can be found in `search.h'.

Function: void * lfind (const void *key, void *base, size_t *nmemb, size_t size, comparison_fn_t compar)

The lfind function searches in the array with *nmemb elements of size bytes pointed to by base for an element which matches the one pointed to by key. The function pointed to by compar is used decide whether two elements match.

The return value is a pointer to the matching element in the array starting at base if it is found. If no matching element is available NULL is returned.

The mean runtime of this function is *nmemb/2. This function should only be used elements often get added to or deleted from the array in which case it might not be useful to sort the array before searching.

Function: void * lsearch (const void *key, void *base, size_t *nmemb, size_t size, comparison_fn_t compar)

The lsearch function is similar to the lfind function. It searches the given array for an element and returns it if found. The difference is that if no matching element is found the lsearch function adds the object pointed to by key (with a size of size bytes) at the end of the array and it increments the value of *nmemb to reflect this addition.

This means for the caller that if it is not sure that the array contains the element one is searching for the memory allocated for the array starting at base must have room for at least size more bytes. If one is sure the element is in the array it is better to use lfind so having more room in the array is always necessary when calling lsearch.

To search a sorted array for an element matching the key, use the bsearch function. The prototype for this function is in the header file `stdlib.h'.

Function: void * bsearch (const void *key, const void *array, size_t count, size_t size, comparison_fn_t compare)

The bsearch function searches the sorted array array for an object that is equivalent to key. The array contains count elements, each of which is of size size bytes.

The compare function is used to perform the comparison. This function is called with two pointer arguments and should return an integer less than, equal to, or greater than zero corresponding to whether its first argument is considered less than, equal to, or greater than its second argument. The elements of the array must already be sorted in ascending order according to this comparison function.

The return value is a pointer to the matching array element, or a null pointer if no match is found. If the array contains more than one element that matches, the one that is returned is unspecified.

This function derives its name from the fact that it is implemented using the binary search algorithm.

Array Sort Function

To sort an array using an arbitrary comparison function, use the qsort function. The prototype for this function is in `stdlib.h'.

Function: void qsort (void *array, size_t count, size_t size, comparison_fn_t compare)

The qsort function sorts the array array. The array contains count elements, each of which is of size size.

The compare function is used to perform the comparison on the array elements. This function is called with two pointer arguments and should return an integer less than, equal to, or greater than zero corresponding to whether its first argument is considered less than, equal to, or greater than its second argument.

Warning: If two objects compare as equal, their order after sorting is unpredictable. That is to say, the sorting is not stable. This can make a difference when the comparison considers only part of the elements. Two elements with the same sort key may differ in other respects.

If you want the effect of a stable sort, you can get this result by writing the comparison function so that, lacking other reason distinguish between two elements, it compares them by their addresses. Note that doing this may make the sorting algorithm less efficient, so do it only if necessary.

Here is a simple example of sorting an array of doubles in numerical order, using the comparison function defined above (see section Defining the Comparison Function):

{
  double *array;
  int size;
  ...
  qsort (array, size, sizeof (double), compare_doubles);
}

The qsort function derives its name from the fact that it was originally implemented using the "quick sort" algorithm.

The implementation of qsort in this library might not be an in-place sort and might thereby use an extra amount of memory to store the array.

Searching and Sorting Example

Here is an example showing the use of qsort and bsearch with an array of structures. The objects in the array are sorted by comparing their name fields with the strcmp function. Then, we can look up individual objects based on their names.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/* Define an array of critters to sort. */

struct critter
  {
    const char *name;
    const char *species;
  };

struct critter muppets[] =
  {
    {"Kermit", "frog"},
    {"Piggy", "pig"},
    {"Gonzo", "whatever"},
    {"Fozzie", "bear"},
    {"Sam", "eagle"},
    {"Robin", "frog"},
    {"Animal", "animal"},
    {"Camilla", "chicken"},
    {"Sweetums", "monster"},
    {"Dr. Strangepork", "pig"},
    {"Link Hogthrob", "pig"},
    {"Zoot", "human"},
    {"Dr. Bunsen Honeydew", "human"},
    {"Beaker", "human"},
    {"Swedish Chef", "human"}
  };

int count = sizeof (muppets) / sizeof (struct critter);

/* This is the comparison function used for sorting and searching. */

int 
critter_cmp (const struct critter *c1, const struct critter *c2)
{
  return strcmp (c1->name, c2->name);
}

/* Print information about a critter. */

void 
print_critter (const struct critter *c)
{
  printf ("%s, the %s\n", c->name, c->species);
}

/* Do the lookup into the sorted array. */

void 
find_critter (const char *name)
{
  struct critter target, *result;
  target.name = name;
  result = bsearch (&target, muppets, count, sizeof (struct critter),
                    critter_cmp);
  if (result)
    print_critter (result);
  else
    printf ("Couldn't find %s.\n", name);
}

/* Main program. */

int
main (void)
{
  int i;

  for (i = 0; i < count; i++)
    print_critter (&muppets[i]);
  printf ("\n");

  qsort (muppets, count, sizeof (struct critter), critter_cmp);

  for (i = 0; i < count; i++)
    print_critter (&muppets[i]);
  printf ("\n");

  find_critter ("Kermit");
  find_critter ("Gonzo");
  find_critter ("Janice");

  return 0;
}

The output from this program looks like:

Kermit, the frog
Piggy, the pig
Gonzo, the whatever
Fozzie, the bear
Sam, the eagle
Robin, the frog
Animal, the animal
Camilla, the chicken
Sweetums, the monster
Dr. Strangepork, the pig
Link Hogthrob, the pig
Zoot, the human
Dr. Bunsen Honeydew, the human
Beaker, the human
Swedish Chef, the human

Animal, the animal
Beaker, the human
Camilla, the chicken
Dr. Bunsen Honeydew, the human
Dr. Strangepork, the pig
Fozzie, the bear
Gonzo, the whatever
Kermit, the frog
Link Hogthrob, the pig
Piggy, the pig
Robin, the frog
Sam, the eagle
Swedish Chef, the human
Sweetums, the monster
Zoot, the human

Kermit, the frog
Gonzo, the whatever
Couldn't find Janice.

The `hsearch` function.

The functions mentioned so far in this chapter are searching in a sorted or unsorted array. There are other methods to organize information which later should be searched. The costs of insert, delete and search differ. One possible implementation is using hashing tables.

Function: int hcreate (size_t nel)

The hcreate function creates a hashing table which can contain at least nel elements. There is no possibility to grow this table so it is necessary to choose the value for nel wisely. The used methods to implement this function might make it necessary to make the number of elements in the hashing table larger than the expected maximal number of elements. Hashing tables usually work inefficient if they are filled 80% or more. The constant access time guaranteed by hashing can only be achieved if few collisions exist. See Knuth's "The Art of Computer Programming, Part 3: Searching and Sorting" for more information.

The weakest aspect of this function is that there can be at most one hashing table used through the whole program. The table is allocated in local memory out of control of the programmer. As an extension the GNU C library provides an additional set of functions with an reentrant interface which provide a similar interface but which allow to keep arbitrarily many hashing tables.

It is possible to use more than one hashing table in the program run if the former table is first destroyed by a call to hdestroy.

The function returns a non-zero value if successful. If it return zero something went wrong. This could either mean there is already a hashing table in use or the program runs out of memory.

Function: void hdestroy (void)

The hdestroy function can be used to free all the resources allocated in a previous call of hcreate. After a call to this function it is again possible to call hcreate and allocate a new table with possibly different size.

It is important to remember that the elements contained in the hashing table at the time hdestroy is called are not freed by this function. It is the responsibility of the program code to free those strings (if necessary at all). Freeing all the element memory is not possible without extra, separately kept information since there is no function to iterate through all available elements in the hashing table. If it is really necessary to free a table and all elements the programmer has to keep a list of all table elements and before calling hdestroy s/he has to free all element's data using this list. This is a very unpleasant mechanism and it also shows that this kind of hashing tables is mainly meant for tables which are created once and used until the end of the program run.

Entries of the hashing table and keys for the search are defined using this type:

Data type: struct ENTRY

Both elements of this structure are pointers to zero-terminated strings. This is a limiting restriction of the functionality of the hsearch functions. They can only be used for data sets which use the NUL character always and solely to terminate the records. It is not possible to handle general binary data.

char *key: Pointer to a zero-terminated string of characters describing the key for the search or the element in the hashing table.
char *data: Pointer to a zero-terminated string of characters describing the data. If the functions will be called only for searching an existing entry this element might stay undefined since it is not used.

Function: ENTRY * hsearch (ENTRY item, ACTION action)

To search in a hashing table created using hcreate the hsearch function must be used. This function can perform simple search for an element (if action has the FIND) or it can alternatively insert the key element into the hashing table, possibly replacing a previous value (if action is ENTER).

The key is denoted by a pointer to an object of type ENTRY. For locating the corresponding position in the hashing table only the key element of the structure is used.

The return value depends on the action parameter value. If it is FIND the value is a pointer to the matching element in the hashing table or NULL if no matching element exists. If action is ENTER the return value is only NULL if the programs runs out of memory while adding the new element to the table. Otherwise the return value is a pointer to the element in the hashing table which contains newly added element based on the data in key.

As mentioned before the hashing table used by the functions described so far is global and there can be at any time at most one hashing table in the program. A solution is to use the following functions which are a GNU extension. All have in common that they operate on a hashing table which is described by the content of an object of the type struct hsearch_data. This type should be treated as opaque, none of its members should be changed directly.

Function: int hcreate_r (size_t nel, struct hsearch_data *htab)

The hcreate_r function initializes the object pointed to by htab to contain a hashing table with at least nel elements. So this function is equivalent to the hcreate function except that the initialized data structure is controlled by the user.

This allows having more than one hashing table at one time. The memory necessary for the struct hsearch_data object can be allocated dynamically.

The return value is non-zero if the operation were successful. if the return value is zero something went wrong which probably means the programs runs out of memory.

Function: void hdestroy_r (struct hsearch_data *htab): The hdestroy_r function frees all resources allocated by the hcreate_r function for this very same object htab. As for hdestroy it is the programs responsibility to free the strings for the elements of the table.

Function: int hsearch_r (ENTRY item, ACTION action, ENTRY **retval, struct hsearch_data *htab)

The hsearch_r function is equivalent to hsearch. The meaning of the first two arguments is identical. But instead of operating on a single global hashing table the function works on the table described by the object pointed to by htab (which is initialized by a call to hcreate_r).

Another difference to hcreate is that the pointer to the found entry in the table is not the return value of the functions. It is returned by storing it in a pointer variables pointed to by the retval parameter. The return value of the function is an integer value indicating success if it is non-zero and failure if it is zero. In the latter case the global variable errno signals the reason for the failure.

ENOMEM: The table is filled and hsearch_r was called with an so far unknown key and action set to ENTER.
ESRCH: The action parameter is FIND and no corresponding element is found in the table.

The `tsearch` function.

Another common form to organize data for efficient search is to use trees. The tsearch function family provides a nice interface to functions to organize possibly large amounts of data by providing a mean access time proportional to the logarithm of the number of elements. The GNU C library implementation even guarantees that this bound is never exceeded even for input data which cause problems for simple binary tree implementations.

The functions described in the chapter are all described in the System V and X/Open specifications and are therefore quite portable.

In contrast to the hsearch functions the tsearch functions can be used with arbitrary data and not only zero-terminated strings.

The tsearch functions have the advantage that no function to initialize data structures is necessary. A simple pointer of type void * initialized to NULL is a valid tree and can be extended or searched.

Function: void * tsearch (const void *key, void **rootp, comparison_fn_t compar)

The tsearch function searches in the tree pointed to by *rootp for an element matching key. The function pointed to by compar is used to determine whether two elements match. See section Defining the Comparison Function, for a specification of the functions which can be used for the compar parameter.

If the tree does not contain a matching entry the key value will be added to the tree. tsearch does not make a copy of the object pointed to by key (how could it since the size is unknown). Instead it adds a reference to this object which means the object must be available as long as the tree data structure is used.

The tree is represented by a pointer to a pointer since it is sometimes necessary to change the root node of the tree. So it must not be assumed that the variable pointed to by rootp has the same value after the call. This also shows that it is not safe to call the tsearch function more than once at the same time using the same tree. It is no problem to run it more than once at a time on different trees.

The return value is a pointer to the matching element in the tree. If a new element was created the pointer points to the new data (which is in fact key). If an entry had to be created and the program ran out of space NULL is returned.

Function: void * tfind (const void *key, void *const *rootp, comparison_fn_t compar): The tfind function is similar to the tsearch function. It locates an element matching the one pointed to by key and returns a pointer to this element. But if no matching element is available no new element is entered (note that the rootp parameter points to a constant pointer). Instead the function returns NULL.

Another advantage of the tsearch function in contrast to the hsearch functions is that there is an easy way to remove elements.

Function: void * tdelete (const void *key, void **rootp, comparison_fn_t compar): To remove a specific element matching key from the tree tdelete can be used. It locates the matching element using the same method as tfind. The corresponding element is then removed and a pointer to the parent of the deleted node is returned by the function. If there is no matching entry in the tree nothing can be deleted and the function returns NULL. If the root of the tree is deleted tdelete returns some unspecified value not equal to NULL.

Function: void tdestroy (void *vroot, __free_fn_t freefct)

If the complete search tree has to be removed one can use tdestroy. It frees all resources allocated by the tsearch function to generate the tree pointed to by vroot.

For the data in each tree node the function freefct is called. The pointer to the data is passed as the argument to the function. If no such work is necessary freefct must point to a function doing nothing. It is called in any case.

This function is a GNU extension and not covered by the System V or X/Open specifications.

In addition to the function to create and destroy the tree data structure, there is another function which allows you to apply a function to all elements of the tree. The function must have this type:

void __action_fn_t (const void *nodep, VISIT value, int level);

The nodep is the data value of the current node (once given as the key argument to tsearch). level is a numeric value which corresponds to the depth of the current node in the tree. The root node has the depth @math{0} and its children have a depth of @math{1} and so on. The VISIT type is an enumeration type.

Data Type: VISIT

The VISIT value indicates the status of the current node in the tree and how the function is called. The status of a node is either `leaf' or `internal node'. For each leaf node the function is called exactly once, for each internal node it is called three times: before the first child is processed, after the first child is processed and after both children are processed. This makes it possible to handle all three methods of tree traversal (or even a combination of them).

preorder: The current node is an internal node and the function is called before the first child was processed.
postorder: The current node is an internal node and the function is called after the first child was processed.
endorder: The current node is an internal node and the function is called after the second child was processed.
leaf: The current node is a leaf.

Function: void twalk (const void *root, __action_fn_t action)

For each node in the tree with a node pointed to by root, the twalk function calls the function provided by the parameter action. For leaf nodes the function is called exactly once with value set to leaf. For internal nodes the function is called three times, setting the value parameter or action to the appropriate value. The level argument for the action function is computed while descending the tree with increasing the value by one for the descend to a child, starting with the value @math{0} for the root node.

Since the functions used for the action parameter to twalk must not modify the tree data, it is safe to run twalk in more than one thread at the same time, working on the same tree. It is also safe to call tfind in parallel. Functions which modify the tree must not be used, otherwise the behaviour is undefined.