C Language

String Symphony

C doesn't have built-in support for variable-length strings; instead, C "fakes" strings by using fixed-size character arrays, character pointers, and a library of string functions that take pointers as arguments. Because strings inherently can be any length, jamming strings into C's fixed-length arrays leads to especially perverse pitfalls. Take a simple assignment of one string variable to another:

 strcpy(b, a);

This innocent statement has probably reduced the average life expectancy of C programmers by five years - stress is not good for a programmer's health. What's wrong with this statement? I don't know ... maybe nothing. Maybe when this statement executes, the string in a will be no longer than b can hold. If so, everything is all right. If not, everything is all wrong. The strcpy function is a primitive memory-to-memory copy that is not limited by the target's declared size. In this example, if the string in a plus its '\0' terminator needs more bytes than b was declared with, whatever is next to b in memory will be trashed. On PCs, this might even be operating system code, leaving your system frozen solid.

The typical way in C to avoid string operations that overwrite memory is to "be careful." That works for experienced programmers - most of the time. It's not a pretty sight, however, when this strategy doesn't work. The only effective strategy is: Always guard a string assignment against overwriting the target variable.

Figure 4.5 shows one way to guard a string copy, using the sizeof operator. If the source string (including the '\0' terminator) fits in the target, the whole string is copied; otherwise, only as much as will fit is copied, and a null terminator is added. Even though this technique may truncate some strings, your program will continue its proper execution flow, rather than take some wild path caused by overwriting part of the program's instructions.

Figure 4.5 Guarding a String Copy

 if ( strlen( source ) < sizeof( target ) ) {
     strcpy( target, source );
 }
 else {
   strncpy( target, source, ( sizeof( target ) - 1 ) );
   target[ ( sizeof( target ) - 1 ) ] = '\0';
 }

Of course, this technique is a prime candidate for a macro, using source and target as parameters. You can use similar macros for the other C library string assignment functions, such as strcat. You can also add warning messages to your macros to make error diagnosis even easier.
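As a rough sketch of what such macros might look like, the following wraps the logic of Figure 4.5 and adds a similar guard for strcat. The names SAFE_STRCPY and SAFE_STRCAT are hypothetical, not part of any library. Both macros assume the target is a true array that is in scope (so sizeof yields its declared size), and SAFE_STRCAT also assumes the target already holds a properly terminated string. The source argument is evaluated more than once, so it should not have side effects.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical macro: guarded copy, following Figure 4.5.
   target must be an array, not a pointer. */
#define SAFE_STRCPY( target, source )                                 \
    do {                                                              \
        if ( strlen( source ) < sizeof( target ) ) {                  \
            strcpy( (target), (source) );                             \
        }                                                             \
        else {                                                        \
            /* strncpy adds no terminator when source is too long */  \
            strncpy( (target), (source), sizeof( target ) - 1 );      \
            (target)[ sizeof( target ) - 1 ] = '\0';                  \
            fprintf( stderr, "%s:%d: string truncated in copy\n",     \
                     __FILE__, __LINE__ );                            \
        }                                                             \
    } while ( 0 )

/* Hypothetical macro: guarded concatenation.
   target must be an array that already holds a terminated string. */
#define SAFE_STRCAT( target, source )                                 \
    do {                                                              \
        size_t used_ = strlen( target );                              \
        size_t room_ = sizeof( target ) - 1 - used_;                  \
        if ( strlen( source ) > room_ ) {                             \
            fprintf( stderr, "%s:%d: string truncated in append\n",   \
                     __FILE__, __LINE__ );                            \
        }                                                             \
        /* strncat appends at most room_ characters plus a '\0' */    \
        strncat( (target), (source), room_ );                         \
    } while ( 0 )
```

With a target declared as char buf[ 8 ], SAFE_STRCPY( buf, "abc" ) followed by SAFE_STRCAT( buf, "defghij" ) leaves "abcdefg" in buf and writes a warning to stderr, instead of running past the end of the array.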

Unfortunately, macros using the sizeof operator won't work for target strings that are function parameters, because the size of a string parameter is not automatically passed to a function (C passes just a pointer to the first character in the string). To handle string parameters whose value you want to change (i.e., output or update parameters), you must explicitly pass the string's declared size (or its maximum length, which is one less than the string's declared size) to the function. Figure 4.6a shows simple macros to implement "safe" strings. The STRING macro declares a string and an associated variable to hold the string's maximum length. STRING_TABLE declares a table (base-1 array) of strings. The cpystr macro simplifies a call to the strcpymax function shown in Figure 4.6b.
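The sizeof limitation described above is easy to see in a small sketch. The two helper functions here are hypothetical and exist only for illustration: the first shows that a callee sees only the size of a pointer, and the second shows the explicit size parameter that makes a guarded copy possible.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: reports what sizeof yields for a string
   parameter. The parameter is a pointer, so the result is
   sizeof( char * ), however large the caller's array is. */
size_t size_seen_by_callee( char *target ) {
    (void) target;
    return sizeof( target );
}

/* Hypothetical helper: the caller passes the declared size of the
   target explicitly, so the copy can be guarded. */
void set_greeting( char *target, size_t target_size ) {
    assert( target_size > 0 );
    strncpy( target, "Hello, world", target_size - 1 );
    target[ target_size - 1 ] = '\0';
}
```

A caller that declares char line[ 81 ] gets 81 from sizeof( line ), but size_seen_by_callee( line ) returns only the size of a pointer. Calling set_greeting( tiny, sizeof( tiny ) ) with char tiny[ 6 ] stores "Hello" and stays inside the array.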

Figure 4.6a Macros for Safe Strings

 #define STRING( sname, smaxlen )          \
      size_t sname##_maxlen = ( smaxlen ); \
      char  sname[ ( smaxlen ) + 1 ]
 #define STRING_TABLE( tname, ttop, smaxlen )      \
      int  tname##_upper_bound = ( ttop );   \
      size_t tname##_maxlen   = ( smaxlen );  \
      char   tname [ ( ttop ) + 1 ] [ ( smaxlen ) + 1 ]
 #define strmaxlen( sname ) sname##_maxlen
 #define cpystr(  target, source )         \
      strcpymax( target, source, target##_maxlen )

Figure 4.6b Safe String Copy Function

 char * strcpymax(    char  target[],
            const char  source[],
            const size_t target_maxlen ) {
    if ( strlen( source ) <= target_maxlen ) {
      strcpy( target, source );
    }
    else {
      strncpy( target, source, target_maxlen );
      target[ target_maxlen ] = '\0';
    }
    return target;
 }

Figure 4.6c shows how to use these macros in your C programs. Note that the first element in the month_abv table's initialization list is just a placeholder for the unused element. Also note that when an element of a string table is the target of a string copy, you must use the strcpymax function rather than the cpystr macro because the cpystr macro can't generate the correct name for the string maximum length variable. These macros and this function work with strings passed as parameters, as well as with those declared as local variables. There are still some limitations with this technique, however. More complex aggregate data types (e.g., structures containing strings) require additional macros for declarations, or other techniques.

Figure 4.6c Using the Safe String Macros

 int month;
 STRING( print_line, 80 );
 STRING_TABLE( month_abv, 12, 3 ) =
  { "", "Jan", "Feb", "Mar", "Apr", "May", "Jun",
      "Jul", "Aug", "Sep", "Oct", "Nov", "Dec" };
 .
 .
 .
 cpystr( print_line, month_abv[ month ] );
 .
 .
 .
 strcpymax( month_abv[ 9 ], "Spt", strmaxlen( month_abv ) );
 .
 .
 .
 OVER_TABLE( month_abv, month )
   printf( "%s is month number %d\n",
         month_abv[ month ], month );
 ENDOVER
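For structures containing strings, one possible "other technique" is to let sizeof supply the member's declared size. This works because an array member of a structure keeps its full size wherever the structure itself is visible. The sketch below is only one option, under stated assumptions: the customer structure and the cpymember macro are hypothetical names, and strcpymax repeats the logic of Figure 4.6b so that the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same logic as strcpymax in Figure 4.6b, repeated here so the
   sketch is self-contained. */
char * strcpymax( char target[], const char source[],
                  const size_t target_maxlen ) {
    if ( strlen( source ) <= target_maxlen ) {
        strcpy( target, source );
    }
    else {
        strncpy( target, source, target_maxlen );
        target[ target_maxlen ] = '\0';
    }
    return target;
}

/* Hypothetical record type with two string members. */
struct customer {
    char name[ 20 + 1 ];
    char city[ 15 + 1 ];
};

/* Hypothetical macro: the member's maximum length is its declared
   size minus one for the terminator. The member must be an array,
   not a pointer. */
#define cpymember( rec, member, source )                   \
    strcpymax( (rec).member, (source),                     \
               sizeof( (rec).member ) - 1 )
```

Given struct customer c, the call cpymember( c, city, "San Luis Obispo, California" ) stores only the first 15 characters in c.city and leaves c.name untouched.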

A full description of implementing safe, variable-length strings in C is beyond the scope of this tutorial, but you can do it using structures that contain string lengths and pointers to one or more memory blocks for the string contents. I've seen numerous programs where the programmer has built such structures "on the fly." Such programs are often fragile and flaky because they combine some of C's most treacherous features: pointers and dynamic memory management. A more structured approach can reduce your risks.
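To give a feel for the structured approach, here is a minimal sketch of a length-tracked string that owns a single memory block. The vstring type and its functions are hypothetical names chosen for illustration, not an existing library, and the sketch assumes the source argument never points into the vstring being changed. All pointer and memory handling is confined to these few functions, so the rest of a program never calls realloc or strcpy directly.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable-length string type. */
typedef struct {
    size_t length;     /* characters in use, excluding '\0'  */
    size_t capacity;   /* bytes allocated for text           */
    char  *text;       /* null-terminated whenever non-NULL  */
} vstring;

void vstring_init( vstring *s ) {
    s->length   = 0;
    s->capacity = 0;
    s->text     = NULL;
}

/* Grow the block to at least needed bytes.
   Returns 1 on success, 0 on failure (s is left unchanged). */
static int vstring_reserve( vstring *s, size_t needed ) {
    if ( needed > s->capacity ) {
        char *grown = realloc( s->text, needed );
        if ( grown == NULL ) {
            return 0;
        }
        s->text     = grown;
        s->capacity = needed;
    }
    return 1;
}

/* Replace the contents of s with source. Returns 1 or 0. */
int vstring_assign( vstring *s, const char *source ) {
    size_t needed = strlen( source ) + 1;
    if ( !vstring_reserve( s, needed ) ) {
        return 0;
    }
    memcpy( s->text, source, needed );
    s->length = needed - 1;
    return 1;
}

/* Append source to the contents of s. Returns 1 or 0. */
int vstring_append( vstring *s, const char *source ) {
    size_t extra = strlen( source );
    if ( !vstring_reserve( s, s->length + extra + 1 ) ) {
        return 0;
    }
    memcpy( s->text + s->length, source, extra + 1 );
    s->length += extra;
    return 1;
}

void vstring_free( vstring *s ) {
    free( s->text );
    vstring_init( s );
}
```

A caller initializes a vstring with vstring_init, checks the return value of every vstring_assign and vstring_append call, and finishes with vstring_free. Because the block grows on demand, no copy is ever truncated and none can overrun its target.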

If you embark on an advanced string implementation, be sure to build a library of macros and functions that provide a safe, high-level set of string operations, and use these instead of C's primitive string functions. Your best bet is probably to use C++ and an existing C++ string class (e.g., ones available from the Free Software Foundation) or switch to Awk, a language that has a C-like syntax and includes full support for variable-length strings.

Rough C Coming

Most of the C pitfalls I've covered in the first three chapters are fairly easily circumvented by avoiding certain language constructs and using macros. Strings presented the first example of inherent C constructs for which there is no simple, universal solution (other than perhaps moving to C++). In Chapter 5, I'll take up pointers, a feature of C even more difficult than strings to handle safely.

C Coding Suggestions

  • Declare C arrays with one extra element, and don't use the element with subscript 0.
  • Use macros to define tables and loops over them.
  • Always guard a string assignment against overwriting the target variable.
  • Create macros and functions to define strings and provide "safe" string operations.
by BrainBell