
Macros and Miscellaneous Pitfalls

A bad macro can drive a good programmer mad. Imagine the frustration when an unsuspecting programmer codes:

 x = 3;
 y = cube( x + 1 );
 z = 5 * double( x );

thinking that cube( x + 1 ) will produce the value 64 (4³), only to find that y is set to 10; and thinking that 5 * double( x ) will produce the value 30, only to find that z is set to 18. The mystery becomes clear when the programmer examines the cube and double macro definitions and finds:

 #define cube( x ) x*x*x
 #define double( x ) x+x

Given these definitions, the compiler expands cube( x + 1) as:

 x + 1*x + 1*x + 1

which, because in C the * operator binds more tightly than the + operator, is equivalent to:

 x + ( 1 * x ) + ( 1 * x ) + 1

When x is 3, the value of this expression is:

 3 + ( 1 * 3 ) + ( 1 * 3 ) + 1

or 10.

Similarly, the compiler expands 5 * double( x ) as:

 5 * x+x

which is equivalent to:

 ( 5 * x ) + x

When x is 3, this expression evaluates to 18.

Macros aren't "magic": the compiler simply replaces a macro reference with expanded text according to the macro's definition. Because this expansion is purely textual, you must make sure the context in which a macro is expanded doesn't give the resulting expression an unexpected meaning. You can avoid many problems with macros by following a simple rule: Put parentheses (or other explicit delimiters) around the macro text and around each macro argument within the macro text.

Following this rule, the cube and double macros can be defined as:

 #define cube( x ) ( ( x ) * ( x ) * ( x ) )
 #define double( x ) ( ( x ) + ( x ) )

The two assignment statements above will then expand to:

 y = ( ( x + 1 ) * ( x + 1 ) * ( x + 1 ) );
 z = 5 * ( ( x ) + ( x ) );

which will do what the programmer originally expected.
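To see the difference directly, here is a minimal sketch that puts the unprotected and protected definitions side by side. It assumes the uppercase names CUBE_BAD, TWICE_BAD, CUBE, and TWICE (not in the original) in place of cube and double, because double is a C keyword and giving a macro a keyword's name is best avoided.

```c
/* Unprotected versions, as in the original definitions: the argument
   text is pasted in with no grouping at all. */
#define CUBE_BAD( x )   x*x*x
#define TWICE_BAD( x )  x+x

/* Protected versions: parentheses around each use of the argument
   and around the whole macro body. */
#define CUBE( x )   ( ( x ) * ( x ) * ( x ) )
#define TWICE( x )  ( ( x ) + ( x ) )
```

With x set to 3, CUBE_BAD( x + 1 ) yields 10 and 5 * TWICE_BAD( x ) yields 18, while CUBE( x + 1 ) and 5 * TWICE( x ) yield the expected 64 and 30.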

You may remember that some macros I presented in earlier chapters don't have so many parentheses. For example, in Chapter 4 I defined the cpystr macro as:

 #define cpystr( target, source ) \
         strcpymax( target, source, target##_maxlen )

In this case, the expansion text is always delimited by the strcpymax function name and closing parenthesis, so there's no need for parentheses around the entire text. The target and source arguments are delimited by the commas that separate function arguments. It still wouldn't hurt to add additional parentheses as a matter of good macro programming habits, however.

Even with the protection of parentheses, a simple macro, such as double, can cause unexpected results. The second statement below is intended to increment x (to 4) and put double the new value (8) in y.

 x = 3;
 y = double( ++x );

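What it actually does is increment x twice, to 5, and set y to 9 or 10, depending on the compiler (because the expansion modifies x twice within a single expression, the C standard leaves the result undefined). This results from the expanded code: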

 y = ( ( ++x ) + ( ++x ) );

which evaluates the macro argument twice. In this case, the argument ++x has the side effect of incrementing x, and the expanded macro does this twice instead of once, as intended. You can avoid such problems by following another rule: Never pass an expression that has side effects as a macro argument.

This example also provides additional evidence that C's ++ and -- operators, which seem so simple and "innocent," are often the culprits in causing unintended side effects. You may recall that in Chapter 6 I showed how ++ and -- can cause problems in assignment statements. The unary increment and decrement operators themselves are not really to blame; rather, it's the common C programming practice of embedding an increment or decrement operation within a larger expression. C programmers frequently code

 next = ary[ ++i ];

instead of

 ++i;
 next = ary[ i ];

Within simple array subscripts, using ++ or -- is a safe and generally comprehensible technique. You must be careful, however, to use the correct pre- or post-increment alternative. In contrast, with separate statements to increment the index and reference the array, you can always use pre-increment (e.g., ++i) because the statement order makes clear whether you are incrementing the index before or after referencing the array. In general, I recommend the use of separate statements for incrementing and decrementing array indexes because the code layout more strongly expresses the sequence of operations. This is not typical C style, but then much of what's considered "standard" C style stems more from habit and fashion than from good programming practice.
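The following sketch contrasts the two styles, plus the post-increment variant that picks the wrong element if you meant "the next one." The function names are hypothetical; each takes the index by value, so nothing changes in the caller.

```c
/* Embedded pre-increment: i is incremented before it is used as a
   subscript, so this returns the element after position i. */
int next_embedded( const int ary[], int i )
{
    int next = ary[ ++i ];
    return next;
}

/* Embedded post-increment: the old value of i is used as the
   subscript, so this returns the element AT position i. */
int current_embedded( const int ary[], int i )
{
    int next = ary[ i++ ];
    return next;
}

/* Separate statements: the layout alone shows that i changes first. */
int next_separate( const int ary[], int i )
{
    int next;

    ++i;
    next = ary[ i ];
    return next;
}
```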

Most of the ++ and -- problems I've seen in C programs stem from many C programmers' attitude that a simple ++i statement by itself is somehow "wasteful" (of what, I'm not sure), and that a way must be found to embed every increment and decrement operation in an adjacent statement. It's a pity more C programmers don't follow the general guideline: Place simple increment and decrement operations in separate statements. This guideline frees you from worrying about when ++ and -- side effects can cause trouble, and lets you use these otherwise nice syntactic elements of C. (For systems programming, a careful embedding of ++ or -- may provide better performance in some cases. But in business programming, any potential advantages of such techniques are inconsequential and should not influence the way you use the ++ and -- operators.)
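Applied to the double( ++x ) example, the separate-statement guideline looks like this minimal sketch. It assumes TWICE, an uppercase stand-in for the protected double macro, and a hypothetical helper, twice_after_increment, that takes a pointer so the caller can see x's new value.

```c
#define TWICE( x )  ( ( x ) + ( x ) )

/* Increments *x, then returns double its new value. The side effect
   happens in a separate statement, so the macro's two references to
   its argument both see the same, already-incremented value. */
int twice_after_increment( int *x )
{
    ++*x;
    return TWICE( *x );
}
```

Starting with x at 3, the call leaves x at 4 and returns 8, which is what the original statement intended.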

Once is Enough

When you create your own macros, you should try to avoid evaluating a macro argument more than once, if possible. This practice reduces the problem of unintended side effects. For example, an obvious improvement to the double macro definition is:

 #define double( x ) ( 2 * ( x ) )

Not all macros can be defined to avoid multiple references to their arguments (consider the problem with a max( x, y ) macro). If you want to avoid any chance of problems caused by multiple evaluation of arguments, use a function rather than a macro.
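Here's why max is a problem: the macro has to mention the winning argument twice, once in the comparison and once as the result. This sketch, with a hypothetical MAX macro and max_int function, deliberately breaks the side-effect rule to show the difference.

```c
/* The larger argument appears twice in the expansion, so it is
   evaluated twice whenever it wins the comparison. */
#define MAX( a, b )  ( ( a ) > ( b ) ? ( a ) : ( b ) )

/* A function receives each argument already evaluated, exactly once. */
int max_int( int a, int b )
{
    return a > b ? a : b;
}
```

With i starting at 5, MAX( i++, 2 ) yields 6 and leaves i at 7, because i++ runs once in the comparison and again to produce the result. With j starting at 5, max_int( j++, 2 ) yields 5 and leaves j at 6.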

Macros can contain almost any kind of source, including complete statements. When defining a macro, be sure to consider all the contexts in which the macro may be used. One difficult area is when a macro includes conditional logic. Suppose you have a macro to print messages only when a "trace" variable is on:

 #define ptrace( sts, str ) \
        if ( sts ) printf( "%s\n", str )

A reference to ptrace might be:

 if ( x < 0 ) ptrace( traceon, "Negative input" );
 else         ptrace( traceon, "OK input" );

which, when expanded (and indented to show the logical structure) is:

 if ( x < 0 )
    if ( traceon ) printf( "%s\n", "Negative input" );
    else
      if ( traceon ) printf( "%s\n", "OK input" );

This code will not print a message when x is non-negative, regardless of the setting of traceon. This unintended result stems from the "dangling else" pitfall I described in Chapter 3. You can avoid the problem by always using braces for conditional statements, as I recommended. The following statements evaluate properly:

 if ( x < 0 ) {
    ptrace( traceon, "Negative input" );
 }
 else {
    ptrace( traceon, "OK input" );
 }

But when you're creating macros, you shouldn't assume that the person using the macro will follow similar guidelines. Correcting this problem isn't a simple matter of adding braces to the macro definition, because you would then have to omit the semicolon after ptrace(…) whenever an else followed it: an unacceptable exception to normal C syntax. Instead, drawing on a suggestion by Andrew Koenig, you can restructure the macro as an expression instead of a statement:

 #define ptrace( sts, str ) \
 ( (void) ( ( ! ( sts ) ) || printf( "%s\n", str ) ) )

The "trick" to this macro is C's rule that the logical operators && and || always evaluate their operands left to right, with "short-circuit" evaluation. Thus, ( ! ( sts ) ) is evaluated first, and if sts is zero (false), the whole logical expression is true, and the second part (the printf) is never evaluated. If sts is non-zero (true), the printf is invoked as part of the expression evaluation. The (void) cast discards the value of the logical expression, so ptrace behaves like a function that returns nothing. When things get this complicated, however, it's probably a good time to switch to a function or use C's conditional compilation (#if…#endif) facilities.
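A minimal sketch of the expression-style macro in the context that tripped up the if-statement version. check_input is a hypothetical caller; because ptrace now expands to an expression rather than an if statement, the else pairs with the outer if even without braces.

```c
#include <stdio.h>

/* Expression form of ptrace: || evaluates left to right and stops as
   soon as the result is known, so printf runs only when sts is non-zero. */
#define ptrace( sts, str ) \
    ( (void) ( ( ! ( sts ) ) || printf( "%s\n", str ) ) )

/* Prints "Negative input" or "OK input", but only when traceon is set. */
void check_input( int x, int traceon )
{
    if ( x < 0 ) ptrace( traceon, "Negative input" );
    else         ptrace( traceon, "OK input" );
}
```

Calling check_input( 5, 1 ) now prints "OK input", and neither call prints anything when traceon is zero.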

Although you can encounter some "gotchas" using macros, properly used they offer an essential means of insulating yourself from many of C's other danger zones. Don't hesitate to use macros, but don't use them as a "lazy person's" alternative to typedefs, enumerations, and functions when one of these alternatives provides a better solution. Also, take care when you define macros not to set traps for the unwary programmer (who may be yourself) who uses your macros.

The "Impossible" Dream

Sometimes, it takes real character to program in C. For instance, suppose you compiled and ran the following code:

 unsigned char c;

 c = '\xff';
 if ( c != '\xff' ) printf( "Impossible!\n" );

Would it seem impossible for this code to print "Impossible!"? Not with some C compilers. The C standard lets compiler writers decide whether the default char type means signed char or unsigned char, and that choice affects how char values are converted in mixed-type expressions. If the default is signed, the compiler converts the character constant '\xff' to an integer by extending its high-order bit. (Oddly enough, C defines character constants as int type.) Thus, on a compiler with 16-bit ints, '\xff' has the integer value 0xffff, or -1. To evaluate c != '\xff', the compiler converts the explicitly declared unsigned character c to the integer value 0x00ff (255), making it unequal to the value of the character constant '\xff'.

It might seem this problem could be fixed by casting the character constant to an unsigned integer, as in

 if ( c != (unsigned) '\xff' )

but this cast simply converts 0xffff to an unsigned, rather than signed, int type. The immediate solution to this problem is to use the following cast:

 if ( c != (unsigned char) '\xff' )

The general rule is: Carefully cast any operation that involves a char variable and any operand other than another char variable.
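A small sketch, assuming 8-bit chars, that makes the problem visible. plain_char_is_signed reports which choice your compiler made (via CHAR_MIN from <limits.h>); naive_equals_ff is the comparison from the text, and casted_equals_ff applies the cast. The function names are hypothetical.

```c
#include <limits.h>

/* Nonzero if plain char is signed on this compiler. In that case the
   constant '\xff' is an int with the value -1, not 255. */
int plain_char_is_signed( void )
{
    return CHAR_MIN < 0;
}

/* The comparison from the text: its result depends on the compiler. */
int naive_equals_ff( unsigned char c )
{
    return c == '\xff';
}

/* Casting the constant to c's type makes both sides 255 (with 8-bit
   chars), whatever the signedness of plain char. */
int casted_equals_ff( unsigned char c )
{
    return c == (unsigned char) '\xff';
}
```

On a compiler whose plain char is signed, naive_equals_ff( '\xff' ) returns 0; casted_equals_ff returns 1 on either kind of compiler.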

C attracts some odd "characters," one of them being the manifest constant EOF, which is not really a character (it's a negative integer, usually -1) but which is returned by getchar and other C functions. If you try the following loop with a compiler that uses unsigned char as the default for char variables:

 char c;
 while ( ( c = getchar() ) != EOF ) ...

you'll wait a long time before the loop ends. Because c can hold only values from 0 through 255 (assuming 8-bit chars), its value always converts to a non-negative integer and will never equal -1. With a compiler that uses signed char as the default for char variables, the loop may end before the last character is read, because a byte such as 0xff, stored in a signed char, converts to the integer value -1 and so compares equal to EOF.

Why did the C library designers name a function "get character," when the function actually returns an integer, and may cause your program to fail if you actually store the return value in a character variable? Maybe they were making a veiled suggestion that mastering this kind of C inconsistency was a good way for wimp programmers to "get some character." In any case, don't let something as "meaningless" as a function name trip you up. Always use int (not char) variables to store return values from fgetc, getc, getchar, putc, putchar, and ungetc functions.
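Here is the loop from the text written with an int variable, as a hypothetical count_chars function that reads any stream with getc and assumes 8-bit bytes. Because getc returns every byte as a non-negative int, a 0xff byte is counted like any other byte instead of being mistaken for EOF.

```c
#include <stdio.h>

/* Reads fp to the end and returns the number of bytes read;
   *ff_bytes receives how many of them had the value 0xff. */
long count_chars( FILE *fp, long *ff_bytes )
{
    int c;          /* int, not char: holds 0..UCHAR_MAX plus EOF */
    long n = 0;

    *ff_bytes = 0;
    while ( ( c = getc( fp ) ) != EOF ) {
        ++n;
        if ( c == 0xff ) {
            ++*ff_bytes;
        }
    }
    return n;
}
```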

All Functions Normal

Most C programmers have adopted the good programming practice: Always declare a function prototype at the beginning of any file in which you use the function.

This practice prevents accidentally treating a function's return value as int (the type the compiler assumes when no prototype is declared) when it is some other type. It also lets the compiler check that arguments of the proper types are passed when the function is used. For standard C library functions this principle implies the rule: Always include the header file for any standard library function you use. Following this example for your own functions, you should also define a header file with function prototypes for every file that has global functions that may be referenced in other files. Then you can include these user-defined header files as a simple, foolproof way to declare function prototypes for all shared functions.
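A sketch of that arrangement, with both files shown in one listing. strutil.h, strutil.c, and count_char are hypothetical names; the #ifndef lines are the usual include guard, which keeps the prototypes from being processed twice if the header is included more than once. Any other file that calls count_char would also #include "strutil.h".

```c
/* ---- strutil.h: prototypes for the global functions in strutil.c ---- */
#ifndef STRUTIL_H
#define STRUTIL_H

#include <stddef.h>

size_t count_char( const char str[], char ch );

#endif

/* ---- strutil.c ---- */
/* #include "strutil.h"   Including its own header lets the compiler
                          check this definition against the prototype. */

/* Returns the number of times ch occurs in str. */
size_t count_char( const char str[], char ch )
{
    size_t n = 0;
    size_t i;

    for ( i = 0; str[ i ] != '\0'; ++i ) {
        if ( str[ i ] == ch ) {
            ++n;
        }
    }
    return n;
}
```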

Another practice that's available with recent C compilers is to declare formal parameters to functions as const, if they should not be changed. C passes function arguments by value, so you can never really change the variable that's passed by the calling program anyway. For example, in the following code, the value of arg1 is not changed in main, even though the corresponding parameter parm1 is changed in the function. A copy of arg1, stored in a temporary location, is what's changed by the function.

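 #include <stdio.h>

 void f( int parm1 );

 int main( void ) {
    int arg1 = 5;
    f( arg1 );
    printf( "%d\n", arg1 );   /* still prints 5 */
    return 0;
 }

 void f( int parm1 ) {
    parm1 = 10;   /* changes only f's local copy of arg1 */
    return;
 }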

So why bother declaring function parameters as const, as in the following example?

 void f( const int parm1 );

The advantage of this type of declaration is that the compiler will flag any statement that inadvertently tries to modify the parameter, perhaps in the mistaken belief that the new value will be reflected in the calling function. This type of error is easily made by programmers used to languages such as Pascal and COBOL that let you pass arguments by reference and modify the value in the calling program by changing a parameter. Declaring a parameter as const also makes clear that the parameter is meant as an "input-only" parameter.

Don't be lured into avoiding a const parameter specification because of the common C practice of using input-only parameters as if they were local variables. Although C's "pass-by-value" handling of arguments allows this technique, the only potential advantages are saving a trivial amount of automatic storage (for a local variable) and execution time (for automatic storage allocation and an assignment).

What you give up is the added protection the compiler can provide against improper use of the input-only parameter.
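As an illustration, here is a hypothetical sum_to function whose parameter is const. Instead of counting n down in place, the common C idiom, it uses a local loop variable; the commented-out assignment shows the kind of statement the compiler would reject.

```c
/* Hypothetical example: returns 1 + 2 + ... + n. n is input-only, so
   it's declared const; a local variable does the counting instead. */
long sum_to( const int n )
{
    long total = 0;
    int i;

    for ( i = 1; i <= n; ++i ) {
        total += i;
    }
    /* n = 0;  would be rejected: n is a read-only parameter */
    return total;
}
```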

In Chapter 6, I suggested you use array notation, such as x[], instead of pointer notation, such as *x, for clarity. With array parameters, a function declaration like int strlen( const char str[] ); specifies that no element of the array argument can be changed. Keep in mind, though, that C adjusts an array parameter to a pointer, so that declaration means exactly the same thing as the pointer-notation version:

 int strlen( const char * str );

In either notation, const prevents modifications made through the pointer, but it doesn't prevent inadvertent changes to the function's copy of the pointer itself:

 ++ str;

To make both the pointer parameter and the characters it points to read-only within the function, add a second const after the *, as in int strlen( const char * const str ) in the function's definition.
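A sketch of a definition whose parameter protects both the characters and the pointer itself. str_length is a hypothetical name (defining your own strlen would collide with the standard library); because str can't be advanced, a separate index walks the string.

```c
#include <stddef.h>

/* Hypothetical strlen replacement. The first const protects the
   characters; the second, after the *, protects the pointer itself. */
size_t str_length( const char * const str )
{
    size_t n = 0;

    while ( str[ n ] != '\0' ) {
        ++n;
    }
    /* ++str;        would be rejected: str is a const pointer   */
    /* str[ 0 ] = 0; would be rejected: the characters are const */
    return n;
}
```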

Do What I Mean, Not What I Say

Like a typical house cat, C programs sometimes seem to ignore direct commands. As an example, the following code appears to clearly say when it's time to leave.

 if ( x < 0 ) {
    printf( "Invalid value.\n" );
    exit;
 }

But no matter how negative x is, this program continues. In C, a function name without the argument-list parentheses is simply evaluated as the function's address. The statement exit; is perfectly legal, yet the function isn't actually invoked (the intended statement was something like exit( 1 );). Be sure you've coded parentheses after all function invocations.

"Gently Down the Stream…"

C stream I/O is the model of simplicity, yet it has some tricky areas, too. A defensive programmer might code

 c = getchar();
 if ( errno != 0 ) {
 /* handle error */
 }

But this code may report false errors, because most of C's library functions set the library-defined variable errno to a non-zero value only when an error occurs; otherwise, they leave errno unchanged, so it may still hold a leftover value from some earlier failure. Simply initializing errno before the call isn't an adequate solution, either, because a C library function may set errno even if no error occurs! Thus, the only safe approach to using errno is shown in the following example of using fopen:

 errno = 0;
 fileptr = fopen( ... );
 if ( fileptr == NULL )  {
 /* An error occurred in fopen()
 | Now it's valid to examine errno
 */
 if ( errno != 0 ) {
 /* handle error */
 }
 }

The rule for using errno is: Set errno to 0 before a function call, and use errno only after a function returns a value indicating the function failed.
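Applied to a standard function that documents its errno behavior, the rule looks like this sketch. strtol returns LONG_MAX or LONG_MIN and sets errno to ERANGE when the number is out of range; since LONG_MAX can also be a legitimate result, errno is consulted only after that return value appears. parse_long is a hypothetical helper.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: converts text to a long. Returns 1 and stores
   the value on success; returns 0 if the number is out of range. */
int parse_long( const char *text, long *result )
{
    long value;

    errno = 0;                          /* clear any leftover error code */
    value = strtol( text, NULL, 10 );
    if ( value == LONG_MAX || value == LONG_MIN ) {
        /* The return value signals a possible failure, so only now is
           it meaningful to examine errno. */
        if ( errno == ERANGE ) {
            return 0;
        }
    }
    *result = value;
    return 1;
}
```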

New Dimensions

Programmers moving to C from some other languages can be tripped up when they use multidimensional arrays. In many languages, a subscripted reference to a two-dimensional array has a form like x[ i, j ]. C is different, and a reference like the second statement below

 int x[10][10];
 y = x[ ++i, ++j ];

does not indicate that two subscripts are used in the reference to array x. Instead, ++i, ++j is a comma-separated sequence of expressions, and the expression value is the value of the last sub-expression in the sequence (i.e., j, after j is incremented); ++i is still evaluated, but only for its side effect. C doesn't actually have true multi-dimensional arrays. Recall that for the most part, C array notation is really a variation on pointer notation, and that a[i] is equivalent to *(a+i). To get the effect of a multi-dimensional array in C, you declare "arrays of arrays" (i.e., two levels of pointer-based addressing). The notation a[i][j] means *((*(a+i))+j). In the incorrect example above, the value of x[ ++i, ++j ] is the same as *(x+(++j)), which is a whole row of x (an array of 10 ints, which converts to an address), not an integer. In C, always use one pair of [] for each level of array subscripting.
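A sketch that checks the equivalence described above, using a hypothetical table array and helper functions: both functions read the same element, one with a pair of brackets per dimension and one with the pointer arithmetic those brackets stand for.

```c
/* A 10 x 10 "array of arrays" with table[ i ][ j ] == 10 * i + j. */
static int table[ 10 ][ 10 ];

void fill_table( void )
{
    int i, j;

    for ( i = 0; i < 10; ++i ) {
        for ( j = 0; j < 10; ++j ) {
            table[ i ][ j ] = 10 * i + j;
        }
    }
}

/* One pair of brackets per dimension. */
int subscript_form( int i, int j )
{
    return table[ i ][ j ];
}

/* The pointer arithmetic that table[ i ][ j ] stands for. */
int pointer_form( int i, int j )
{
    return *( *( table + i ) + j );
}
```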

Order in the Court

In Chapters 2 and 6, I pointed out some of the problems that arise from C's rules (or lack of rules) for operator precedence and order of evaluation of expressions. I won't go through all the rules or unusual results that can occur, but observe that an expression like

 r = x * y * z;

may be evaluated as

 tmp = x * y;
 r   = tmp * z;

or as

 tmp = y * z;
 r   = x * tmp;

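Pre-ANSI (K&R) C went further, letting the compiler regroup such operators even when you parenthesized them, so that (x * y) * z could also be evaluated as

 tmp = y * z;
 r   = x * tmp;

Standard C requires the compiler to respect parentheses, allowing regrouping only when it can't change the result, but it still leaves unspecified the order in which the operands themselves are evaluated. In many cases, it may not matter what the order of evaluation is, but if it does, you should use separate statements to specify order-dependent operations.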
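The same advice applies when operands have side effects. In a hypothetical expression such as read_next() - read_next(), C doesn't specify which call happens first, so the result could be 1 - 2 or 2 - 1. This sketch, with an assumed counter-based read_next, uses separate statements to pin the order down.

```c
/* Hypothetical input source: each call returns the next integer,
   starting at 1. */
static int last_value = 0;

int read_next( void )
{
    ++last_value;
    return last_value;
}

/* Returns (first value read) - (second value read). Writing
   read_next() - read_next() would leave the call order unspecified;
   separate statements guarantee which read happens first. */
int difference_of_next_two( void )
{
    int first;
    int second;

    first  = read_next();
    second = read_next();
    return first - second;
}
```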

The Name Game

As if C didn't offer enough problems on its own, the C programming culture sometimes seems to strive to create more traps for the unwary. One example is the widely used "Hungarian" naming convention, which uses partial capitalization for identifiers. Because C is case-sensitive, a variable hDlg is different from the variable hdlg. Woe to the programmer who has identifiers that differ only in case. Not only is there the obvious potential for elusive errors caused by typing mistakes, but some link editors change all global symbols to uppercase when linking multiple files, causing both hDlg and hdlg to be treated as HDLG.

You won't be able to avoid Hungarian notation when you work with some vendor-supplied libraries, such as the Microsoft Windows interface. But in your own code, especially for global variables, follow this rule: Avoid identifiers that differ only in the case (i.e., upper and lower) of some letters. I recommend the simple, less error-prone standard of using all lowercase identifiers, except for manifest constants. You should also be careful with some older link editors that may truncate global identifiers (the original ANSI C standard guarantees only that the first 6 characters of an external identifier are significant, and not necessarily their case), causing the potential for additional collisions.

Although I've covered lots of C danger zones in the last 6 chapters, there are more waiting. Among the areas to watch carefully are: casting pointers; using C signals (Koenig has an enlightening, and alarming, discussion of using signals); using floating-point variables to approximate decimal values (such as currency); and portability problems, such as character representations and byte ordering. The principle that underlies all these rules is: Tread carefully in C; stick to simple and well-understood techniques; and avoid "clever" programming. The truly clever C programmer is also an extremely cautious one.

C Coding Suggestions

  • Put parentheses (or other explicit delimiters) around the macro text and around each macro argument within the macro text.
  • Never pass an expression that has side effects as a macro argument.
  • Place simple increment and decrement operations in separate statements.
  • Avoid evaluating a macro argument more than once, if possible.
  • When defining a macro, be sure to consider all the contexts in which the macro may be used.
  • Carefully cast any operation that involves a char variable and any operand other than another char variable.
  • Always use int (not char) variables to store return values from fgetc, getc, getchar, putc, putchar, and ungetc functions.
  • Always declare a function prototype at the beginning of any file in which you use the function.
  • Define a header file with function prototypes for every file that has global functions that may be referenced in other files.
  • Declare formal parameters to functions as const, if they should not be changed.
  • Be sure you've coded parentheses after all function invocations.
  • Set errno to 0 before a function call, and use errno only after a function returns a value indicating the function failed.
  • In C, always use one pair of [] for each level of array subscripting.
  • Use separate statements to specify order-dependent operations.
  • Avoid identifiers that differ only in the case (i.e., upper and lower) of some letters.
  • Tread carefully in C; stick to simple and well-understood techniques; and avoid "clever" programming.
