Friday, May 25, 2012

Sequence points in C



Ever wondered what the value of the expression i++ * ++i is in C/C++ when i = 1? If you have a compiler, try it once and, as a mental exercise, work out why you get that result. Before racking your brain, though, try the same expression with a few more compilers: you may find that they give different answers for the same expression! What is the reason behind this? Fortunately, we have a written specification for the language from the ISO standards committee. It defines what the result of an expression should be, and all modern compilers follow it. So is something wrong with the tools, or with the expression itself? Before going into the details of this expression, let us look at some terms defined in the C standard.
Undefined Behavior: behavior, upon use of a non-portable or erroneous program construct or of erroneous data, for which the International Standard imposes no requirements.
An example of undefined behavior is the behavior on integer overflow.
Unspecified Behavior: use of an unspecified value, or other behavior where the International Standard provides two or more possibilities and imposes no further requirements on which is chosen in any instance.
An example of unspecified behavior is the order in which the arguments to a function are evaluated.
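The two definitions above can be illustrated with a small sketch. All the function names here (first, second, add, argument_order, checked_add) are made up for this illustration. Note that the overflow mentioned in the standard's example concerns signed arithmetic; unsigned arithmetic wraps around and is well defined. The first part records the order in which two function arguments are evaluated, which the compiler is free to choose. The second part shows one common way to avoid signed overflow by testing before adding.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Records which argument expression ran first: "ab" or "ba". */
static char order_log[3];

static int first(void)  { strcat(order_log, "a"); return 1; }
static int second(void) { strcat(order_log, "b"); return 2; }
static int add(int x, int y) { return x + y; }

/* Unspecified behavior: the standard allows either argument of add()
   to be evaluated first, so both "ab" and "ba" are valid results. */
const char *argument_order(void)
{
    order_log[0] = '\0';
    (void)add(first(), second());
    return order_log;
}

/* Undefined behavior avoided: a + b is computed only when it is known
   to fit in an int. Returns 1 on success, 0 if the sum would overflow. */
int checked_add(int a, int b, int *out)
{
    assert(out != NULL);
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return 0;
    *out = a + b;
    return 1;
}
```

A caller can print the string returned by argument_order() to see which order a particular compiler picked. A portable program must not rely on either answer.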
And thus, the language can have some behavior which is undefined or unspecified. In such situations, the output of the program depends on the implementation; in other words, on the compiler. For unspecified behavior, the compiler developer is free to choose among the possibilities the standard allows; for undefined behavior, the standard imposes no requirements at all, so any result is possible. As a good programming practice, a programmer should never write code that gives different output on different implementations. Now the answer to the first question is somewhat clear: the expression may involve undefined or unspecified behavior, hence the differing results. And under which rule does it produce this behavior? The rule about sequence points.
Sequence points
The C standard describes side effects and sequence points as follows:
Accessing a volatile object, modifying an object, modifying a file, or calling a function that does any of those operations are all side effects, which are changes in the state of the execution environment. Evaluation of an expression may produce side effects. At certain specified points in the execution sequence called sequence points, all side effects of previous evaluations shall be complete and no side effects of subsequent evaluations shall have taken place.
In simple words, a side effect is a change in the state of the execution environment, such as a change to an object or a file. Side effects accumulate as execution progresses, so there must be a rule that says when all pending changes are guaranteed to have taken effect. Sequence points are that rule. A change to a variable need not be written back to memory at the moment it happens; the standard only guarantees that all side effects since the previous sequence point are complete by the next one. So, if a variable is changed multiple times between two sequence points, which value will the next operation see? The one that was just modified in the expression, or the one that was stored at the previous sequence point? The standard does not define this. It requires that an object be modified at most once between two consecutive sequence points, and a program that breaks this rule has 'Undefined Behavior'. In i++ * ++i there is no sequence point between the two operands of *, yet i is modified twice, which is why compilers are allowed to disagree.
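To make the ambiguity concrete, here is a minimal sketch that spells out two plausible readings of i++ * ++i as separate statements, so that every modification of i is separated from the next by a sequence point. The function names are hypothetical, and these two readings are only examples: since the original expression is undefined, a compiler is not obliged to produce either result.

```c
#include <assert.h>
#include <stddef.h>

/* The expression i++ * ++i itself appears only in comments here,
   because evaluating it is undefined behavior. */

/* Reading A: the left operand i++ is evaluated completely first. */
int left_operand_first(int *i)
{
    assert(i != NULL);
    int left = *i;      /* value of i++ is the old value of i */
    *i += 1;            /* side effect of i++ */
    *i += 1;            /* side effect of ++i */
    int right = *i;     /* value of ++i is the new value of i */
    return left * right;
}

/* Reading B: the right operand ++i is evaluated completely first. */
int right_operand_first(int *i)
{
    assert(i != NULL);
    *i += 1;            /* side effect of ++i */
    int right = *i;     /* value of ++i */
    int left = *i;      /* value of i++ is the current value of i */
    *i += 1;            /* side effect of i++ */
    return left * right;
}
```

Starting from i = 1, reading A returns 1 * 3 = 3 and reading B returns 2 * 2 = 4, while i ends up as 3 in both cases. Writing the intended order out like this removes the undefined behavior and makes the result the same on every compiler.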
According to the C standard (the list below follows Annex C of C99), there are seven sequence points:
• The call to a function, after the arguments have been evaluated.
• The end of the first operand of the following operators: logical AND '&&', logical OR '||', conditional '?', comma ','.
• The end of a full declarator.
• The end of a full expression: an initializer, the expression in an expression statement, the controlling expression of a selection statement (if or switch), the controlling expression of a while or do statement, each of the expressions of a for statement, the expression in a return statement.
• Immediately before a library function returns.
• After the actions associated with each formatted input/output function conversion specifier.
• Immediately before and immediately after each call to a comparison function, and also between any call to a comparison function and any movement of the objects passed as arguments to that call.
Each declarator declares one identifier, and a full declarator is a declarator that is not part of another declarator. The last entry in the list deals with the searching and sorting functions of the C library (bsearch and qsort). The comparison function is passed to them as a function pointer and is called back from the search/sort function. In general, this entry can be merged with the first one to make a more general point: "the call to a function or a callback function".
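A few of the sequence points listed above can be seen at work in well-defined code. The sketch below is only an illustration, and all function names in it are invented for the example. It relies on the sequence point after the first operand of '&&', ',' and '?', and it passes a comparison callback to the library function qsort.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* '&&': the left operand is complete before the right one starts,
   so p is never dereferenced when it is NULL. */
int is_positive(const int *p)
{
    return p != NULL && *p > 0;
}

/* ',': the increment is complete before i is read again. */
int comma_demo(int i)
{
    return (i++, i * 10);
}

/* '?': the increment in the condition is complete before the
   selected branch reads i. */
int conditional_demo(int i)
{
    return (i++ > 0) ? i : -i;
}

/* Comparison callback: qsort calls it through a function pointer,
   with a sequence point before and after each call. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a;
    int y = *(const int *)b;
    return (x > y) - (x < y);   /* avoids the overflow risk of x - y */
}

/* Sorts count ints in ascending order. */
void sort_ints(int *values, size_t count)
{
    assert(values != NULL || count == 0);
    if (count > 0)
        qsort(values, count, sizeof values[0], compare_ints);
}
```

For example, comma_demo(1) returns 20 and conditional_demo(1) returns 2, because in both cases the increment has already taken effect when i is read the second time.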
A good coding practice: avoid changing a single object more than once between two consecutive sequence points. The output of such code depends on the compiler, and if you have any undefined or unspecified behavior of this kind in your code, you can only pray that the compiler never changes, or that a new version of the compiler does not evaluate the expression differently. Otherwise, you will end up with unexpected bugs in unexpected places!
