
Subject: C++ FAQ (part 12 of 14)

This article was archived around: NNTP-Posting-Mon, 17 Jun 2002 22:47:19 EDT

All FAQs in Directory: C++-faq
All FAQs posted in: comp.lang.c++, alt.comp.lang.learn.c-c++
Source: Usenet Version


Archive-name: C++-faq/part12 Posting-Frequency: monthly Last-modified: Jun 17, 2002 URL: http://www.parashift.com/c++-faq-lite/
AUTHOR: Marshall Cline / cline@parashift.com / 972-931-9470

COPYRIGHT: This posting is part of "C++ FAQ Lite." The entire "C++ FAQ Lite" document is Copyright(C)1991-2002 Marshall Cline, Ph.D., cline@parashift.com. All rights reserved. Copying is permitted only under designated situations. For details, see section [1].

NO WARRANTY: THIS WORK IS PROVIDED ON AN "AS IS" BASIS. THE AUTHOR PROVIDES NO WARRANTY WHATSOEVER, EITHER EXPRESS OR IMPLIED, REGARDING THE WORK, INCLUDING WARRANTIES WITH RESPECT TO ITS MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE.

C++-FAQ-Lite != C++-FAQ-Book: This document, C++ FAQ Lite, is not the same as the C++ FAQ Book. The book (C++ FAQs, Cline and Lomow, Addison-Wesley) is 500% larger than this document, and is available in bookstores. For details, see section [3].

==============================================================================

SECTION [28]: Newbie Questions / Answers

[28.1] What is this "newbie section" all about? [NEW!]

[Recently created (in 6/02).]

It's a randomly ordered collection containing a few questions newbies might ask.

 * This section doesn't pretend to be organized. Think of it as random. In truth, think of it as a hurried, initial cut by a busy guy.

 * This section doesn't pretend to be complete. Think of it as offering a little help to a few people. It won't help everyone and it might not help you.

Hopefully someday I'll be able to improve this section, but for now, it is incomplete and unorganized. If that bothers you, my suggestion is to click that little x on the extreme upper right of your browser window :-).

==============================================================================

[28.2] Where do I start? Why do I feel so confused, so stupid? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

Read the FAQ, especially the section on learning C++[27], read comp.lang.c++, read books[27.4] plural.
But if everything still seems too hard, if you're feeling bombarded with mysterious terms and concepts, if you're wondering how you'll ever grasp anything, do this: 1. Type in some C++ code from any of the sources listed above. 2. Get it to compile and run. 3. Repeat. That's it. Just practice and play. Hopefully that will give you a foothold. ============================================================================== [28.3] What are the criteria for choosing between short / int / long data types? [NEW!] [Recently created thanks to a question from John Lester (in 6/02).] Other related questions: If a short int is the same size as an int on my particular implementation, why choose one or the other? If I start taking the actual size in bytes of the variables into account, won't I be making my code unportable (since the size in bytes may differ from implementation to implementation)? Or should I simply go with sizes much larger than I actually need, as a sort of safety buffer? Answer: It's usually a good idea to write code that can be ported to a different operating system and/or compiler. After all, if you're successful at what you do, someone else might want to use it somewhere else. This can be a little tricky with built-in types like int and short, since C++ doesn't give guaranteed sizes. However C++ does give you guaranteed minimum sizes, and that will usually be all you need to know. C++ guarantees a char is exactly one byte[25.1], short is at least 2 bytes, int is at least 2 bytes, and long is at least 4 bytes. It also guarantees the unsigned version of each of these is the same size as the original, for example, sizeof(unsigned short) == sizeof(short). When writing portable code, you shouldn't make additional assumptions about these sizes. For example, don't assume int has 4 bytes. If you have an integral variable that needs at least 4 bytes, use a long or unsigned long even if sizeof(int) == 4 on your particular implementation. 
On the other hand, if you have an integral variable that will always fit within 2 bytes and if you want to minimize the use of data memory, use a short or unsigned short even if you know sizeof(int) == 2 on your particular implementation.

Note that there are some subtle tradeoffs here. In some cases, your computer might be able to manipulate smaller things faster than bigger things, but in other cases it is exactly the opposite: int arithmetic might be faster than short arithmetic on some implementations. Another tradeoff is data-space against code-space: int arithmetic might generate less binary code than short arithmetic on some implementations. Don't make simplistic assumptions. Just because a particular variable can be declared as short doesn't necessarily mean it should, even if you're trying to save space.

==============================================================================

[28.4] What the heck is a const variable? Isn't that a contradiction in terms? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

If it bothers you, call it a "const identifier" instead.

The main issue is to figure out what it is; we can figure out what to call it later. For example, consider the symbol max in the following function:

    void f()
    {
      const int max = 107;
      ...
      float array[max];
      ...
    }

It doesn't matter whether you call max a const variable or a const identifier. What matters is that you realize it is like a normal variable in some ways (e.g., you can take its address or pass it by const-reference), but it is unlike a normal variable in that you can't change its value.

Here is another even more common example:

    class Fred {
    public:
      ...
    private:
      static const int max_ = 107;
      ...
    };

In this example, you would need to add the line const int Fred::max_; in exactly one .cpp file, typically in Fred.cpp. Note that this definition repeats the const and does not repeat the initializer.
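As a minimal sketch of the class example above, here is how the in-class declaration and the single out-of-class definition fit together. The file names in the comments and the two accessor functions (max() and maxAddress()) are hypothetical and exist only so the sketch has something to call; they are not part of the original example.

```cpp
// Fred.h (hypothetical file name)
class Fred {
public:
    // Hypothetical accessors, added only for this sketch.
    static int max() { return max_; }
    static const int* maxAddress() { return &max_; }  // taking the address needs the definition below
private:
    static const int max_ = 107;  // declaration, with the initializer given in the class
};

// Fred.cpp (hypothetical file name): the one out-of-class definition.
// It repeats the const qualifier and carries no initializer.
const int Fred::max_;
```

Calling Fred::max() yields 107, and dereferencing Fred::maxAddress() yields the same value, which is the "you can take its address" property mentioned earlier.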
It is generally considered good programming practice to give each "magic number" (like 107) a symbolic name and use that name rather than the raw magic number[28.9]. ============================================================================== [28.5] Why would I use a const variable / const identifier as opposed to #define? [NEW!] [Recently created thanks to a question from John Lester (in 6/02).] const identifiers are often better than #define because: * they obey the language's scoping rules * you can see them in the debugger * you can take their address if you need to * you can pass them by const-reference if you need to * they don't create new "keywords" in your program. In short, const identifiers act like they're part of the language because they are part of the language. The preprocessor can be thought of as a language layered on top of C++. You can imagine that the preprocessor runs as a separate pass through your code, which would mean your original source code would be seen only by the preprocessor, not by the C++ compiler itself. In other words, you can imagine the preprocessor sees your original source code and replaces all #define symbols with their values, then the C++ compiler proper sees the modified source code after the original symbols got replaced by the preprocessor. There are cases where #define is needed, but you should generally avoid it when you have the choice. You should evaluate whether to use const vs. #define based on business value: time, money, risk. In other words, one size does not fit all. Most of the time you'll use const rather than #define for constants, but sometimes you'll use #define. But please remember to wash your hands afterwards. ============================================================================== [28.6] Are you saying that the preprocessor is evil? [NEW!] [Recently created (in 6/02).] Yes, that's exactly what I'm saying: the preprocessor is evil[6.14]. 
Every #define macro effectively creates a new keyword in every source file and every scope until that symbol is #undef'd. The preprocessor lets you create a #define symbol that is always replaced independent of the {...} scope where that symbol appears.

Sometimes we need the preprocessor, such as the #ifndef/#define wrapper within each header file, but it should be avoided when you can. "Evil" doesn't mean "never use."[6.14] You will use evil things sometimes, particularly when they are "the lesser of two evils." But they're still evil :-)

==============================================================================

[28.7] What is the "standard library"? What is included / excluded from it? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

Most (not all) implementations have a "standard include" directory, sometimes directories plural. If your implementation is like that, the headers in the standard library are probably a subset of the files in those directories. For example, iostream and string are part of the standard library, as are cstring and cstdio. There are a bunch of .h files that are also part of the standard library, but not every .h file in those directories is part of the standard library. For example, stdio.h is but windows.h is not.

You include headers from the standard library like this:

    #include <iostream>

    int main()
    {
      std::cout << "Hello world!\n";
      return 0;
    }

==============================================================================

[28.8] How should I lay out my code? When should I use spaces, tabs, and/or newlines in my code? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

The short answer is: Just like the rest of your team. In other words, the team should use a consistent approach to whitespace, but otherwise please don't waste a lot of time worrying about it.

Here are a few details:

There is no universally accepted coding standard when it comes to whitespace.
There are a few popular whitespace standards, such as the "one true brace" style, but there is a lot of contention over certain aspects of any given coding standard.

Most whitespace standards agree on a few points, such as putting a space around infix operators like x * y or a - b. Most (not all) whitespace standards do not put spaces around the [ or ] in a[i], and similar comments for ( and ) in f(x). However there is a great deal of contention over vertical whitespace, particularly when it comes to { and }. For example, here are a few of the many ways to lay out if (foo()) { bar(); baz(); }:

    if (foo()) {
      bar();
      baz();
    }

    if (foo())
    {
      bar();
      baz();
    }

    if (foo())
      {
        bar();
        baz();
      }

    if (foo())
      {
      bar();
      baz();
      }

    if (foo()) {
      bar();
      baz();
      }

...and others...

IMPORTANT: Do NOT email me with reasons your whitespace approach is better than the others. I don't care. Plus I won't believe you. There is no objective standard of "better" when it comes to whitespace so your opinion is just that: your opinion. If you write me an email in spite of this paragraph, I will consider you to be a hopeless geek who focuses on nits. Don't waste your time worrying about whitespace: as long as your team uses a consistent whitespace style, get on with your life and worry about more important things.

For example, things you should be worried about include design issues like when ABCs[22.3] should be used, whether inheritance should be an implementation or specification technique, what testing and inspection strategies should be used, whether interfaces should uniformly have a get() and/or set() member function for each data member, whether interfaces should be designed from the outside-in or the inside-out, whether errors should be handled by try/catch/throw or by return codes, etc. Read the FAQ for some opinions on those important questions, but please don't waste your time arguing over whitespace. As long as the team is using a consistent whitespace strategy, drop it.
==============================================================================

[28.9] Is it okay if a lot of numbers appear in my code? [NEW!]

[Recently created (in 6/02).]

Probably not.

In many (not all) cases, it's best to name your numbers so each number appears only once in your code. That way, when the number changes there will only be one place in the code that has to change.

For example, suppose your program is working with shipping crates. The weight of an empty crate is 5.7. The expression 5.7 + contentsWeight probably means the weight of the crate including its contents, meaning the number 5.7 probably appears many times in the software. All these occurrences of the number 5.7 will be difficult to find and change when (not if) somebody changes the style of crates used in this application. The solution is to make sure the value 5.7 appears exactly once, usually as the initializer for a const identifier. Typically this will be something like const double crateWeight = 5.7;. After that, 5.7 + contentsWeight would be replaced by crateWeight + contentsWeight.

Now that's the general rule of thumb. But unfortunately there is some fine print.

Some people believe one should never have numeric literals scattered in the code. They believe all numeric values should be named in a manner similar to that described above. That rule, however noble in intent, just doesn't work very well in practice. It is too tedious for people to follow, and ultimately it costs companies more than it saves them. Remember: the goal of all programming rules is to reduce time, cost and risk. If a rule actually makes things worse, it is a bad rule, period.

A more practical rule is to focus on those values that are likely to change. For example, if a numeric literal is likely to change, it should appear only once in the software, usually as the initializer of a const identifier.
This rule lets unchanging values, such as some occurrences of 0, 1, -1, etc., get coded directly in the software so programmers don't have to search for the one true definition of one or zero. In other words, if a programmer wants to loop over the indices of a vector, he can simply write for (int i = 0; i < v.size(); ++i). The "extremist" rule described earlier would require the programmer to poke around asking if anybody else has defined a const identifier initialized to 0, and if not, to define his own const int zero = 0; then replace the loop with for (int i = zero; i < v.size(); ++i). This is all a waste of time since the loop will always start with 0. It adds cost without adding any value to compensate for that cost.

Obviously people might argue over exactly which values are "likely to change," but that kind of judgment is why you get paid the big bucks: do your job and make a decision. Some people are so afraid of making a wrong decision that they'll adopt a one-size-fits-all rule such as "give a name to every number." But if you adopt rules like that, you're guaranteed to have made the wrong decision: those rules cost your company more than they save. They are bad rules.

The choice is simple: use a flexible rule even though you might make a wrong decision, or use a one-size-fits-all rule and be guaranteed to make a wrong decision.

There is one more piece of fine print: where the const identifier should be defined. There are three typical cases:

 * If the const identifier is used only within a single function, it can be local to that function.

 * If the const identifier is used throughout a class and nowhere else, it can be static within the private part of that class.

 * If the const identifier is used in numerous classes, it can be static within the public part of the most appropriate class, or perhaps private in that class with a public static access method.

As a last resort, make it static within a namespace or perhaps put it in the unnamed namespace.
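The placement choices above can be sketched in one small file. This is only an illustration under assumptions: the Crate class, palletWeight(), and every constant other than the 5.7 crate weight from the shipping-crate example are hypothetical names and values invented for the sketch.

```cpp
namespace {
    // Last resort: a file-local constant in the unnamed namespace (hypothetical value).
    const int cratesPerPallet = 12;
}

class Crate {
public:
    // Used by numerous classes: static in the public part (hypothetical value).
    static const int maxStackHeight = 4;

    explicit Crate(double contentsWeight) : contentsWeight_(contentsWeight) {}
    double totalWeight() const { return crateWeight_ + contentsWeight_; }

private:
    // Used throughout this class and nowhere else: static in the private part.
    static const double crateWeight_;
    double contentsWeight_;
};

// Definitions; these would normally go in exactly one .cpp file.
const int Crate::maxStackHeight;
const double Crate::crateWeight_ = 5.7;

double palletWeight(const Crate& crate)
{
    // Used only within this function: local to the function (hypothetical value).
    const double palletTare = 20.0;
    return palletTare + cratesPerPallet * crate.totalWeight();
}
```

With this layout, a change in crate style touches only the one line that initializes crateWeight_; a Crate holding 4.3 units of contents reports a total weight of about 10.0.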
Try very hard to avoid using #define since the preprocessor is evil[28.6]. If you absolutely must use #define, wash your hands when you're done. And please ask some friends if they know of a better alternative. (As used throughout the FAQ, "evil" doesn't mean "never use it."[6.14] There are times when you will use something that is "evil" since it will be, in that particular case, "the lesser of two evils.") ============================================================================== [28.10] What's the point of the L, U and f suffixes on numeric literals? [NEW!] [Recently created thanks to a question from John Lester (in 6/02).] You should use these suffixes when you need to force the compiler to treat the numeric literal as if it was the specified type. For example, if x is of type float, the expression x + 5.7 is of type double: it first promotes the value of x to a double, then performs the arithmetic using double-precision instructions. If that is what you want, fine; but if you really wanted it to do the arithmetic using single-precision instructions, you can change that code to x + 5.7f. Note: it is even better to "name" your numeric literals, particularly those that are likely to change[28.9]. That would require you to say x + crateWeight where crateWeight is a const float that is initialized to 5.7f. The U suffix is similar. It's probably a good idea to use unsigned integers for variables that are always >= 0. For example, if a variable represents an index into an array, that variable would typically be declared as an unsigned. The main reason for this is it requires less code, at least if you are careful to check your ranges. For example, to check if a variable is both >= 0 and < max requires two tests if everything is signed: if (n >= 0 && n < max), but can be done with a single comparison if everything is unsigned: if (n < max). If you end up using unsigned variables, it is generally a good idea to force your numeric literals to also be unsigned. 
That makes it easier to see that the compiler will generate "unsigned arithmetic" instructions. For example: if (n < 256U) or if ((n & 255u) < 32u). Mixing signed and unsigned values in a single arithmetic expression is often confusing for programmers -- the compiler doesn't always do what you expect it should do.

The L suffix is not as common, but it is occasionally used for similar reasons as above: to make it obvious that the compiler is using long arithmetic.

The bottom line is this: it is a good discipline for programmers to force all numeric operands to be of the right type, as opposed to relying on the C++ rules for promoting/demoting numeric expressions. For example, if x is of type int and y is of type unsigned, it is a good idea to change x + y so the next programmer knows whether you intended to use unsigned arithmetic, e.g., unsigned(x) + y, or signed arithmetic: x + int(y). The other possibility is long arithmetic: long(x) + long(y). By using those casts, the code is more explicit and that's good in this case, since a lot of programmers don't know all the rules for implicit promotions.

==============================================================================

[28.11] I can understand the and (&&) and or (||) operators, but what's the purpose of the not (!) operator? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

Some people are confused about the ! operator. For example, they think that !true is the same as false, or that !(a < b) is the same as a >= b[28.12], so in both cases the ! operator doesn't seem to add anything.

Answer: The ! operator is useful in boolean expressions, such as occur in an if or while statement. For example, let's assume A and B are boolean expressions, perhaps simple method-calls that return a bool. There are all sorts of ways to combine these two expressions:

    if ( A &&  B) ...
    if (!A &&  B) ...
    if ( A && !B) ...
    if (!A && !B) ...

    if (!( A &&  B)) ...
    if (!(!A &&  B)) ...
    if (!( A && !B)) ...
    if (!(!A && !B)) ...

Along with a similar group formed using the || operator.

Note: boolean algebra can be used to transform each of the &&-versions into an equivalent ||-version, so from a truth-table standpoint there are only 8 logically distinct if statements. However, since readability is so important in software, programmers should consider both the &&-version and the logically equivalent ||-version. For example, programmers should choose between !A && !B and !(A || B) based on which one is more obvious to whoever will be maintaining the code. In that sense there really are 16 different choices.

The point of all this is simple: the ! operator is quite useful in boolean expressions. Sometimes it is used for readability, and sometimes it is used because expressions like !(a < b) actually are not[28.12] equivalent to a >= b in spite of what your grade school math teacher told you.

==============================================================================

[28.12] Is !(a < b) logically the same as a >= b? [NEW!]

[Recently created (in 6/02).]

No!

Despite what your grade school math teacher taught you, these equivalences don't always work in software, especially with floating point expressions or user-defined types.

Example: if a is a floating point NaN[28.13], then both a < b and a >= b will be false. That means !(a < b) will be true and a >= b will be false.

Example: if a is an object of class Foo that has overloaded operator< and operator>=, then it is up to the creator of class Foo if these operators will have opposite semantics. They probably should have opposite semantics, but that's up to whoever wrote class Foo.

==============================================================================

[28.13] What is this NaN thing? [NEW!]

[Recently created (in 6/02).]

NaN means "not a number," and is used for floating point operations.
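To make the NaN case from [28.12] concrete, here is a small sketch. The helper names negationAgrees() and makeNaN() are hypothetical, and the sketch assumes your implementation's double supports a quiet NaN (typical for IEEE 754 platforms) and that you are not compiling with options that relax floating point semantics.

```cpp
#include <limits>

// Returns true when !(a < b) and (a >= b) give the same answer for these operands.
bool negationAgrees(double a, double b)
{
    return !(a < b) == (a >= b);
}

// Produces a quiet NaN; assumes std::numeric_limits<double>::has_quiet_NaN is true.
double makeNaN()
{
    return std::numeric_limits<double>::quiet_NaN();
}
```

For ordinary values such as negationAgrees(1.0, 2.0) the two forms agree, but negationAgrees(makeNaN(), 1.0) returns false because both a < b and a >= b are false when a is NaN.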
There are lots of floating point operations that don't make sense, such as dividing by zero, taking the log of zero or a negative number, taking the square root of a negative number, etc. Depending on your compiler, some of these operations may produce special floating point values such as infinity (with distinct values for positive vs. negative infinity) and the not a number value, NaN.

If your compiler produces a NaN, it has the unusual property that it is not equal to any value, including itself. For example, if a is NaN, then a == a is false. That is the usual way to check if you are dealing with a NaN:

    void funct(double x)
    {
      if (x == x) {
        // x is a normal value
        ...
      } else {
        // x is NaN
        ...
      }
    }

Similarly, if a is NaN and b is some arbitrary floating point value, a will be neither less than, equal to, nor greater than b. In other words, a < b, a <= b, a > b, a >= b, and a == b will all return false.

==============================================================================

[28.14] What is the type of an enumeration such as enum Color? Is it of type int? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

An enumeration such as enum Color { red, white, blue }; is its own type. It is not of type int.

When you create an object of an enumeration type, e.g., Color x;, we say that the object x is of type Color. Object x isn't of type "enumeration," and it's not of type int.

An expression of an enumeration type can be converted to a temporary int. An analogy may help here. An expression of type float can be converted to a temporary double, but that doesn't mean float is a subtype of double. For example, after the declaration float y;, we say that y is of type float, and the expression y can be converted to a temporary double. When that happens, a brand new, temporary double is created by copying something out of y.
In the same way, a Color object such as x can be converted to a temporary int, in which case a brand new, temporary int is created by copying something out of x. (Note: the only purpose of the float / double analogy in this paragraph is to help explain how expressions of an enumeration type can be converted to temporary ints; do not try to use that analogy to imply any other behavior!)

The above conversion is very different from a subtype relationship, such as the relationship between derived class Car and its base class Vehicle. For example, an object of class Car, such as Car z;, actually is an object of class Vehicle, therefore you can bind a Vehicle& to that object, e.g., Vehicle& v = z;. Unlike the previous paragraph, the object z is not copied to a temporary; reference v binds to z itself. So we say an object of class Car is a Vehicle, but an object of class "Color" simply can be copied/converted into a temporary int. Big difference.

Final note, especially for C programmers: the C++ compiler will not automatically convert an int expression to a temporary Color[28.15]. Since that sort of conversion is unsafe, it requires a cast, e.g., Color x = Color(2);.

==============================================================================

[28.15] If an enumeration type is distinct from any other type, what good is it? What can you do with it? [NEW!]

[Recently created thanks to a question from John Lester (in 6/02).]

Let's consider this enumeration type: enum Color { red, white, blue };.

The best way to look at this (C programmers: hang on to your seats!!) is that the values of this type are red, white, and blue, as opposed to merely thinking of those names as constant int values. The C++ compiler provides an automatic conversion from Color to int, and the converted values will be, in this case, 0, 1, and 2 respectively. But you shouldn't think of blue as a fancy name for 2.
blue is of type Color and there is an automatic conversion from blue to 2, but the inverse conversion, from int to Color, is not provided automatically by the C++ compiler.

Here is an example that illustrates the conversion from Color to int:

    enum Color { red, white, blue };

    void f()
    {
      int n;
      n = red;    // n will now have value 0
      n = white;  // n will now have value 1
      n = blue;   // n will now have value 2
    }

The following example also demonstrates the conversion from Color to int:

    void f()
    {
      Color x = red;
      Color y = white;
      Color z = blue;

      int n;
      n = x;   // n will now have value 0
      n = y;   // n will now have value 1
      n = z;   // n will now have value 2
    }

However the inverse conversion, from int to Color, is not automatically provided by the C++ compiler:

    void f()
    {
      Color x;
      x = blue;  // okay: change x to blue
      x = 2;     // compile-time error: can't convert int to Color
    }

The last line above shows that enumeration types are not ints in disguise. You can think of them as int types if you want to, but if you do, you must remember that the C++ compiler will not implicitly convert an int to a Color. If you really want that, you can use a cast:

    void f()
    {
      Color x;
      x = red;       // okay: x will now have the value red
      x = Color(1);  // okay: x will now have the value white
      x = Color(2);  // okay: x will now have the value blue
      x = 2;         // compile-time error: can't convert int to Color
    }

There are other ways that enumeration types are unlike int. For example, enumeration types don't have a ++ operator:

    void f()
    {
      int n = red;    // n will now have value 0
      Color x = red;  // x will now have value red

      n++;   // okay: n will now have value 1
      x++;   // compile-time error: can't ++ an enumeration
    }

==============================================================================

SECTION [29]: Learning C++ if you already know Smalltalk

[29.1] What's the difference between C++ and Smalltalk?

Both fully support the OO paradigm. Neither is categorically and universally "better" than the other[6.4].
But there are differences. The most important differences are:

 * Static typing vs. dynamic typing[29.2]

 * Whether inheritance must be used only for subtyping[29.5]

 * Value vs. reference semantics[30]

Note: Many new C++ programmers come from a Smalltalk background. If that's you, this section will tell you the most important things you need to know to make your transition. Please don't get the notion that either language is somehow "inferior" or "bad"[6.4], or that this section is promoting one language over the other (I am not a language bigot; I serve on both the ANSI C++ and ANSI Smalltalk standardization committees[6.11]). Instead, this section is designed to help you understand (and embrace!) the differences.

==============================================================================

[29.2] What is "static typing," and how is it similar/dissimilar to Smalltalk?

Static typing says the compiler checks the type safety of every operation statically (at compile-time), rather than generating code that checks things at run-time. For example, with static typing, the signature matching for function arguments is checked at compile time, not at run-time. An improper match is flagged as an error by the compiler, not by the run-time system.

In OO code, the most common "typing mismatch" is invoking a member function against an object which isn't prepared to handle the operation. E.g., if class Fred has member function f() but not g(), and fred is an instance of class Fred, then fred.f() is legal and fred.g() is illegal. C++ (statically typed) catches the error at compile time, and Smalltalk (dynamically typed) catches the error at run-time. (Technically speaking, C++ is like Pascal --pseudo statically typed-- since pointer casts and unions can be used to violate the typing system; which reminds me: use pointer casts[26.10] and unions only as often as you use gotos).
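A minimal sketch of the Fred example above. The body of f(), its return value, and the helper callF() are hypothetical and exist only so there is something to compile; the rejected call is left as a comment because a statically typed compiler refuses to build it.

```cpp
class Fred {
public:
    int f() const { return 42; }  // hypothetical body, for illustration only
    // Deliberately no member function g().
};

int callF(const Fred& fred)
{
    return fred.f();     // legal: the compiler can see that Fred has f()
    // return fred.g();  // would be rejected at compile time: Fred has no member g()
}
```

Uncommenting the fred.g() line stops the build before any program exists to run, which is the compile-time vs. run-time distinction this FAQ describes.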
============================================================================== [29.3] Which is a better fit for C++: "static typing" or "dynamic typing"? [For context, please read the previous FAQ[29.2]]. If you want to use C++ most effectively, use it as a statically typed language. C++ is flexible enough that you can (via pointer casts, unions, and #define macros) make it "look" like Smalltalk. But don't. Which reminds me: try to avoid #define: it is evil[6.14] in 4 different ways: evil#1[9.3], evil#2[36.2], evil#3[36.3], and evil#4[36.4]. There are places where pointer casts and unions are necessary and even wholesome, but they should be used carefully and sparingly. A pointer cast tells the compiler to believe you. An incorrect pointer cast might corrupt your heap, scribble into memory owned by other objects, call nonexistent member functions, and cause general failures. It's not a pretty sight[26.10]. If you avoid these and related constructs, you can make your C++ code both safer and faster, since anything that can be checked at compile time is something that doesn't have to be done at run-time. If you're interested in using a pointer cast, use the new style pointer casts. The most common example of these is to change old-style pointer casts such as (X*)p into new-style dynamic casts such as dynamic_cast<X*>(p), where p is a pointer and X is a type. In addition to dynamic_cast, there is static_cast and const_cast, but dynamic_cast is the one that simulates most of the advantages of dynamic typing (the other is the typeid() construct; for example, typeid(*p).name() will return the name of the type of *p). ============================================================================== [29.4] How do you use inheritance in C++, and is that different from Smalltalk? Some people believe that the purpose of inheritance is code reuse. In C++, this is wrong. Stated plainly, "inheritance is not for code reuse." 
The purpose of inheritance in C++ is to express interface compliance (subtyping), not to get code reuse. In C++, code reuse usually comes via composition rather than via inheritance. In other words, inheritance is mainly a specification technique rather than an implementation technique. This is a major difference with Smalltalk, where there is only one form of inheritance (C++ provides private inheritance to mean "share the code but don't conform to the interface", and public inheritance to mean "kind-of"). The Smalltalk language proper (as opposed to coding practice) allows you to have the effect of "hiding" an inherited method by providing an override that calls the "does not understand" method. Furthermore Smalltalk allows a conceptual "is-a" relationship to exist apart from the inheritance hierarchy (subtypes don't have to be derived classes; e.g., you can make something that is-a Stack yet doesn't inherit from class Stack). In contrast, C++ is more restrictive about inheritance: there's no way to make a "conceptual is-a" relationship without using inheritance (the C++ work-around is to separate interface from implementation via ABCs[22.3]). The C++ compiler exploits the added semantic information associated with public inheritance to provide static typing. ============================================================================== [29.5] What are the practical consequences of differences in Smalltalk/C++ inheritance? [For context, please read the previous FAQ[29.4]]. Smalltalk lets you make a subtype that isn't a derived class, and allows you to make a derived class that isn't a subtype. This allows Smalltalk programmers to be very carefree in putting data (bits, representation, data structure) into a class (e.g., you might put a linked list into class Stack). After all, if someone wants an array-based-Stack, they don't have to inherit from Stack; they could inherit such a class from Array if desired, even though an ArrayBasedStack is not a kind-of Array! 

In C++, you can't be nearly as carefree. Only mechanism (member function code), not representation (data bits), can be overridden in derived classes. Therefore you're usually better off not putting the data structure in a class. This leads to a stronger reliance on abstract base classes[22.3].

I like to think of the difference between an ATV and a Maserati. An ATV (all terrain vehicle) is more fun, since you can "play around" by driving through fields, streams, sidewalks, and the like. A Maserati, on the other hand, gets you there faster, but it forces you to stay on the road. My advice to C++ programmers is simple: stay on the road. Even if you're one of those people who like the "expressive freedom" to drive through the bushes, don't do it in C++; it's not a good fit.

==============================================================================

SECTION [30]: Reference and value semantics

[30.1] What is value and/or reference semantics, and which is best in C++?

With reference semantics, assignment is a pointer-copy (i.e., a reference). Value (or "copy") semantics mean assignment copies the value, not just the pointer. C++ gives you the choice: use the assignment operator to copy the value (copy/value semantics), or use a pointer-copy to copy a pointer (reference semantics). C++ allows you to overload the assignment operator to do anything your heart desires; however, the default (and most common) choice is to copy the value.

Pros of reference semantics: flexibility and dynamic binding (you get dynamic binding in C++ only when you pass by pointer or pass by reference, not when you pass by value).

Pros of value semantics: speed. "Speed" seems like an odd benefit for a feature that requires an object (vs.
a pointer) to be copied, but the fact of the matter is that one usually accesses an object more than one copies the object, so the cost of the occasional copies is (usually) more than offset by the benefit of having an actual object rather than a pointer to an object.

There are three cases when you have an actual object as opposed to a pointer to an object: local objects, global/static objects, and fully contained member objects in a class. The most important of these is the last ("composition").

More info about copy-vs-reference semantics is given in the next FAQs. Please read them all to get a balanced perspective. The first few have intentionally been slanted toward value semantics, so if you only read the first few of the following FAQs, you'll get a warped perspective.

Assignment has other issues (e.g., shallow vs. deep copy) which are not covered here.

==============================================================================

[30.2] What is "virtual data," and how-can / why-would I use it in C++?

Virtual data allows a derived class to change the exact class of a base class's member object. Virtual data isn't strictly "supported" by C++; however, it can be simulated in C++. It ain't pretty, but it works.

To simulate virtual data in C++, the base class must have a pointer to the member object, and the derived class must provide a new object to be pointed to by the base class's pointer. The base class would also have one or more normal constructors that provide their own referent (again via new), and the base class's destructor would delete the referent.

For example, class Stack might have an Array member object (using a pointer), and derived class StretchableStack might override the base class member data from Array to StretchableArray. For this to work, StretchableArray would have to inherit from Array, so Stack would have an Array*.
Stack's normal constructors would initialize this Array* with a new Array, but Stack would also have a (possibly protected) constructor that would accept an Array* from a derived class. StretchableStack's constructor would provide a new StretchableArray to this special constructor.

Pros:
 * Easier implementation of StretchableStack (most of the code is inherited)
 * Users can pass a StretchableStack as a kind-of Stack

Cons:
 * Adds an extra layer of indirection to access the Array
 * Adds some extra freestore allocation overhead (both new and delete)
 * Adds some extra dynamic binding overhead (reason given in next FAQ)

In other words, we succeeded at making our job easier as the implementer of StretchableStack, but all our users pay for it[30.5]. Unfortunately the extra overhead was imposed on both users of StretchableStack and on users of Stack.

Please read the rest of this section. (You will not get a balanced perspective without the others.)

==============================================================================

[30.3] What's the difference between virtual data and dynamic data?

The easiest way to see the distinction is by an analogy with virtual functions[20]: A virtual member function means the declaration (signature) must stay the same in derived classes, but the definition (body) can be overridden. The overriddenness of an inherited member function is a static property of the derived class; it doesn't change dynamically throughout the life of any particular object, nor is it possible for distinct objects of the derived class to have distinct definitions of the member function.

Now go back and re-read the previous paragraph, but make these substitutions:
 * "member function" --> "member object"
 * "signature" --> "type"
 * "body" --> "exact class"

After this, you'll have a working definition of virtual data.

Another way to look at this is to distinguish "per-object" member functions from "dynamic" member functions.
A "per-object" member function is a member function that is potentially different in any given instance of an object, and could be implemented by burying a function pointer in the object; this pointer could be const, since the pointer will never be changed throughout the object's life. A "dynamic" member function is a member function that will change dynamically over time; this could also be implemented by a function pointer, but the function pointer would not be const.

Extending the analogy, this gives us three distinct concepts for data members:
 * virtual data: the definition (class) of the member object is overridable in derived classes provided its declaration ("type") remains the same, and this overriddenness is a static property of the derived class
 * per-object-data: any given object of a class can instantiate a different conformal (same type) member object upon initialization (usually a "wrapper" object), and the exact class of the member object is a static property of the object that wraps it
 * dynamic-data: the member object's exact class can change dynamically over time

The reason they all look so much the same is that none of this is "supported" in C++. It's all merely "allowed," and in this case, the mechanism for faking each of these is the same: a pointer to a (probably abstract) base class. In a language that made these "first class" abstraction mechanisms, the difference would be more striking, since they'd each have a different syntactic variant.

==============================================================================

[30.4] Should I normally use pointers to freestore allocated objects for my data members, or should I use "composition"?

Composition.

Your member objects should normally be "contained" in the composite object (but not always; "wrapper" objects are a good example of where you want a pointer/reference; also the N-to-1-uses-a relationship needs something like a pointer/reference).
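
To make the two layouts concrete, here is a minimal sketch that puts a fully contained member next to the pointer-based "virtual data" arrangement described in [30.2]. It is an illustration under stated assumptions rather than code from the FAQ: ContainedStack, the capacity() member, the values 16 and 1024, and demoLayouts() are all hypothetical, and copying of Stack is simply disabled to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

class Array {
public:
    virtual ~Array() {}
    virtual std::size_t capacity() const { return 16; }    // arbitrary value
};

class StretchableArray : public Array {
public:
    virtual std::size_t capacity() const { return 1024; }  // arbitrary value
};

// Composition: the Array lives inside the object. No new/delete, no
// indirection, and the compiler knows the member's exact class.
class ContainedStack {
public:
    std::size_t capacity() const { return data_.capacity(); }
private:
    Array data_;
};

// Simulated virtual data: the base class holds an Array*, and a derived
// class may supply a different referent through the protected constructor.
class Stack {
public:
    Stack() : data_(new Array) {}              // normal constructor
    virtual ~Stack() { delete data_; }         // base class deletes the referent
    std::size_t capacity() const { return data_->capacity(); }
protected:
    explicit Stack(Array* a) : data_(a) {}     // takes ownership of a
private:
    Stack(const Stack&);                       // copying disabled in this sketch
    Stack& operator=(const Stack&);
    Array* data_;
};

class StretchableStack : public Stack {
public:
    StretchableStack() : Stack(new StretchableArray) {}
};

// Small usage demonstration.
void demoLayouts() {
    ContainedStack c;
    StretchableStack ss;
    Stack& asBase = ss;                        // usable as a kind-of Stack
    assert(c.capacity() == 16);
    assert(asBase.capacity() == 1024);         // resolved through the pointer
}
```

Every Stack in this sketch, stretchable or not, pays for one allocation, one deallocation, and one pointer hop per access; ContainedStack pays for none of them.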

There are three reasons why fully contained member objects ("composition") have better performance than pointers to freestore-allocated member objects:
 * Extra layer of indirection every time you need to access the member object
 * Extra freestore allocations (new in constructor, delete in destructor)
 * Extra dynamic binding (reason given below)

==============================================================================

[30.5] What are the relative costs of the 3 performance hits associated with allocating member objects from the freestore?

The three performance hits are enumerated in the previous FAQ:
 * By itself, an extra layer of indirection is small potatoes
 * Freestore allocations can be a performance issue (the performance of the typical implementation of malloc() degrades when there are many allocations; OO software can easily become "freestore bound" unless you're careful)
 * The extra dynamic binding comes from having a pointer rather than an object. Whenever the C++ compiler can know an object's exact class, virtual[20] function calls can be statically bound, which allows inlining. Inlining allows zillions (would you believe half a dozen :-) optimization opportunities such as procedural integration, register lifetime issues, etc. The C++ compiler can know an object's exact class in three circumstances: local variables, global/static variables, and fully-contained member objects.

Thus fully-contained member objects allow significant optimizations that wouldn't be possible under the "member objects-by-pointer" approach. This is the main reason that languages which enforce reference-semantics have "inherent" performance challenges.

Note: Please read the next three FAQs to get a balanced perspective!

==============================================================================

[30.6] Are "inline virtual" member functions ever actually "inlined"?

Occasionally...

When the object is referenced via a pointer or a reference, a call to a virtual[20] function cannot be inlined, since the call must be resolved dynamically. Reason: the compiler can't know which actual code to call until run-time (i.e., dynamically), since the code may be from a derived class that was created after the caller was compiled.

Therefore the only time an inline virtual call can be inlined is when the compiler knows the "exact class" of the object which is the target of the virtual function call. This can happen only when the compiler has an actual object rather than a pointer or reference to an object. I.e., either with a local object, a global/static object, or a fully contained object inside a composite.

Note that the difference between inlining and non-inlining is normally much more significant than the difference between a regular function call and a virtual function call. For example, the difference between a regular function call and a virtual function call is often just two extra memory references, but the difference between an inline function and a non-inline function can be as much as an order of magnitude (for zillions of calls to insignificant member functions, loss of inlining virtual functions can result in 25X speed degradation! [Doug Lea, "Customization in C++," proc Usenix C++ 1990]).

A practical consequence of this insight: don't get bogged down in the endless debates (or sales tactics!) of compiler/language vendors who compare the cost of a virtual function call on their language/compiler with the same on another language/compiler. Such comparisons are largely meaningless when compared with the ability of the language/compiler to "inline expand" member function calls.
I.e., many language implementation vendors make a big stink about how good their dispatch strategy is, but if these implementations don't inline member function calls, the overall system performance would be poor, since it is inlining --not dispatching-- that has the greatest performance impact.

Note: Please read the next two FAQs to see the other side of this coin!

==============================================================================

[30.7] Sounds like I should never use reference semantics, right?

Wrong.

Reference semantics are A Good Thing. We can't live without pointers. We just don't want our software to be One Gigantic Rats Nest Of Pointers. In C++, you can pick and choose where you want reference semantics (pointers/references) and where you'd like value semantics (where objects physically contain other objects etc). In a large system, there should be a balance. However, if you implement absolutely everything as a pointer, you'll get enormous speed hits.

Objects near the problem skin are larger than higher level objects. The identity of these "problem space" abstractions is usually more important than their "value." Thus reference semantics should be used for problem-space objects.

Note that these problem space objects are normally at a higher level of abstraction than the solution space objects, so the problem space objects normally have a relatively lower frequency of interaction. Therefore C++ gives us an ideal situation: we choose reference semantics for objects that need unique identity or that are too large to copy, and we can choose value semantics for the others. Thus the highest frequency objects will end up with value semantics, since we install flexibility where it doesn't hurt us (only), and we install performance where we need it most!

These are some of the many issues that come into play with real OO design. OO/C++ mastery takes time and high-quality training. If you want a powerful tool, you've got to invest.

Don't stop now!
Read the next FAQ too!!

==============================================================================

[30.8] Does the poor performance of reference semantics mean I should pass-by-value?

Nope.

The previous FAQs were talking about member objects, not parameters. Generally, objects that are part of an inheritance hierarchy should be passed by reference or by pointer, not by value, since only then do you get the (desired) dynamic binding (pass-by-value doesn't mix with inheritance, since larger derived class objects get sliced[20.6] when passed by value as a base class object).

Unless compelling reasons are given to the contrary, member objects should be by value and parameters should be by reference. The discussion in the previous few FAQs indicates some of the "compelling reasons" for when member objects should be by reference.

==============================================================================
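
As a small illustration of the slicing problem described in [30.8], the sketch below passes the same derived object once by value and once by reference. The classes Shape and Circle and the functions nameByValue(), nameByReference(), and demoSlicing() are hypothetical names invented for this example; they do not come from the FAQ.

```cpp
#include <cassert>
#include <string>

class Shape {
public:
    virtual ~Shape() {}
    virtual std::string name() const { return "Shape"; }
};

class Circle : public Shape {
public:
    virtual std::string name() const { return "Circle"; }
};

// Pass-by-value: the argument is copied into a plain Shape, so the
// Circle part is sliced off and Shape::name() is what gets called.
std::string nameByValue(Shape s) { return s.name(); }

// Pass-by-reference: no copy is made, so the call binds dynamically
// to the exact class of the object that was passed in.
std::string nameByReference(const Shape& s) { return s.name(); }

// Small usage demonstration.
void demoSlicing() {
    Circle c;
    assert(nameByValue(c) == "Shape");       // sliced
    assert(nameByReference(c) == "Circle");  // dynamic binding preserved
}
```

Passing by const reference also avoids the copy itself, which matters for objects that are expensive to copy.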