Without a good problem definition, you might put effort into solving the wrong problem.
Be sure you know what you’re aiming at before you shoot.
Stable Requirements
Checklist: Requirements
Specific Functional Requirements
Are all the inputs to the system specified, including their source, accuracy, range of values, and frequency?
Are all the outputs from the system specified, including their destination, accuracy, range of values, frequency, and format?
Are all output formats specified for web pages, reports, and so on?
Are all the external hardware and software interfaces specified?
Are all the external communication interfaces specified, including handshaking, error-checking, and communication protocols?
Are all the tasks the user wants to perform specified?
Is the data used in each task and the data resulting from each task specified?
Specific Non-Functional (Quality) Requirements
Is the expected response time, from the user’s point of view, specified for all necessary operations?
Are other timing considerations specified, such as processing time, data-transfer rate, and system throughput?
Is the level of security specified?
Is the reliability specified, including the consequences of software failure, the vital information that needs to be protected from failure, and the strategy for error detection and recovery?
Is the maximum memory specified?
Is the maximum storage specified?
Is the maintainability of the system specified, including its ability to adapt to changes in specific functionality, changes in the operating environment, and changes in its interfaces with other software?
Is the definition of success included? Of failure?
Requirements Quality
Are the requirements written in the user’s language? Do the users think so?
Does each requirement avoid conflicts with other requirements?
Are acceptable trade-offs between competing attributes specified—for example, between robustness and correctness?
Do the requirements avoid specifying the design?
Are the requirements at a fairly consistent level of detail? Should any requirement be specified in more detail? Should any requirement be specified in less detail?
Are the requirements clear enough to be turned over to an independent group for construction and still be understood?
Is each item relevant to the problem and its solution? Can each item be traced to its origin in the problem environment?
Is each requirement testable? Will it be possible for independent testing to determine whether each requirement has been satisfied?
Are all possible changes to the requirements specified, including the likelihood of each change?
Requirements Completeness
Where information isn’t available before development begins, are the areas of incompleteness specified?
Are the requirements complete in the sense that if the product satisfies every requirement, it will be acceptable?
Are you comfortable with all the requirements? Have you eliminated requirements that are impossible to implement and included just to appease your customer or your boss?
Typical Architectural Components
Program Organization
A system architecture first needs an overview that describes the system in broad terms.
In the architecture, you should find evidence that alternatives to the final organization were considered and find the reasons the organization used was chosen over the alternatives.
One review of design practices found that the design rationale is at least as important for maintenance as the design itself.
Every feature listed in the requirements should be covered by at least one building block. If a function is claimed by two or more building blocks, their claims should cooperate, not conflict.
Major Classes
The architecture should specify the major classes to be used. It should identify the responsibilities of each major class and how the class will interact with other classes.
It should include descriptions of the class hierarchies, of state transitions, and of object persistence.
If the system is large enough, it should describe how classes are organized into subsystems.
The architecture should describe other class designs that were considered and give reasons for preferring the organization that was chosen.
The architecture doesn’t need to specify every class in the system; aim for the 80/20 rule: specify the 20 percent of the classes that make up 80 percent of the system’s behavior.
Data Design
The architecture should describe the major files and table designs to be used.
User Interface Design
Sometimes the user interface is specified at requirements time.
If it isn’t, it should be specified in the software architecture. The architecture should specify major elements of web page formats, GUIs, command line interfaces, and so on.
Careful architecture of the user interface makes the difference between a well-liked program and one that’s never used.
Input/Output
Input/output is another area that deserves attention in the architecture.
The architecture should specify a look-ahead, look-behind, or just-in-time reading scheme.
And it should describe the level at which I/O errors are detected: at the field, record, stream, or file level.
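As a sketch of what such a decision looks like in code, here is a minimal look-ahead reader that detects errors at the record level; the class name and record format are invented for illustration:

```python
# Sketch of a look-ahead reading scheme with record-level error detection.
# RecordReader and the comma-separated record format are illustrative.

class RecordReader:
    """Reads one record ahead so the caller can peek before consuming."""

    def __init__(self, lines):
        self._iter = iter(lines)
        self._next = None
        self._advance()

    def _advance(self):
        self._next = next(self._iter, None)

    def peek(self):
        return self._next

    def read(self):
        """Return the next record; detect errors at the record level."""
        line = self._next
        if line is None:
            raise EOFError("no more records")
        self._advance()
        fields = line.split(",")
        if len(fields) != 2:
            raise ValueError(f"malformed record: {line!r}")
        return fields

reader = RecordReader(["id1,100", "id2,200"])
assert reader.peek() == "id1,100"
assert reader.read() == ["id1", "100"]
```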
Resource Management
The architecture should describe a plan for managing scarce resources such as database connections, threads, and handles.
Memory management is another important area for the architecture to treat in memory-constrained application areas such as driver development and embedded systems.
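A resource-management plan often amounts to something like a pool. The sketch below, with invented names, illustrates reusing connections instead of creating them ad hoc:

```python
# Minimal sketch of managing a scarce resource through a pool so that
# connections are reused rather than created ad hoc. Names are illustrative.

class ConnectionPool:
    def __init__(self, max_size):
        self._max_size = max_size
        self._available = []
        self._in_use = 0

    def acquire(self):
        if self._available:
            self._in_use += 1
            return self._available.pop()
        if self._in_use >= self._max_size:
            raise RuntimeError("pool exhausted")
        self._in_use += 1
        return object()  # stand-in for a real connection

    def release(self, conn):
        self._in_use -= 1
        self._available.append(conn)

pool = ConnectionPool(max_size=2)
c1 = pool.acquire()
c2 = pool.acquire()
pool.release(c1)
c3 = pool.acquire()   # reuses c1 rather than opening a third connection
assert c3 is c1
```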
Security
The architecture should describe the approach to design-level and code-level security.
Performance
If performance is a concern, performance goals should be specified in the requirements.
Performance goals can include both speed and memory use.
Scalability
Scalability is the ability of a system to grow to meet future demands.
The architecture should describe how the system will address growth in number of users, number of servers, number of network nodes, database size, transaction volume, and so on.
Interoperability
If the system is expected to share data or resources with other software or hardware, the architecture should describe how that will be accomplished.
Internationalization / Localization
“Internationalization” is the technical activity of preparing a program to support multiple locales.
Internationalization is often known as “I18N” because the first and last characters in “internationalization” are “I” and “N” and because there are 18 letters in the middle of the word.
“Localization” (known as “L10n” for the same reason) is the activity of translating a program to support a specific local language.
If the program is to be used commercially, the architecture should show that the typical string and character-set issues have been considered, including the character set used (ASCII, DBCS, EBCDIC, MBCS, Unicode, ISO 8859, and so on), the kinds of strings used (C strings, Visual Basic strings, and so on), maintaining the strings without changing code, and translating the strings into foreign languages with minimal impact on the code and the user interface.
The architecture can decide to use strings in line in the code where they’re needed, keep the strings in a class and reference them through the class interface, or store the strings in a resource file.
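The second and third options can be sketched as a lookup table keyed by message identifier; the keys, locales, and fallback rule below are assumptions for illustration:

```python
# Sketch of keeping user-visible strings in an external resource table,
# referenced by key, so translation needs no code changes. The keys and
# locales are invented for illustration.

MESSAGES = {
    "en": {"greeting": "Hello", "farewell": "Goodbye"},
    "de": {"greeting": "Hallo", "farewell": "Auf Wiedersehen"},
}

def message(key, locale="en"):
    """Look up a string by key, falling back to English if untranslated."""
    return MESSAGES.get(locale, {}).get(key) or MESSAGES["en"][key]

assert message("greeting", "de") == "Hallo"
assert message("farewell") == "Goodbye"
assert message("greeting", "fr") == "Hello"  # falls back to English
```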
Error Processing
Error handling is often treated as a coding-convention–level issue, if it’s treated at all.
But because it has system-wide implications, it is best treated at the architectural level. Here are some questions to consider:
Is error processing corrective or merely detective?
If corrective, the program can attempt to recover from errors.
If it’s merely detective, the program can continue processing as if nothing had happened, or it can quit.
In either case, it should notify the user that it detected an error.
Is error detection active or passive?
The system can actively anticipate errors—for example, by checking user input for validity—or it can passively respond to them only when it can’t avoid them—for example, when a combination of user input produces a numeric overflow.
It can clear the way or clean up the mess. Again, in either case, the choice has user-interface implications.
How does the program propagate errors?
Once it detects an error, it can immediately discard the data that caused the error, it can treat the error as an error and enter an error-processing state, or it can wait until all processing is complete and notify the user that errors were detected (somewhere).
What are the conventions for handling error messages?
If the architecture doesn’t specify a single, consistent strategy, the user interface will appear to be a confusing macaroni-and-dried-bean collage of different interfaces in different parts of the program.
To avoid such an appearance, the architecture should establish conventions for error messages.
Inside the program, at what level are errors handled?
You can handle them at the point of detection, pass them off to an error-handling class, or pass them up the call chain.
What is the level of responsibility of each class for validating its input data?
Is each class responsible for validating its own data, or is there a group of classes responsible for validating the system’s data?
Can classes at any level assume that the data they’re receiving is clean?
Do you want to use your environment’s built-in exception-handling mechanism, or build your own?
The fact that an environment has a particular error-handling approach doesn’t mean that it’s the best approach for your requirements.
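One possible convention, sketched here with invented class names, is to route every error report through a single handler class so the reporting strategy can change in one place:

```python
# Sketch of one consistent error-handling convention: every class reports
# errors through a central handler instead of formatting its own messages.
# ErrorHandler and InputValidator are illustrative names.

class ErrorHandler:
    """Central point that decides how errors are reported to the user."""

    def __init__(self):
        self.log = []

    def report(self, source, message):
        # One place to change if the reporting convention changes.
        self.log.append(f"[{source}] {message}")

class InputValidator:
    def __init__(self, handler):
        self._handler = handler

    def validate_age(self, value):
        # Active error detection: check the input before it causes trouble.
        if not (0 <= value <= 150):
            self._handler.report("InputValidator", f"age out of range: {value}")
            return False
        return True

handler = ErrorHandler()
validator = InputValidator(handler)
assert validator.validate_age(30)
assert not validator.validate_age(-5)
assert handler.log == ["[InputValidator] age out of range: -5"]
```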
Fault Tolerance
Fault tolerance is a collection of techniques that increase a system’s reliability by detecting errors, recovering from them if possible, and containing their bad effects if not.
For example, a system could make the computation of the square root of a number fault tolerant in any of several ways:
The system might back up and try again when it detects a fault.
If the first answer is wrong, it would back up to a point at which it knew everything was all right and continue from there.
The system might have auxiliary code to use if it detects a fault in the primary code.
In the example, if the first answer appears to be wrong, the system switches over to an alternative square-root routine and uses it instead.
The system might use a voting algorithm.
It might have three square-root classes that each use a different method.
Each class computes the square root, and then the system compares the results.
Depending on the kind of fault tolerance built into the system, it then uses the mean, the median, or the mode of the three results.
The system might replace the erroneous value with a phony value that it knows to have a benign effect on the rest of the system.
Other fault-tolerance approaches include having the system change to a state of partial operation or a state of degraded functionality when it detects an error.
It can shut itself down or automatically restart itself.
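The voting approach can be sketched as follows; the three square-root routines and the choice of the median are illustrative, not prescribed by the text:

```python
# Sketch of voting fault tolerance: three independent square-root routines,
# with the median of the results used as the answer. A single faulty routine
# is outvoted by the other two. The routines are invented for illustration.

import math

def sqrt_newton(x, iterations=20):
    guess = x / 2.0 or 1.0
    for _ in range(iterations):
        guess = (guess + x / guess) / 2.0
    return guess

def sqrt_exp(x):
    return math.exp(0.5 * math.log(x)) if x > 0 else 0.0

def sqrt_library(x):
    return math.sqrt(x)

def fault_tolerant_sqrt(x):
    results = sorted([sqrt_newton(x), sqrt_exp(x), sqrt_library(x)])
    return results[1]  # median: one bad result cannot win the vote

assert abs(fault_tolerant_sqrt(25.0) - 5.0) < 1e-9
```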
Architectural Feasibility
The architecture should demonstrate that the system is technically feasible.
Overengineering
Robustness is the ability of a system to continue to run after it detects an error.
In software, the chain isn’t as strong as its weakest link; it’s as weak as all the weak links multiplied together.
The architecture should clearly indicate whether programmers should err on the side of overengineering or on the side of doing the simplest thing that works.
Buy-vs.-Build Decisions
If the architecture isn’t using off-the-shelf components, it should explain the ways in which it expects custom-built components to surpass ready-made libraries and components.
Reuse Decisions
Change Strategy
The architecture should clearly describe a strategy for handling changes.
The architecture should show that possible enhancements have been considered and that the enhancements most likely to occur are also the easiest to implement.
The architecture’s plan for changes can be as simple as putting version numbers in data files, reserving fields for future use, or designing files so that you can add new tables.
It might specify that data for the table is to be kept in an external file rather than coded inside the program, thus allowing changes in the program without recompiling.
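A minimal sketch of that idea, with an invented file format, reads the table from external data so the values can change without touching code:

```python
# Sketch of keeping table data external to the program so values can change
# without recompiling or redeploying code. The table name, brackets, and
# rates below are invented for illustration.

import csv
import io

TAX_TABLE_CSV = """bracket,rate
low,0.10
middle,0.22
high,0.35
"""

def load_rates(csv_text):
    """Parse bracket/rate pairs from an externally maintained table."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["bracket"]: float(row["rate"]) for row in reader}

rates = load_rates(TAX_TABLE_CSV)
assert rates["middle"] == 0.22
```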
General Architectural Quality
The architecture should describe the motivations for all major decisions.
Be wary of “we’ve always done it that way” justifications.
The architecture should tread the line between under-specifying and over-specifying the system. No part of the architecture should receive more attention than it deserves, or be over-designed.
Designers shouldn’t pay attention to one part at the expense of another.
The architecture should address all requirements without gold-plating (without containing elements that are not required).
The architecture should explicitly identify risky areas.
It should explain why they’re risky and what steps have been taken to minimize the risk.
Finally, you shouldn’t be uneasy about any parts of the architecture.
It shouldn’t contain anything just to please the boss. It shouldn’t contain anything that’s hard for you to understand.
Checklist: Architecture
Specific Architectural Topics
Is the overall organization of the program clear, including a good architectural overview and justification?
Are major building blocks well defined, including their areas of responsibility and their interfaces to other building blocks?
Are all the functions listed in the requirements covered sensibly, by neither too many nor too few building blocks?
Are the most critical classes described and justified?
Is the data design described and justified?
Is the database organization and content specified?
Are all key business rules identified and their impact on the system described?
Is a strategy for the user interface design described?
Is the user interface modularized so that changes in it won’t affect the rest of the program?
Is a strategy for handling I/O described and justified?
Are resource-use estimates and a strategy for resource management described and justified?
Are the architecture’s security requirements described?
Does the architecture set space and speed budgets for each class, subsystem, or functionality area?
Does the architecture describe how scalability will be achieved?
Does the architecture address interoperability?
Is a strategy for internationalization/localization described?
Is a coherent error-handling strategy provided?
Is the approach to fault tolerance defined (if any is needed)?
Has technical feasibility of all parts of the system been established?
Is an approach to overengineering specified?
Are necessary buy-vs.-build decisions included?
Does the architecture describe how reused code will be made to conform to other architectural objectives?
Is the architecture designed to accommodate likely changes?
General Architectural Quality
Does the architecture account for all the requirements?
Is any part over- or under-architected? Are expectations in this area set out explicitly?
Does the whole architecture hang together conceptually?
Is the top-level design independent of the machine and language that will be used to implement it?
Are the motivations for all major decisions provided?
Are you, as a programmer who will implement the system, comfortable with the architecture?
Amount of Time to Spend on Upstream Prerequisites
Generally, a well-run project devotes about 10 to 20 percent of its effort and about 20 to 30 percent of its schedule to requirements, architecture, and up-front planning.
These figures don’t include time for detailed design— that’s part of construction.
If requirements are unstable and you’re working on a large, formal project, you’ll probably have to work with a requirements analyst to resolve requirements problems that are identified early in construction.
Allow time to consult with the requirements analyst and for the requirements analyst to revise the requirements before you’ll have a workable version of the requirements.
If requirements are unstable and you’re working on a small, informal project, allow time for defining the requirements well enough that their volatility will have a minimal impact on construction.
If the requirements are unstable on any project—formal or informal—treat requirements work as its own project.
Estimate the time for the rest of the project after you’ve finished the requirements.
The clients you work with might not immediately understand why you want to plan requirements development as a separate project.
You might need to explain your reasoning to them.
Checklist: Upstream Prerequisites
Have you identified the kind of software project you’re working on and tailored your approach appropriately?
Are the requirements sufficiently well-defined and stable enough to begin construction (see the requirements checklist for details)?
Is the architecture sufficiently well defined to begin construction (see the architecture checklist for details)?
Have other risks unique to your particular project been addressed, such that construction is not exposed to more risk than necessary?
Choice of Programming Language
Programmers working with high-level languages achieve better productivity and quality than those working with lower-level languages.
Languages such as C++, Java, Smalltalk, and Visual Basic have been credited with improving productivity, reliability, simplicity, and comprehensibility by factors of 5 to 15 over low-level languages such as assembly and C (Brooks 1987, Jones 1998, Boehm 2000).
You save time when you don’t need to have an awards ceremony every time a C statement does what it’s supposed to.
Moreover, higher-level languages are more expressive than lower-level languages. Each line of code says more.

Programmers may be similarly influenced by their languages.
The words available in a programming language for expressing your programming thoughts certainly determine how you express your thoughts and might even determine what thoughts you can express.
Evidence of the effect of programming languages on programmers’ thinking is common.
If your language lacks constructs that you want to use or is prone to other kinds of problems, try to compensate for them. Invent your own coding conventions, standards, class libraries, and other augmentations.
Checklist: Major Construction Practices
Coding
Have you defined coding conventions for names, comments, and formatting?
Have you defined specific coding practices that are implied by the architecture, such as how error conditions will be handled, how security will be addressed, and so on?
Have you identified your location on the technology wave and adjusted your approach to match? If necessary, have you identified how you will program into the language rather than being limited by programming in it?
Teamwork
Have you defined an integration procedure, that is, have you defined the specific steps a programmer must go through before checking code into the master sources?
Will programmers program in pairs, or individually, or some combination of the two?
Quality Assurance
Will programmers write test cases for their code before writing the code itself?
Will programmers write unit tests for their code regardless of whether they write them first or last?
Will programmers step through their code in the debugger before they check it in?
Will programmers integration-test their code before they check it in?
Will programmers review or inspect each other’s code?
Tools
Have you selected a revision control tool?
Have you selected a language and language version or compiler version?
Have you decided whether to allow use of non-standard language features?
Have you identified and acquired other tools you’ll be using—editor, refactoring tool, debugger, test framework, syntax checker, and so on?
- Every programming language has strengths and weaknesses. Be aware of the specific strengths and weaknesses of the language you’re using.
- Establish programming conventions before you begin programming. It’s nearly impossible to change code to match them later.
- More construction practices exist than you can use on any single project. Consciously choose the practices that are best suited to your project.
- Your position on the technology wave determines what approaches will be effective—or even possible. Identify where you are on the technology wave, and adjust your plans and expectations accordingly.
Design in Construction
Design is a Wicked Problem.
Horst Rittel and Melvin Webber defined a “wicked” problem as one that could be clearly defined only by solving it, or by solving part of it (1973).
This paradox implies, essentially, that you have to “solve” the problem once in order to clearly define it and then solve it again to create a solution that works.
Design is sloppy because you take many false steps and go down many blind alleys—you make a lot of mistakes.
Indeed, making mistakes is the point of design—it’s cheaper to make mistakes and correct designs than it would be to make the same mistakes, recognize them later, and have to correct full-blown code.
Design is a Heuristic Process.
Because design is non-deterministic, design techniques tend to be “heuristics”— ”rules of thumb” or “things to try that sometimes work,” rather than repeatable processes that are guaranteed to produce predictable results. Design involves trial and error.
A tidy way of summarizing these attributes of design is to say that design is “emergent” (Bain and Shalloway 2004).
Designs don’t spring fully formed directly from someone’s brain.
They evolve and improve through design reviews, informal discussions, experience writing the code itself, and experience revising the code itself.
Accidental and Essential Difficulties
Brooks argues that software development is made difficult because of two different classes of problems—the essential and the accidental.
In philosophy, the essential properties are the properties that a thing must have in order to be that thing.
A car must have an engine, wheels, and doors to be a car. If it doesn’t have any of those essential properties, then it isn’t really a car.
Accidental properties are the properties a thing just happens to have, that don’t really bear on whether the thing is really that kind of thing.
A car could have a V8, a turbocharged 4-cylinder, or some other kind of engine and be a car regardless of that detail.
You could also think of accidental properties as coincidental, discretionary, optional, and happenstance.
Importance of Managing Complexity
When projects do fail for reasons that are primarily technical, the reason is often uncontrolled complexity.
The software is allowed to grow so complex that no one really knows what it does.
When a project reaches the point at which no one really understands the impact that code changes in one area will have on other areas, progress grinds to a halt.
We should try to organize our programs in such a way that we can safely focus on one part of it at a time.
The goal is to minimize the amount of a program you have to think about at any one time.
Desirable Characteristics of a Design
There are three sources of overly costly, ineffective designs:
- A complex solution to a simple problem
- A simple, incorrect solution to a complex problem
- An inappropriate, complex solution to a complex problem
This suggests a two-prong approach to managing complexity:
- Minimize the amount of essential complexity that anyone’s brain has to deal with at any one time.
- Keep accidental complexity from needlessly proliferating.
Here’s a list of internal design characteristics:
Minimal complexity
The primary goal of design should be to minimize complexity for all the reasons described in the last section. Avoid making “clever” designs.
Clever designs are usually hard to understand. Instead make “simple” and “easy-to-understand” designs.
Ease of maintenance
Ease of maintenance means designing for the maintenance programmer.
Continually imagine the questions a maintenance programmer would ask about the code you’re writing.
Design the system to be self-explanatory.
Minimal connectedness
Use the principles of strong cohesion, loose coupling, and information hiding to design classes with as few interconnections as possible.
Minimal connectedness minimizes work during integration, testing, and maintenance.
Extensibility
Reusability
High fan-in
High fan-in refers to having a high number of classes that use a given class.
High fan-in implies that a system has been designed to make good use of utility classes at the lower levels in the system.
Low-to-medium fan-out
Low-to-medium fan-out means having a given class use a low-to-medium number of other classes.
High fan-out (more than about seven) indicates that a class uses a large number of other classes and may therefore be overly complex.
Researchers have found that the principle of low fan-out is beneficial whether you’re considering the number of routines called from within a routine or from within a class.
Portability
Portability means designing the system so that you can easily move it to another environment.
Leanness
Leanness means designing the system so that it has no extra parts.
Voltaire said that a book is finished not when nothing more can be added but when nothing more can be taken away.
Stratification
Stratified design means trying to keep the levels of decomposition stratified so that you can view the system at any single level and get a consistent view.
Design the system so that you can view it at one level without dipping into other levels.
If you’re writing a modern system that has to use a lot of older, poorly designed code, write a layer of the new system that’s responsible for interfacing with the old code.
The beneficial effects of stratified design in such a case are (1) it compartmentalizes the messiness of the bad code and (2) if you’re ever allowed to jettison the old code, you won’t need to modify any new code except the interface layer.
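A sketch of such an interface layer, with an invented legacy routine, might look like this; only the gateway class knows the old code exists:

```python
# Sketch of an interface layer over messy legacy code: new code talks only
# to the wrapper, so jettisoning the old code later touches one class.
# The legacy function and its cryptic format are invented for illustration.

def legacy_lookup(code):            # old code: cryptic inputs and outputs
    table = {"C01": "ACTIVE|1", "C02": "INACTIVE|0"}
    return table.get(code, "UNKNOWN|-1")

class CustomerStatusGateway:
    """The only class in the new system that knows about legacy_lookup."""

    def is_active(self, customer_code):
        raw = legacy_lookup(customer_code)
        status, _flag = raw.split("|")
        return status == "ACTIVE"

gateway = CustomerStatusGateway()
assert gateway.is_active("C01")
assert not gateway.is_active("C02")
```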
Standard techniques
Try to give the whole system a familiar feeling by using standardized, common approaches.
Levels of Design

Level 1: Software System
The first level is the entire system. Some programmers jump right from the system level into designing classes, but it’s usually beneficial to think through higher level combinations of classes, such as subsystems or packages.
Level 2: Division into Subsystems or Packages
The main product of design at this level is the identification of all major subsystems.
The subsystems can be big—database, user interface, business logic, command interpreter, report engine, and so on.
The major design activity at this level is deciding how to partition the program into major subsystems and defining how each subsystem is allowed to use each other subsystem.
Of particular importance at this level are the rules about how the various subsystems can communicate.
If all subsystems can communicate with all other subsystems, you lose the benefit of separating them at all. Make each subsystem meaningful by restricting communications.
On large programs and families of programs, design at the subsystem level makes a difference.
If you believe that your program is small enough to skip subsystem-level design, at least make the decision to skip that level of design a conscious one.
Common Subsystems
- Business logic
Business logic is the laws, regulations, policies, and procedures that you encode into a computer system.
If you’re writing a payroll system, you might encode rules from the IRS about the number of allowable withholdings and the estimated tax rate.
- User interface
Create a subsystem to isolate user-interface components so that the user interface can evolve without damaging the rest of the program.
In most cases, a user-interface subsystem uses several subordinate subsystems or classes for the GUI interface, command line interface, menu operations, window management, help system, and so forth.
- Database access
Subsystems that hide implementation details provide a valuable level of abstraction that reduces a program’s complexity.
You can hide the implementation details of accessing a database so that most of the program doesn’t need to worry about the messy details of manipulating low- level structures and can deal with the data in terms of how it’s used at the business-problem level.
- System dependencies
Level 3: Division into Classes
Design at this level includes identifying all classes in the system.
For example, a database-interface subsystem might be further partitioned into data access classes, persistence framework classes, and database metadata.
Level 4: Division into Routines
The class interface defined at Level 3 will define some of the routines.
Design at Level 4 will detail the class’s private routines.
When you examine the details of the routines inside a class, you can see that many routines are simple boxes, but a few are composed of hierarchically organized routines, which require still more design.
Level 5: Internal Routine Design
Design at the routine level consists of laying out the detailed functionality of the individual routines.
Internal routine design is typically left to the individual programmer working on an individual routine.
The design consists of activities such as writing pseudocode, looking up algorithms in reference books, deciding how to organize the paragraphs of code in a routine, and writing programming- language code.
Find Real-World Objects
The steps in designing with objects are
• Identify the objects and their attributes (methods and data).
• Determine what can be done to each object.
• Determine what each object can do to other objects.
• Determine the parts of each object that will be visible to other objects— which parts will be public and which will be private.
• Define each object’s public interface.
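The steps above can be sketched for a hypothetical Document object: its attributes stay private, and only a small public interface is exposed:

```python
# Sketch of the object-design steps applied to an invented Document class:
# identify attributes, decide what can be done to the object, and expose a
# small public interface while keeping helpers private.

class Document:
    def __init__(self, title):
        self._title = title        # attribute kept private
        self._paragraphs = []      # attribute kept private

    # Public interface: what other objects may do to a Document.
    def add_paragraph(self, text):
        self._paragraphs.append(self._normalize(text))

    def word_count(self):
        return sum(len(p.split()) for p in self._paragraphs)

    # Private helper: not part of the public interface.
    def _normalize(self, text):
        return " ".join(text.split())

doc = Document("Design Notes")
doc.add_paragraph("Design  is   a wicked problem.")
assert doc.word_count() == 5
```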
Form Consistent Abstractions
Abstraction is the ability to engage with a concept while safely ignoring some of its details— handling different details at different levels.
Base classes are abstractions that allow you to focus on common attributes of a set of derived classes and ignore the details of the specific classes while you’re working on the base class.
A good class interface is an abstraction that allows you to focus on the interface without needing to worry about the internal workings of the class.
Encapsulate Implementation Details
Abstraction says, “You’re allowed to look at an object at a high level of detail.”
Encapsulation says, “Furthermore, you aren’t allowed to look at an object at any other level of detail.”
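A minimal sketch of the distinction: the Stack below offers a high-level interface (abstraction) and prevents callers from reaching its representation by name (encapsulation). The class is illustrative, not from the text:

```python
# Sketch of abstraction plus encapsulation: callers see only push/pop,
# and the underlying list cannot be reached under its declared name.

class Stack:
    def __init__(self):
        self.__items = []     # name-mangled: hides the representation

    def push(self, value):
        self.__items.append(value)

    def pop(self):
        if not self.__items:
            raise IndexError("pop from empty stack")
        return self.__items.pop()

s = Stack()
s.push(1)
s.push(2)
assert s.pop() == 2
assert not hasattr(s, "__items")   # the representation is hidden
```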
Inherit When Inheritance Simplifies the Design
Hide Secrets (Information Hiding)
In structured design, the notion of “black boxes” comes from information hiding.
In object-oriented design, it gives rise to the concepts of encapsulation and modularity, and it is associated with the concept of abstraction.
Secrets and the Right to Privacy
One key task in designing a class is deciding which features should be known outside the class and which should remain secret.
Designing the class interface is an iterative process just like any other aspect of design.
If you don’t get the interface right the first time, try a few more times until it stabilizes. If it doesn’t stabilize, you need to try a different approach.
Information hiding is useful at all levels of design, from the use of named constants instead of literals, to creation of data types, to class design, routine design, and subsystem design.
Excessive Distribution Of Information
One common barrier to information hiding is an excessive distribution of information throughout a system. You might have hard-coded the literal 100 throughout a system. Using 100 as a literal decentralizes references to it. It’s better to hide the information in one place, in a constant MAX_EMPLOYEES perhaps, whose value is changed in only one place.
Another example of excessive information distribution is interleaving interaction with human users throughout a system. If the mode of interaction changes—say, from a GUI interface to a command-line interface—virtually all the code will have to be modified. It’s better to concentrate user interaction in a single class, package, or subsystem you can change without affecting the whole system.
Yet another example would be a global data element—perhaps an array of employee data with 1000 elements maximum that’s accessed throughout a program. If the program uses the global data directly, information about the data item’s implementation—such as the fact that it’s an array and has a maximum of 1000 elements—will be spread throughout the program. If the program uses the data only through access routines, only the access routines will know the implementation details.
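The access-routine idea above can be sketched as follows. This is a minimal, hypothetical example (the names `EmployeeData`, `AddEmployee`, and `MAX_EMPLOYEES` are illustrative, not from the original text): callers go through routines, so only this one module knows how the data is stored or how large it can grow.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: employee data hidden behind access routines.
// Callers never learn how the data is stored or what its size limit is,
// so the implementation can change in this one place.
namespace EmployeeData {
    const int MAX_EMPLOYEES = 1000;      // implementation detail
    std::vector<std::string> employees;  // would be file-local in practice

    bool AddEmployee(const std::string& name) {
        if (static_cast<int>(employees.size()) >= MAX_EMPLOYEES) return false;
        employees.push_back(name);
        return true;
    }

    int EmployeeCount() {
        return static_cast<int>(employees.size());
    }
}
```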
Circular Dependencies
A more subtle barrier to information hiding is circular dependencies, as when a routine in class A calls a routine in class B, and a routine in class B calls a routine in class A.
Avoid such dependency loops.
Class Data Mistaken For Global Data
Global data is generally subject to two problems: (1) Routines operate on global data without knowing that other routines are operating on it; and (2) routines are aware that other routines are operating on the global data, but they don’t know exactly what they’re doing to it.
Perceived Performance Penalties
A final barrier to information hiding can be an attempt to avoid performance penalties at both the architectural and the coding levels. At both levels the worry is usually premature: design for information hiding first, and measure actual performance before trading it away.
Value of Information Hiding
The difference between object-oriented design and information hiding is more subtle than a clash of explicit rules and regulations.
Object-oriented design would approve of such a design decision as much as information hiding would.
Rather, the difference is one of heuristics—thinking about information hiding inspires and promotes design decisions that thinking about objects does not.
Identify Areas Likely to Change
- Identify items that seem likely to change.
- Separate items that are likely to change.
- Isolate items that seem likely to change.
Here are a few areas that are likely to change:
- Business logic
- Hardware dependencies
- Input and output
- Nonstandard language features
- Difficult design and construction areas
It’s a good idea to hide difficult design and construction areas because they might be done poorly and you might need to do them again.
- Status variables
You can add at least two levels of flexibility and readability to your use of status variables:
Don’t use a boolean variable as a status variable. Use an enumerated type instead.
It’s common to add a new state to a status variable, and adding a new type to an enumerated type requires a mere recompilation rather than a major revision of every line of code that checks the variable.
Use access routines rather than checking the variable directly.
By checking the access routine rather than the variable, you allow for the possibility of more sophisticated state detection.
For example, if you wanted to check combinations of an error-state variable and a current-function-state variable, it would be easy to do if the test were hidden in a routine and hard to do if it were a complicated test hard-coded throughout the program.
- Data-size constraints
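The status-variable guidelines above can be sketched like this. It's a hypothetical example (the `StatusTracker` class and its state names are illustrative): enumerated types replace a raw boolean, and the combination test is hidden in one access routine instead of being hard-coded throughout the program.

```cpp
#include <cassert>

// Hypothetical sketch: enumerated status types plus an access routine,
// instead of a boolean checked directly all over the program.
enum class ErrorState { None, Warning, Fatal };
enum class FunctionState { Idle, Processing, Done };

class StatusTracker {
public:
    void SetError(ErrorState e) { errorState_ = e; }
    void SetFunction(FunctionState f) { functionState_ = f; }

    // The combination test lives in one routine; adding a new state
    // or a more sophisticated check changes only this class.
    bool OkToProceed() const {
        return errorState_ == ErrorState::None &&
               functionState_ != FunctionState::Processing;
    }

private:
    ErrorState errorState_ = ErrorState::None;
    FunctionState functionState_ = FunctionState::Idle;
};
```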
Anticipating Different Degrees of Change
A good technique for identifying areas likely to change is first to identify the minimal subset of the program that might be of use to the user.
The subset makes up the core of the system and is unlikely to change.
Next, define minimal increments to the system. They can be so small that they seem trivial.
These areas of potential improvement constitute potential changes to the system; design these areas using the principles of information hiding.
Keep Coupling Loose
Coupling describes how tightly a class or routine is related to other classes or routines.
The goal is to create classes and routines with small, direct, visible, and flexible relations to other classes and routines (loose coupling).
Coupling Criteria
- Size
Size refers to the number of connections between modules.
With coupling, small is beautiful because it’s less work to connect other modules to a module that has a smaller interface.
- Visibility
Visibility refers to the prominence of the connection between two modules.
You get lots of credit for making your connections as blatant as possible.
Passing data in a parameter list is making an obvious connection and is therefore good.
Modifying global data so that another module can use that data is a sneaky connection and is therefore bad.
Documenting the global-data connection makes it more obvious and is slightly better.
- Flexibility
Flexibility refers to how easily you can change the connections between modules.
Kinds of Coupling
- Simple-data-parameter coupling
Two modules are simple-data-parameter coupled if all the data passed between them are of primitive data types and all the data is passed through parameter lists.
This kind of coupling is normal and acceptable.
- Simple-object coupling
A module is simple-object coupled to an object if it instantiates that object.
This kind of coupling is fine.
- Object-parameter coupling
Two modules are object-parameter coupled to each other if Object1 requires Object2 to pass it an Object3.
This kind of coupling is tighter than Object1 requiring Object2 to pass it only primitive data types.
- Semantic coupling
Here are some examples:
- Module1 passes a control flag to Module2 that tells Module2 what to do. This approach requires Module1 to make assumptions about the internal workings of Module2, namely, what Module2 is going to do with the control flag. If Module2 defines a specific data type for the control flag (enumerated type or object), this usage is probably OK.
- Module2 uses global data after the global data has been modified by Module1. This approach requires Module2 to assume that Module1 has modified the data in the ways Module2 needs it to be modified, and that Module1 has been called at the right time.
- Module1’s interface states that its Module1.Initialize() routine should be called before its Module1.Routine1() is called. Module2 knows that Module1.Routine1() calls Module1.Initialize() anyway, so it just instantiates Module1 and calls Module1.Routine1() without calling Module1.Initialize() first.
- Module1 passes Object to Module2. Because Module1 knows that Module2 uses only three of Object’s seven methods, it initializes Object only partially—with the specific data those three methods need.
- Module1 passes BaseObject to Module2. Because Module2 knows that Module1 is really passing it DerivedObject, it casts BaseObject to DerivedObject and calls methods that are specific to DerivedObject.
DerivedClass modifies BaseClass’s protected member data directly.
Semantic coupling is dangerous because changing code in the used module can break code in the using module in ways that are completely undetectable by the compiler.
The point of loose coupling is that an effective module provides an additional level of abstraction—once you write it, you can take it for granted.
It reduces overall program complexity and allows you to focus on one thing at a time.
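The first semantic-coupling example above (the control flag) can be sketched as follows. This is a hypothetical illustration (the routine names are invented): the flag version forces the caller to know the callee's internal dispatch, while separate self-describing routines let the compiler check the connection.

```cpp
#include <cassert>
#include <string>

// Hypothetical sketch of semantic coupling via a control flag: the caller
// must know Module2's internal dispatch to pick the right flag value, and
// the compiler can't catch a wrong one.
std::string HandleCommand(int controlFlag) {
    if (controlFlag == 1) return "print";
    if (controlFlag == 2) return "save";
    return "unknown";
}

// Looser alternative: separate, self-describing routines. The caller makes
// no assumptions about internal dispatch, and a typo becomes a compile error.
std::string PrintReport() { return "print"; }
std::string SaveReport()  { return "save"; }
```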
Look for Common Design Patterns
Common patterns include Adapter, Bridge, Decorator, Facade, Factory Method, Observer, Singleton, Strategy, and Template Method.


Other Heuristics:
- Aim for Strong Cohesion -> strong cohesion within a class, loose coupling between classes
- Build Hierarchies
Hierarchies are a useful tool for achieving Software’s Primary Technical Imperative because they allow you to focus on only the level of detail you’re currently concerned with.
The details don’t go away completely; they’re simply pushed to another level so that you can think about them when you want to rather than thinking about all the details all of the time.
- Formalize Class Contracts
Typically, the contract is something like “If you promise to provide data x, y, and z and you promise they’ll have characteristics a, b, and c, I promise to perform operations 1, 2, and 3 within constraints 8, 9, and 10.”
The promises the clients of the class make to the class are typically called “preconditions,” and the promises the object makes to its clients are called the “postconditions.”
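The precondition/postcondition contract described above can be sketched with assertions. This is a minimal, hypothetical example (the routine `MonthlyPayment` and its zero-interest arithmetic are illustrative only, not a real loan formula):

```cpp
#include <cassert>

// Hypothetical sketch of a class contract expressed as assertions.
// Precondition (caller's promise): principal > 0 and months > 0.
// Postcondition (routine's promise): the returned payment is positive.
double MonthlyPayment(double principal, int months) {
    assert(principal > 0.0 && months > 0);  // preconditions
    double payment = principal / months;    // simplified, zero-interest math
    assert(payment > 0.0);                  // postcondition
    return payment;
}
```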
- Assign Responsibilities
- Design for Test
- Avoid Failure
The high-profile security lapses of various well-known systems over the past few years make it hard to disagree that we should find ways to apply Petroski’s design-failure insights to software.
- Choose Binding Time Consciously
- Make Central Points of Control
Control can be centralized in classes, routines, preprocessor macros, #include files—even a named constant is an example of a central point of control.
- Consider Using Brute Force
A brute-force solution that works is better than an elegant solution that doesn’t work.
- Draw a Diagram
- Keep Your Design Modular
Guidelines for Using Heuristics
- Understanding the Problem.
- Devising a Plan.
- Carrying out the Plan.
- Looking Back.
One of the most effective guidelines is not to get stuck on a single approach.
If diagramming the design in UML isn’t working, write it in English.
Write a short test program.
Try a completely different approach.
Think of a brute-force solution.
Keep outlining and sketching with your pencil, and your brain will follow.
If all else fails, walk away from the problem.
Literally go for a walk, or think about something else before returning to the problem.
If you’ve given it your best and are getting nowhere, putting it out of your mind for a time often produces results more quickly than sheer persistence can.
Top-Down and Bottom-Up Design Approaches
Top-down design begins at a high level of abstraction.
You define base classes or other non-specific design elements.
As you develop the design, you increase the level of detail, identifying derived classes, collaborating classes, and other detailed design elements.
Bottom-up design starts with specifics and works toward generalities. It typically begins by identifying concrete objects and then generalizes aggregations of objects and base classes from those specifics.
How far do you decompose a program? Continue decomposing until it seems as if it would be easier to code the next level than to decompose it.
Work until you become somewhat impatient at how obvious and easy the design seems.
If you need to work with something more tangible, try the bottom-up design approach.
Ask yourself, “What do I know this system needs to do?” From that question, identify a few low-level, concrete objects and responsibilities that you can assign to concrete classes.
Identify common objects and group them using subsystem organization, packages, composition within objects, or inheritance, whichever is appropriate.
Continue with the next level up, or go back to the top and try again to work down.
To summarize, top down tends to start simple, but sometimes low-level complexity ripples back to the top, and those ripples can make things more complex than they really needed to be.
Bottom up tends to start complex, but identifying that complexity early on leads to better design of the higher-level classes—if the complexity doesn’t torpedo the whole system first!
Experimental Prototyping
Prototyping works poorly when developers aren’t disciplined about writing the absolute minimum of code needed to answer a question.
Prototyping also works poorly when the design question is not specific enough.
A final risk of prototyping arises when developers do not treat the code as throwaway code.
By adopting the attitude that once the question is answered the code will be thrown away, you can minimize this risk.

Capturing Your Design Work
- Insert design documentation into the code itself
- Capture design discussions and decisions on a Wiki
- Write email summaries
- Use a digital camera
- Save design flipcharts
- Use CRC cards
- Create UML diagrams at appropriate levels of detail
CHECKLIST: Design in Construction
Have you iterated, selecting the best of several attempts rather than the first attempt?
Have you tried decomposing the system in several different ways to see which way will work best?
Have you approached the design problem both from the top down and from the bottom up?
Have you prototyped risky or unfamiliar parts of the system, creating the absolute minimum amount of throwaway code needed to answer specific questions?
Has your design been reviewed, formally or informally, by others?
Have you driven the design to the point that its implementation seems obvious?
Have you captured your design work using an appropriate technique such as a Wiki, email, flipcharts, digital camera, UML, CRC cards, or comments in the code itself?
Does the design adequately address issues that were identified and deferred at the architectural level?
Is the design stratified into layers?
Are you satisfied with the way the program has been decomposed into subsystems, packages, and classes?
Are you satisfied with the way the classes have been decomposed into routines?
Are classes designed for minimal interaction with each other?
Are classes and subsystems designed so that you can use them in other systems?
Will the program be easy to maintain?
Is the design lean? Are all of its parts strictly necessary?
Does the design use standard techniques and avoid exotic, hard-to-understand elements?
Overall, does the design help minimize both accidental and essential complexity?
Working Classes
Good Class Interfaces
The first and probably most important step in creating a high quality class is creating a good interface.
Good Abstraction
Present a consistent level of abstraction in the class interface.
Each class should implement one and only one abstract data type (ADT).

Move unrelated information to another class.
Beware of erosion of the interface’s abstraction under modification.
Don’t add public members that are inconsistent with the interface abstraction.
Consider abstraction and cohesion together.
Good Encapsulation
Minimize accessibility of classes and members.
Don’t expose member data in public.
Don’t put private implementation details in a class’s interface.
Don’t make assumptions about the class’s users.
Avoid friend classes.
In a few circumstances, such as the State pattern, friend classes can be used in a disciplined way that contributes to managing complexity.
Don’t put a routine into the public interface just because it uses only public routines.
Favor read-time convenience over write-time convenience.
Be very, very wary of semantic violations of encapsulation.
Here are some examples of the ways that a user of a class can break encapsulation semantically:
- Not calling Class A’s Initialize() routine because you know that Class A’s PerformFirstOperation() routine calls it automatically.
- Not calling the database.Connect() routine before you call employee.Retrieve( database ) because you know that the employee.Retrieve() function will connect to the database if there isn’t already a connection.
- Not calling Class A’s Terminate() routine because you know that Class A’s PerformFinalOperation() routine has already called it.
- Using a pointer or reference to ObjectB created by ObjectA even after ObjectA has gone out of scope, because you know that ObjectA keeps ObjectB in static storage, and ObjectB will still be valid.
- Using ClassB’s MAXIMUM_ELEMENTS constant instead of using ClassA.MAXIMUM_ELEMENTS, because you know that they’re both equal to the same value.
Watch for coupling that’s too tight.
Design and Implementation Issues
Containment (“has a” relationships)
Implement “has a” through private inheritance as a last resort.
In some instances you might find that you can’t achieve containment through making one object a member of another. In that case, some experts suggest privately inheriting from the contained object.
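The contrast above can be sketched as follows. This is a hypothetical example (the `Engine`/`Car`/`Truck` names are illustrative): containment via a member object is the normal form; private inheritance achieves the same reuse but is easy to misread as an “is a” relationship, which is why it's a last resort.

```cpp
#include <cassert>

class Engine {
public:
    bool Start() { running_ = true; return running_; }
private:
    bool running_ = false;
};

// Preferred: "has a" expressed through a member object.
class Car {
public:
    bool Start() { return engine_.Start(); }
private:
    Engine engine_;
};

// Last resort: "has a" expressed through private inheritance. Engine's
// interface is hidden from Car's clients, but the class declaration is
// easy to misread as an "is a" relationship.
class Truck : private Engine {
public:
    bool StartTruck() { return Start(); }  // Engine::Start, private to clients
};
```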
Be critical of classes that contain more than about seven members.
Inheritance (“is a” relationships)
When you decide to use inheritance, you have to make several decisions:
- For each member routine, will the routine be visible to derived classes? Will it have a default implementation? Will the default implementation be overridable?
- For each data member (including variables, named constants, enumerations, and so on), will the data member be visible to derived classes?
Implement “is a” through public inheritance.
If the derived class isn’t going to adhere completely to the same interface contract defined by the base class, inheritance is not the right implementation technique.
Design and document for inheritance or prohibit it.
Adhere to the Liskov Substitution Principle.
Be sure to inherit only what you want to inherit.

Inherited routines come in three basic flavors:
- An abstract overridable routine means that the derived class inherits the routine’s interface but not its implementation.
- An overridable routine means that the derived class inherits the routine’s interface and a default implementation, and it is allowed to override the default implementation.
- A non-overridable routine means that the derived class inherits the routine’s interface and its default implementation, and it is not allowed to override the routine’s implementation.
Don’t “override” a non-overridable member function.
If a function is private in the base class, a derived class can create a function with the same name. To the programmer reading the code in the derived class, such a function can create confusion because it looks like it should be polymorphic, but it isn’t; it just has the same name.
Move common interfaces, data, and behavior as high as possible in the inheritance tree.
Be suspicious of classes of which there is only one instance.
Be suspicious of base classes of which there is only one derived class.
Be suspicious of classes that override a routine and do nothing inside the derived routine.
Avoid deep inheritance trees.
In his excellent book Object-Oriented Design Heuristics, Arthur Riel suggests limiting inheritance hierarchies to a maximum of six levels.
In my experience most people have trouble juggling more than two or three levels of inheritance in their brains at once.
Prefer inheritance to extensive type checking.

Avoid using a base class’s protected data in a derived class (or make that data private instead of protected in the first place).
Member Functions and Data
Keep the number of routines in a class as small as possible.
However, research on class quality has found other competing factors to be more significant, including deep inheritance trees, large numbers of routines called by a routine, and strong coupling between classes.
Evaluate the tradeoff between minimizing the number of routines and these other factors.
Disallow implicitly generated member functions and operators you don’t want.
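In C++11 and later, the guideline above is expressed with `= delete`. A minimal sketch (the `FileHandle` class is illustrative): deleting the copy operations prevents the compiler from silently generating them.

```cpp
#include <cassert>

// Sketch: explicitly disallowing compiler-generated members you don't want.
// Copying a FileHandle would duplicate ownership, so both copy operations
// are deleted; attempting to copy becomes a compile error.
class FileHandle {
public:
    FileHandle() = default;
    FileHandle(const FileHandle&) = delete;             // no copy construction
    FileHandle& operator=(const FileHandle&) = delete;  // no copy assignment
    bool IsOpen() const { return open_; }
private:
    bool open_ = false;
};
```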
Minimize direct routine calls to other classes.
Minimize indirect routine calls to other classes.
Direct connections are hazardous enough. Indirect connections—such as account.ContactPerson().DaytimeContactInfo().PhoneNumber()—tend to be even more hazardous.
In general, minimize the extent to which a class collaborates with other classes.
Try to minimize all of the following:
- Number of kinds of objects instantiated
- Number of different direct routine calls on instantiated objects
- Number of routine calls on objects returned by other instantiated objects
Constructors
Initialize all member data in all constructors, if possible.
Initialize data members in the order in which they’re declared.
Enforce the singleton property by using a private constructor.
Enforce the singleton property by using all static member data and reference counting.
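The private-constructor approach above can be sketched like this (the `ConfigManager` class is illustrative; this variant uses a function-local static instance):

```cpp
#include <cassert>

// Sketch: enforcing the singleton property with a private constructor.
// Only Instance() can construct the object, and it constructs exactly one.
class ConfigManager {
public:
    static ConfigManager& Instance() {
        static ConfigManager instance;  // constructed once, on first use
        return instance;
    }
    void Set(int value) { value_ = value; }
    int Get() const { return value_; }
private:
    ConfigManager() = default;                        // private: no outside construction
    ConfigManager(const ConfigManager&) = delete;     // and no copies
    ConfigManager& operator=(const ConfigManager&) = delete;
    int value_ = 0;
};
```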
Prefer deep copies to shallow copies until proven otherwise.
Martin Fowler’s Refactoring (1999) describes the specific steps needed to convert from shallow copies to deep copies and from deep copies to shallow copies.
(Fowler calls them reference objects and value objects.)
Reasons to Create a Class
Model real-world objects.
Model abstract objects.
Reduce complexity.
Isolate complexity.
Hide implementation details.
Limit effects of changes.
Hide global data.
Streamline parameter passing.
Make central points of control.
Facilitate reusable code.
Plan for a family of programs.
If you expect a program to be modified, it’s a good idea to isolate the parts that you expect to change by putting them into their own classes.
You can then modify the classes without affecting the rest of the program, or you can put in completely new classes instead.
Package related operations.
To accomplish a specific refactoring.
Classes to Avoid
Avoid creating god classes.
Eliminate irrelevant classes.
Avoid classes named after verbs.
Language-Specific Issues
Here are some of the class-related areas that vary significantly depending on the language:
- Behavior of overridden constructors and destructors in an inheritance tree
- Behavior of constructors and destructors under exception-handling conditions
- Importance of default constructors (constructors with no arguments)
- Time at which a destructor or finalizer is called
- Wisdom of overriding the language’s built-in operators, including assignment and equality
- How memory is handled as objects are created and destroyed, or as they are declared and go out of scope
CHECKLIST: Class Quality
Abstract Data Types
- Have you thought of the classes in your program as Abstract Data Types and evaluated their interfaces from that point of view?
Abstraction
- Does the class have a central purpose?
- Is the class well named, and does its name describe its central purpose?
- Does the class’s interface present a consistent abstraction?
- Does the class’s interface make obvious how you should use the class?
- Is the class’s interface abstract enough that you don’t have to think about how its services are implemented? Can you treat the class as a black box?
- Are the class’s services complete enough that other classes don’t have to meddle with its internal data?
- Has unrelated information been moved out of the class?
- Have you thought about subdividing the class into component classes, and have you subdivided it as much as you can?
- Are you preserving the integrity of the class’s interface as you modify the class?
Encapsulation
- Does the class minimize accessibility to its members?
- Does the class avoid exposing member data?
- Does the class hide its implementation details from other classes as much as the programming language permits?
- Does the class avoid making assumptions about its users, including its derived classes?
- Is the class independent of other classes? Is it loosely coupled?
Inheritance
- Is inheritance used only to model “is a” relationships?
- Does the class documentation describe the inheritance strategy?
- Do derived classes adhere to the Liskov Substitution Principle?
- Do derived classes avoid “overriding” non-overridable routines?
- Are common interfaces, data, and behavior as high as possible in the inheritance tree?
- Are inheritance trees fairly shallow?
- Are all data members in the base class private rather than protected?
Other Implementation Issues
- Does the class contain about seven data members or fewer?
- Does the class minimize direct and indirect routine calls to other classes?
- Does the class collaborate with other classes only to the extent absolutely necessary?
- Is all member data initialized in the constructor?
- Is the class designed to be used as deep copies rather than shallow copies unless there’s a measured reason to create shallow copies?
Language-Specific Issues
- Have you investigated the language-specific issues for classes in your specific programming language?
High-Quality Routines
A routine is an individual method or procedure invocable for a single purpose.
Low-Quality Routine:
- The routine has a bad name.
- The routine isn’t documented (Self-Documenting Code).
- The routine has a bad layout.
- The routine’s input variable, inputRec, is changed.
- The routine reads and writes global variables.
- The routine doesn’t have a single purpose.
- The routine doesn’t defend itself against bad data.
- The routine uses several magic numbers.
- The routine uses only two fields of its CORP_DATA parameter.
- Some of the routine’s parameters are unused.
- One of the routine’s parameters is mislabeled.
- The routine has too many parameters.
- The routine’s parameters are poorly ordered and are not documented.
Valid Reasons to Create a Routine
Reduce complexity.
Make a section of code readable.
Avoid duplicate code.
Hide sequences.
Hide pointer operations.
Improve portability.
Simplify complicated boolean tests.
Improve performance.
Design at the Routine Level
Functional cohesion is the strongest and best kind of cohesion, occurring when a routine performs one and only one operation.
Examples of highly cohesive routines include sin(), GetCustomerName(), EraseFile(), CalculateLoanPayment(), and AgeFromBirthday().
Of course, this evaluation of their cohesion assumes that the routines do what their names say they do—if they do anything else, they are less cohesive and poorly named.
Several other kinds of cohesion are normally considered to be less than ideal:
Sequential cohesion exists when a routine contains operations that must be performed in a specific order, that share data from step to step, and that don’t make up a complete function when done together.
An example of sequential cohesion is a routine that calculates an employee’s age and time to retirement, given a birth date.
If the routine calculates the age and then uses that result to calculate the employee’s time to retirement, it has sequential cohesion.
Communicational cohesion occurs when operations in a routine make use of the same data and aren’t related in any other way.
Temporal cohesion occurs when operations are combined into a routine because they are all done at the same time.
Some programmers consider temporal cohesion to be unacceptable because it’s sometimes associated with bad programming practices such as having a hodgepodge of code in a Startup() routine.
To avoid this problem, think of temporal routines as organizers of other events.
If a routine has bad cohesion, it’s better to put effort into a rewrite to have better cohesion than investing in a pinpoint diagnosis of the problem.
The unacceptable kinds of cohesion:
Procedural cohesion occurs when operations in a routine are done in a specified order.
The order of these operations is important only because it matches the order in which the user is asked for the data on the input screen.
Logical cohesion occurs when several operations are stuffed into the same routine and one of the operations is selected by a control flag that’s passed in.
It’s called logical cohesion because the control flow or “logic” of the routine is the only thing that ties the operations together—they’re all in a big if statement or case statement together.
Coincidental cohesion occurs when the operations in a routine have no discernible relationship to each other.
Good Routine Names
Describe everything the routine does.
Avoid meaningless or wishy-washy verbs.
Make names of routines as long as necessary.
To name a function, use a description of the return value.
To name a procedure, use a strong verb followed by an object.
Use opposites precisely.
Establish conventions for common operations.
How to Use Routine Parameters
If several routines use similar parameters, put the similar parameters in a consistent order.
Use all the parameters.
Put status or error variables last.
Don’t use routine parameters as working variables.
Document interface assumptions about parameters.
- Whether parameters are input-only, modified, or output-only
- Units of numeric parameters (inches, feet, meters, and so on)
- Meanings of status codes and error values if enumerated types aren’t used
- Ranges of expected values
- Specific values that should never appear
Limit the number of a routine’s parameters to about seven.
Consider an input, modify, and output naming convention for parameters.
You could prefix them with i_, m_, and o_.
If you’re feeling verbose, you could prefix them with Input_, Modify_, and Output_.
Pass the variables or objects that the routine needs to maintain its interface abstraction.
If you find yourself frequently changing the parameter list to the routine, with the parameters coming from the same object each time, that’s an indication that you should be passing the whole object rather than specific elements.
Use named parameters.
Don’t assume anything about the parameter-passing mechanism.
Make sure actual parameters match formal parameters.
Develop the habit of checking types of arguments in parameter lists and heeding compiler warnings about mismatched parameter types.
When to Use a Function and When to Use a Procedure
Purists argue that a function should return only one value, just as a mathematical function does.
This means that a function would take only input parameters and return its only value through the function itself.
A common programming practice is to have a function that operates as a procedure and returns a status value.
if ( report.FormatOutput( formattedReport ) = Success ) then …
Logically, it works as a procedure, but because it returns a value, it’s officially a function.
The use of the return value to indicate the success or failure of the procedure is not confusing if the technique is used consistently.
The alternative is to create a procedure that has a status variable as an explicit parameter, which promotes code like this fragment:
report.FormatOutput( formattedReport, outputStatus )
I prefer the second style of coding, not because I’m hard-nosed about the difference between functions and procedures but because it makes a clear separation between the routine call and the test of the status value.
outputStatus = report.FormatOutput( formattedReport )
if ( outputStatus = Success ) then …
In short, use a function if the primary purpose of the routine is to return the value indicated by the function name. Otherwise, use a procedure.
Setting the Function’s Return Value
Check all possible return paths.
It’s good practice to initialize the return value at the beginning of the function to a default value—which provides a safety net in the event that the correct return value is not set.
Don’t return references or pointers to local data.
If an object needs to return information about its internal data, it should save the information as class member data.
It should then provide accessor functions that return the values of the member data items rather than references or pointers to local data.
Macro Routines and Inline Routines
Fully parenthesize macro expressions.
#define Cube( a ) a*a*a
This macro has a problem.
If you pass it nonatomic values for a, it won’t do the multiplication properly.
If you use the expression Cube( x+1 ), it expands to x+1*x+1*x+1, which, because of the precedence of the multiplication and addition operators, is not what you want.
#define Cube( a ) (a)*(a)*(a)
This is close, but still no cigar. If you use Cube() in an expression that has operators with higher precedence than multiplication, the (a) * (a) * (a) will be torn apart.
#define Cube( a ) ((a)*(a)*(a))
Name macros that expand to code like routines so that they can be replaced by routines if necessary.
Limitations on the Use of Macro Routines
Modern languages like C++ provide numerous alternatives to the use of macros:
- const for declaring constant values
- inline for defining functions that will be compiled as inline code
- template for defining standard operations like min, max, and so on in a type- safe way
- enum for defining enumerated types
- typedef for defining simple type substitutions
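Using the alternatives above, the Cube macro can be replaced with a type-safe template function. Unlike the macro, it evaluates its argument exactly once and obeys normal operator precedence, so no defensive parentheses are needed:

```cpp
#include <cassert>

// Type-safe replacement for the Cube macro: the argument is evaluated
// once, precedence is handled by normal function-call semantics, and
// the compiler checks the argument's type.
template <typename T>
inline T Cube(T a) {
    return a * a * a;
}
```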
CHECKLIST: High-Quality Routines
Big-Picture Issues
- Is the reason for creating the routine sufficient?
- Have all parts of the routine that would benefit from being put into routines of their own been put into routines of their own?
- Is the routine’s name a strong, clear verb-plus-object name for a procedure or a description of the return value for a function?
- Does the routine’s name describe everything the routine does? Have you established naming conventions for common operations?
- Does the routine have strong, functional cohesion—doing one and only one thing and doing it well?
- Do the routines have loose coupling—are the routine’s connections to other routines small, intimate, visible, and flexible?
- Is the length of the routine determined naturally by its function and logic, rather than by an artificial coding standard?
Parameter-Passing Issues
- Does the routine’s parameter list, taken as a whole, present a consistent interface abstraction?
- Are the routine’s parameters in a sensible order, including matching the order of parameters in similar routines?
- Are interface assumptions documented?
- Does the routine have seven or fewer parameters?
- Is each input parameter used?
- Is each output parameter used?
- Does the routine avoid using input parameters as working variables?
- If the routine is a function, does it return a valid value under all possible circumstances?
Defensive Programming
Protecting Your Program From Invalid Inputs
Check the values of all data from external sources.
If you’re working on a secure application, be especially leery of data that might attack your system: attempted buffer overflows, injected SQL commands, injected HTML or XML code, integer overflows, and so on.
Check the values of all routine input parameters.
Decide how to handle bad inputs.
Assertions
An assertion is code that’s used during development—usually a routine or macro—that allows a program to check itself as it runs.
When an assertion is true, that means everything is operating as expected.
When it’s false, that means it has detected an unexpected error in the code.
An assertion usually takes two arguments: a boolean expression that describes the assumption that’s supposed to be true and a message to display if it isn’t.
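A minimal sketch of such a two-argument assertion in C++; the DEBUG_ASSERT macro name and its behavior are illustrative, not a standard library facility:

```cpp
#include <cstdio>
#include <cstdlib>

// Development-time assertion: takes a boolean expression and a message
// to display if the expression is false, then aborts so the problem
// cannot be ignored.
#define DEBUG_ASSERT( condition, message )                   \
    do {                                                     \
        if ( !(condition) ) {                                \
            std::fprintf( stderr, "Assertion failed: %s\n",  \
                          (message) );                       \
            std::abort();                                    \
        }                                                    \
    } while ( 0 )
```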
Guidelines for Using Assertions
Use error handling code for conditions you expect to occur; use assertions for conditions that should never occur.
Error-handling typically checks for bad input data; assertions check for bugs in the code.
A good way to think of assertions is as executable documentation—you can’t rely on them to make the code work, but they can document assumptions more actively than program-language comments can.
Avoid putting executable code in assertions.
Visual Basic Example of a Dangerous Use of an Assertion
Debug.Assert( PerformAction() ) ' Couldn't perform action
Put executable statements on their own lines, assign the results to status variables, and test the status variables instead.
Visual Basic Example of a Safe Use of an Assertion
actionPerformed = PerformAction()
Debug.Assert( actionPerformed ) ' Couldn't perform action
Use assertions to document preconditions and postconditions.
Preconditions are the properties that the client code of a routine or class promises will be true before it calls the routine or instantiates the object.
Preconditions are the client code’s obligations to the code it calls.
Postconditions are the properties that the routine or class promises will be true when it concludes executing.
Postconditions are the routine or class’s obligations to the code that uses it.
For highly robust code, assert, and then handle the error anyway.
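A sketch of “assert, and then handle the error anyway” in C++; the routine and its limits are illustrative:

```cpp
#include <cassert>
#include <cmath>

// The assertion documents the precondition for developers; the runtime
// check still protects builds where assertions are compiled out.
double SafeSquareRoot( double value ) {
    assert( value >= 0.0 );   // precondition: caller promises non-negative
    if ( value < 0.0 ) {      // handle the error anyway
        value = 0.0;
    }
    double result = std::sqrt( value );
    assert( result >= 0.0 );  // postcondition: routine promises non-negative
    return result;
}
```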
Error Handling Techniques
Return a neutral value.
Substitute the next piece of valid data.
Return the same answer as the previous time.
Substitute the closest legal value.
Log a warning message to a file.
Return an error code.
Call an error processing routine/object.
Display an error message wherever the error is encountered.
Handle the error in whatever way works best locally.
Shut down.
Some systems shut down whenever they detect an error. This approach is useful in safety-critical applications.
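As a sketch of one technique from the list above, substituting the closest legal value; the range is illustrative:

```cpp
// Error handling by substituting the closest legal value: an
// out-of-range reading is clamped to the nearest bound instead of
// being propagated through the program.
int ClosestLegalValue( int value, int minLegal, int maxLegal ) {
    if ( value < minLegal ) return minLegal;
    if ( value > maxLegal ) return maxLegal;
    return value;
}
```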
Robustness vs. Correctness
Correctness means never returning an inaccurate result; no result is better than an inaccurate result. Robustness means always trying to do something that will allow the software to keep operating, even if that leads to results that are inaccurate sometimes.
Safety-critical applications tend to favor correctness over robustness.
Consumer applications tend to favor robustness over correctness.
Exceptions


Use exceptions to notify other parts of the program about errors that should not be ignored.
Throw an exception only for conditions that are truly exceptional: in other words, conditions that cannot be addressed by other coding practices.
Don’t use an exception to pass the buck.
Avoid throwing exceptions in constructors and destructors unless you catch them in the same place.
Throw exceptions at the right level of abstraction.
Include all information that led to the exception in the exception message.
Avoid empty catch blocks.
Know the exceptions your library code throws.
Consider building a centralized exception reporter.
A sketch (the routine body, parameters, and class name are illustrative; the original snippet is truncated):

Sub ReportException( className, thisException )
    ' log the exception and notify the user in one standard place
End Sub

Try
    ...
Catch thisException As Exception
    ReportException( "Client", thisException )
End Try
Standardize your project’s use of exceptions.
- If you’re working in a language like C++ that allows you to throw a variety of kinds of objects, data, and pointers, standardize on what specifically you will throw.
For compatibility with other languages, consider throwing only objects derived from the Exception base class.
- Define the specific circumstances under which code is allowed to use throw-catch syntax to perform error processing locally.
- Define the specific circumstances under which code is allowed to throw an exception that won’t be handled locally.
- Determine whether a centralized exception reporter will be used.
- Define whether exceptions are allowed in constructors and destructors.
Consider alternatives to exceptions.
You should always consider the full set of error-handling alternatives: handling the error locally, propagating the error using an error code, logging debug information to a file, shutting down the system, or using some other approach.
Barricade Your Program to Contain the Damage Caused by Errors
One way to barricade for defensive programming purposes is to designate certain interfaces as boundaries to “safe” areas.
This same approach can be used at the class level. The class’s public methods assume the data is unsafe, and they are responsible for checking the data and sanitizing it.
Once the data has been accepted by the class’s public methods, the class’s private methods can assume the data is safe.
Another way of thinking about this approach is as an operating-room technique.
Data is sterilized before it’s allowed to enter the operating room. Anything that’s in the operating room is assumed to be safe.
The key design decision is deciding what to put in the operating room, what to keep out, and where to put the doors—which routines are considered to be inside the safety zone, which are outside, and which sanitize the data.
The easiest way to do this is usually by sanitizing external data as it arrives, but data often needs to be sanitized at more than one level, so multiple levels of sterilization are sometimes required.
Convert input data to the proper type at input time.
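A sketch of the barricade at class level; the class name, field, and legal range are illustrative. The public method validates and converts the external string at input time, so code inside the barricade can trust the integer:

```cpp
#include <stdexcept>
#include <string>

class AgeRecord {
public:
    // Public method: assumes the data is unsafe; checks and converts it.
    void SetAgeFromInput( const std::string &raw ) {
        int age = std::stoi( raw );  // throws on non-numeric input
        if ( age < 0 || age > 150 ) {
            throw std::out_of_range( "age outside legal range" );
        }
        m_age = age;  // from here on, the data is inside the barricade
    }
    // Code using this accessor can assume the value is safe.
    int Age() const { return m_age; }
private:
    int m_age = 0;
};
```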
Debugging Aids
Don’t Automatically Apply Production Constraints to the Development Version.
A common programmer blind spot is the assumption that limitations of the production software apply to the development version.
The production version has to run fast. The development version might be able to run slow.
The production version has to be stingy with resources. The development version might be allowed to use resources extravagantly.
The production version shouldn’t expose dangerous operations to the user. The development version can have extra operations that you can use without a safety net.
Use Offensive Programming
Exceptional cases should be handled in a way that makes them obvious during development and recoverable when production code is running.
Here are some ways you can program offensively:
- Make sure asserts abort the program. Don’t allow programmers to get into the habit of just hitting the ENTER key to bypass a known problem. Make the problem painful enough that it will be fixed.
- Completely fill any memory allocated so that you can detect memory allocation errors.
- Completely fill any files or streams allocated to flush out any file-format errors.
- Be sure the code in each case statement’s else clause fails hard (aborts the program) or is otherwise impossible to overlook.
- Fill an object with junk data just before it’s deleted.
Plan to Remove Debugging Aids
If you’re writing code for commercial use, the performance penalty in size and speed can be prohibitive.
Use version control and build tools like make.
Use a built-in preprocessor.
#define DEBUG
Other debug code might be for specific purposes only, so you can surround it by a statement like #if DEBUG == POINTER_ERROR.
In other places, you might want to set debug levels, so you could have statements like #if DEBUG > LEVEL_A.
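A sketch of level-based debug code in C++; DEBUG_LEVEL and the level names are illustrative project conventions, not standard macros:

```cpp
#include <cstdio>

#define LEVEL_A 1
#define LEVEL_B 2
#define DEBUG_LEVEL LEVEL_B   // set to 0 for the production build

int TracedAdd( int a, int b ) {
#if DEBUG_LEVEL >= LEVEL_A
    std::printf( "TracedAdd( %d, %d )\n", a, b );  // compiled out in production
#endif
    return a + b;
}
```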
Write your own preprocessor.
Use debugging stubs.
C++ Example of a Routine for Checking Pointers During Development
void CheckPointer( void *pointer ) {
    ... // perform sophisticated checks on the pointer
}
C++ Example of a Routine for Checking Pointers During Production
void CheckPointer( void *pointer ) {
    // no code; just return to caller
}
Determining How Much Defensive Programming to Leave in Production Code
Remove code that results in hard crashes.
Leave in code that helps the program crash gracefully.
Log errors for your technical support personnel.
See that the error messages you leave in are friendly.
CHECKLIST: Defensive Programming
General
- Does the routine protect itself from bad input data?
- Have you used assertions to document assumptions, including preconditions and postconditions?
- Have assertions been used only to document conditions that should never occur?
- Does the architecture or high-level design specify a specific set of error-handling techniques?
- Does the architecture or high-level design specify whether error handling should favor robustness or correctness?
- Have barricades been created to contain the damaging effect of errors and reduce the amount of code that has to be concerned about error processing?
- Have debugging aids been used in the code?
- Has information hiding been used to contain the effects of changes so that they won’t affect code outside the routine or class that’s changed?
- Have debugging aids been installed in such a way that they can be activated or deactivated without a great deal of fuss?
- Is the amount of defensive programming code appropriate—neither too much nor too little?
- Have you used offensive programming techniques to make errors difficult to overlook during development?
Exceptions
- Has your project defined a standardized approach to exception handling?
- Have you considered alternatives to using an exception?
- Is the error handled locally rather than throwing a non-local exception if possible?
- Does the code avoid throwing exceptions in constructors and destructors?
- Are all exceptions at the appropriate levels of abstraction for the routines that throw them?
- Does each exception include all relevant exception background information?
- Is the code free of empty catch blocks? (Or if an empty catch block truly is appropriate, is it documented?)
Security Issues
- Does the code that checks for bad input data check for attempted buffer overflows, SQL injection, HTML injection, integer overflows, and other malicious inputs?
- Are all error-return codes checked?
- Are all exceptions caught?
- Do error messages avoid providing information that would help an attacker break into the system?
The Pseudocode Programming Process
Summary of Steps in Building Classes and Routines.

Steps in Creating a Class
Create a general design for the class.
Construct each routine within the class.
Review and test the class as a whole.

Pseudocode for Pros
Here are guidelines for using pseudocode effectively:
Use English-like statements that precisely describe specific operations.
Avoid syntactic elements from the target programming language. Pseudocode allows you to design at a slightly higher level than the code itself. When you use programming-language constructs, you sink to a lower level, eliminating the main benefit of design at a higher level, and you saddle yourself with unnecessary syntactic restrictions.
Write pseudocode at the level of intent. Describe the meaning of the approach rather than how the approach will be implemented in the target language.
Write pseudocode at a low enough level that generating code from it will be nearly automatic. If the pseudocode is at too high a level, it can gloss over problematic details in the code. Refine the pseudocode in more and more detail until it seems as if it would be easier to simply write the code.
Constructing Routines Using the PPP
- Design the routine
Check the prerequisites.
Define the problem the routine will solve.
The information the routine will hide
Inputs to the routine
Outputs from the routine
Preconditions that are guaranteed to be true before the routine is called (input values within certain ranges, streams initialized, files opened or closed, buffers filled or flushed, etc.)
Postconditions that the routine guarantees will be true before it passes control back to the caller (output values within specified ranges, streams initialized, files opened or closed, buffers filled or flushed, etc.)
Name the routine.
Decide how to test the routine.
Think about error handling.
Think about efficiency.
Research functionality available in the standard libraries.
Research the algorithms and data types.
Write the pseudocode.
Think about the data.
Check the pseudocode.
Try a few ideas in pseudocode, and keep the best (iterate).
- Code the routine

Write the routine declaration.
Turn the pseudocode into high-level comments.
Fill in the code below each comment.
Check whether code should be further factored.
- Check the code
Mentally check the routine for errors.
One of the biggest differences between hobbyists and professional programmers is the difference that grows out of moving from superstition into understanding.
- Clean up leftovers
- Repeat as needed
Alternatives to the PPP
Test-first development.
Design by contract.
CHECKLIST: The Pseudocode Programming Process
- Have you checked that the prerequisites have been satisfied?
- Have you defined the problem that the class will solve?
- Is the high level design clear enough to give the class and each of its routines a good name?
- Have you thought about how to test the class and each of its routines?
- Have you thought about efficiency mainly in terms of stable interfaces and readable implementations, or in terms of meeting resource and speed budgets?
- Have you checked the standard libraries and other code libraries for applicable routines or components?
- Have you checked reference books for helpful algorithms?
- Have you designed each routine using detailed pseudocode?
- Have you mentally checked the pseudocode? Is it easy to understand?
- Have you paid attention to warnings that would send you back to design (use of global data, operations that seem better suited to another class or another routine, and so on)?
- Did you translate the pseudocode to code accurately?
- Did you apply the PPP recursively, breaking routines into smaller routines when needed?
- Did you document assumptions as you made them?
- Did you remove comments that turned out to be redundant?
- Have you chosen the best of several iterations, rather than merely stopping after your first iteration?
- Do you thoroughly understand your code? Is it easy to understand?
General Issues in Using Variables
Turn off implicit declarations.
Declare all variables.
Use naming conventions.
Check variable names.
Guidelines for Initializing Variables
Initialize each variable as it’s declared.
Ideally, declare and define each variable close to where it’s used.
Pay special attention to counters and accumulators.
Initialize a class’s member data in its constructor.
Check the need for reinitialization.
Initialize named constants once; initialize variables with executable code.
Use the compiler setting that automatically initializes all variables.
Take advantage of your compiler’s warning messages.
Check input parameters for validity.
Use a memory-access checker to check for bad pointers.
Initialize working memory at the beginning of your program.
Scope
Keep Variables Live for As Short a Time As Possible.
General Guidelines for Minimizing Scope.
Initialize variables used in a loop immediately before the loop rather than back at the beginning of the routine containing the loop.
Don’t assign a value to a variable until just before the value is used.
Group related statements.
Begin with most restricted visibility, and expand the variable’s scope only if necessary.
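A sketch of these scope guidelines; the routine is illustrative:

```cpp
#include <vector>

// The accumulator is initialized immediately before the loop that uses
// it, and the loop variable's scope is limited to the loop itself,
// keeping both variables live for as short a time as possible.
int SumOfSquares( const std::vector<int> &values ) {
    int total = 0;
    for ( int value : values ) {
        total += value * value;
    }
    return total;
}
```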
Comments on Minimizing Scope
The difference between the “convenience” philosophy and the “intellectual manageability” philosophy boils down to a difference in emphasis between writing programs and reading them.
Persistence
Here are a few steps you can take to avoid this kind of problem:
- Use debug code or assertions in your program to check critical variables for reasonable values. If the values aren’t reasonable, display a warning that tells you to look for improper initialization.
- Write code that assumes data isn’t persistent. For example, if a variable has a certain value when you exit a routine, don’t assume it has the same value the next time you enter the routine. This doesn’t apply if you’re using language- specific features that guarantee the value will remain the same, such as static in C++ and Java.
- Develop the habit of declaring and initializing all data right before it’s used. If you see data that’s used without a nearby initialization, be suspicious!
Binding Time
To summarize, here are the times a variable can be bound to a value in this example (the details could vary somewhat in other cases):
- Coding time (use of magic numbers)
- Compile time (use of a named constant)
- Load time (reading a value from an external source such as the Windows Registry)
- Object instantiation time (such as reading the value each time a window is created)
- Just in time (such as reading the value each time the window is drawn)
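A sketch contrasting three of these binding times in C++; the color value and environment-variable name are illustrative:

```cpp
#include <cstdlib>

// Coding time: a magic number bound when the code is written.
int TitleBarColorMagic() { return 0xFF0000; }

// Compile time: a named constant bound when the program is compiled.
const int TITLE_BAR_COLOR = 0xFF0000;

// Run time: the value is read from an external source each time it's
// needed, falling back to the compile-time constant when it's absent.
int TitleBarColorRuntime() {
    const char *value = std::getenv( "TITLE_BAR_COLOR" );
    return value ? std::atoi( value ) : TITLE_BAR_COLOR;
}
```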
Relationship Between Data Types and Control Structures
Sequential data translates to sequential statements in a program.
Selective data translates to if and case statements in a program.
Iterative data translates to for, repeat, and while looping structures in a program.
Using Each Variable for Exactly One Purpose
Avoid variables with hidden meanings.
Make sure that all declared variables are used.
CHECKLIST: General Considerations In Using Data
Initializing Variables
- Does each routine check input parameters for validity?
- Does the code declare variables close to where they’re first used?
- Does the code initialize variables as they’re declared, if possible?
- Does the code initialize variables close to where they’re first used, if it isn’t possible to declare and initialize them at the same time?
- Are counters and accumulators initialized properly and, if necessary, reinitialized each time they are used?
- Are variables reinitialized properly in code that’s executed repeatedly?
- Does the code compile with no warnings from the compiler?
- If your language uses implicit declarations, have you compensated for the problems they cause?
Other General Issues in Using Data
- Do all variables have the smallest scope possible?
- Are references to variables as close together as possible—both from each reference to a variable to the next and in total live time?
- Do control structures correspond to the data types?
- Are all the declared variables being used?
- Are all variables bound at appropriate times, that is, striking a conscious balance between the flexibility of late binding and the increased complexity associated with late binding?
- Does each variable have one and only one purpose?
- Is each variable’s meaning explicit, with no hidden meanings?
The Power of Variable Names
Considerations in Choosing Good Names

A good mnemonic name generally speaks to the problem rather than the solution. A good name tends to express the what more than the how.

When you give a variable a short name like i, the length itself says something about the variable—namely, that the variable is a scratch value with a limited scope of operation.
Use qualifiers on names that are in the global name space.
Many programs have variables that contain computed values: totals, averages, maximums, and so on.
If you modify a name with a qualifier like Total, Sum, Average, Max, Min, Record, String, or Pointer, put the modifier at the end of the name.
Naming Status Variables
Think of a better name than flag for status variables.
Naming Boolean Variables
Keep typical boolean names in mind.

Use positive boolean variable names.
Naming Enumerated Types
When you use an enumerated type, you can ensure that it’s clear that members of the type all belong to the same group by using a group prefix, such as Color_, Planet_, or Month_.
In addition, the enum type itself (Color, Planet, or Month) can be identified in various ways, including all caps or prefixes (e_Color, e_Planet, or e_Month).
Guidelines for a Language-Independent Convention
Differentiate between variable names and routine names.
Differentiate between classes and objects.

Each of these options has strengths and weaknesses.
Identify global variables.
Identify member variables.
Identify type definitions.
Identify named constants.
Identify elements of enumerated types.
Identify input-only parameters in languages that don’t enforce them.
Format names to enhance readability.


Variable names include three kinds of information:
- The contents of the variable (what it represents)
- The kind of data (named constant, primitive variable, user-defined type, or class)
- The scope of the variable (private, class, package, or global)





User-Defined–Type (UDT) Abbreviation

Creating Short Names That Are Readable
General Abbreviation Guidelines.
- Use standard abbreviations (the ones in common use, which are listed in a dictionary).
- Remove all nonleading vowels. (computer becomes cmptr, screen becomes scrn, apple becomes appl, and integer becomes intgr.)
- Remove articles: and, or, the, and so on.
- Use the first letter or first few letters of each word.
- Truncate after the first, second, or third (whichever is appropriate) letter of each word.
- Keep the first and last letters of each word.
- Use every significant word in the name, up to a maximum of three words.
- Remove useless suffixes—ing, ed, and so on.
- Keep the most noticeable sound in each syllable.
- Iterate through these techniques until you abbreviate each variable name to between 8 and 20 characters, or the number of characters to which your language limits variable names.
Comments on Abbreviations
Don’t abbreviate by removing one character from a word.
Abbreviate consistently.
Create names that you can pronounce.
Avoid combinations that result in mispronunciation.
Use a thesaurus to resolve naming collisions.
Document extremely short names with translation tables in the code.

Document all abbreviations in a project-level “Standard Abbreviations” document.
Abbreviations in code create two general risks:
- A reader of the code might not understand the abbreviation
- Other programmers might use multiple abbreviations to refer to the same word, which creates needless confusion
Remember that names matter more to the reader of the code than to the writer.
Avoid numerals in names.
Don’t differentiate variable names solely by capitalization.
Avoid multiple natural languages.
Avoid the names of standard types, variables, and routines.
Don’t use names that are totally unrelated to what the variables represent.
CHECKLIST: Naming Variables
General Naming Considerations
- Does the name fully and accurately describe what the variable represents?
- Does the name refer to the real-world problem rather than to the programming-language solution?
- Is the name long enough that you don’t have to puzzle it out?
- Are computed-value qualifiers, if any, at the end of the name?
- Does the name use Count or Index instead of Num?
Naming Specific Kinds Of Data
- Are loop index names meaningful (something other than i, j, or k if the loop is more than one or two lines long or is nested)?
- Have all “temporary” variables been renamed to something more meaningful?
- Are boolean variables named so that their meanings when they’re True are clear?
- Do enumerated-type names include a prefix or suffix that indicates the category—for example, Color_ for Color_Red, Color_Green, Color_Blue, and so on?
- Are named constants named for the abstract entities they represent rather than the numbers they refer to?
Naming Conventions
- Does the convention distinguish among local, class, and global data?
- Does the convention distinguish among type names, named constants, enumerated types, and variables?
- Does the convention identify input-only parameters to routines in languages that don’t enforce them?
- Is the convention as compatible as possible with standard conventions for the language?
- Are names formatted for readability?
Short Names
- Does the code use long names (unless it’s necessary to use short ones)?
- Does the code avoid abbreviations that save only one character?
- Are all words abbreviated consistently?
- Are the names pronounceable?
- Are names that could be mispronounced avoided?
- Are short names documented in translation tables?
Common Naming Problems: Have You Avoided…
…names that are misleading?
…names with similar meanings?
…names that are different by only one or two characters?
…names that sound similar?
…names that use numerals?
…names intentionally misspelled to make them shorter?
…names that are commonly misspelled in English?
…names that conflict with standard library-routine names or with predefined variable names?
…totally arbitrary names?
…hard-to-read characters?
Fundamental Data Types
Numbers in General
Avoid “magic numbers.”
Magic numbers are literal numbers such as 100 or 47524 that appear in the middle of a program without explanation.
Anticipate divide-by-zero errors.
Make type conversions obvious.
Avoid mixed-type comparisons.
Heed your compiler’s warnings.
Integers
Check for integer division.
Check for integer overflow.

Check for overflow in intermediate results.
Floating-Point Numbers
Avoid additions and subtractions on numbers that have greatly different magnitudes.
Avoid equality comparisons.
Anticipate rounding errors.
Common specific solutions to rounding problems:
First, change to a variable type that has greater precision.
Second, change to binary coded decimal (BCD) variables.
Third, change from floating-point to integer variables.
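A sketch of avoiding equality comparisons by testing against a tolerance instead; the tolerance value is illustrative and should be chosen for the magnitudes in your problem:

```cpp
#include <cmath>

// Two floating-point values are treated as equal when they differ by
// no more than the tolerance, sidestepping accumulated rounding error.
bool NearlyEqual( double a, double b, double tolerance = 1e-9 ) {
    return std::fabs( a - b ) <= tolerance;
}
```

Adding 0.1 ten times, for example, does not compare exactly equal to 1.0 in binary floating point, but it passes the tolerance test.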
Characters and Strings
Avoid magic characters and strings. These are the character and string equivalents of an unexplained literal like the 20 in if ( theNumber < 20 ) then.
Watch for off-by-one errors.
Know how your language and environment support Unicode.
Conversion between Unicode and other character sets is often required for communication with standard and third-party libraries.
If some strings won’t be in Unicode (for example, in C or C++), decide early on whether to use the Unicode character set at all.
Decide on an internationalization/localization strategy early in the lifetime of a program.
If you know you only need to support a single alphabetic language, consider using an ISO 8859 character set.
If you need to support multiple languages, use Unicode.
Decide on a consistent conversion strategy among string types.
Boolean Variables
Use boolean variables to document your program.
Use boolean variables to simplify complicated tests.
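A sketch of both guidelines; the condition names are illustrative:

```cpp
// Intermediate boolean variables document what each part of the test
// means and simplify the final expression.
bool OkToProcess( int index, int count, bool repeatedEntry ) {
    bool finished = ( index >= count );
    bool inRange  = ( index >= 0 );
    return !finished && inRange && !repeatedEntry;
}
```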
Enumerated Types
Use enumerated types for readability.
Instead of writing statements like
if chosenColor = 1
you can write more readable expressions like
if chosenColor = Color_Red
Use enumerated types for reliability.
Use enumerated types for modifiability.
Use enumerated types as an alternative to boolean variables.
Check for invalid values.
Define the first and last entries of an enumeration for use as loop limits.
Reserve the first entry in the enumerated type as invalid.
Define precisely how First and Last elements are to be used in the project coding standard, and use them consistently.
Beware of pitfalls of assigning explicit values to elements of an enumeration.
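A sketch pulling several of these guidelines together in C++; the Country values are illustrative:

```cpp
// The first entry is reserved as invalid, and First/Last sentinels
// bracket the real values so loops can iterate over the type.
enum Country {
    Country_InvalidFirst = 0,
    Country_First = 1,
    Country_China = 1,
    Country_England = 2,
    Country_France = 3,
    Country_Last = 3
};

int CountCountries() {
    int count = 0;
    for ( int c = Country_First; c <= Country_Last; c++ ) {
        count++;  // adding a country only requires updating Country_Last
    }
    return count;
}
```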
Named Constants
A named constant is like a variable except that you can’t change the constant’s value once you’ve assigned it.
Named constants enable you to refer to fixed quantities such as the maximum number of employees by a name rather than a number—MaximumEmployees rather than 1000, for instance.
Using a named constant is a way of “parameterizing” your program.
Use named constants in data declarations: for example, a length constant such as LOCAL_NUMBER_LENGTH for a phone-number field, rather than a bare number.
Avoid literals, even “safe” ones.
Simulate named constants with appropriately scoped variables or classes.
Use named constants consistently.
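A sketch of the MaximumEmployees example in C++; the limit and routine are illustrative:

```cpp
// The named constant parameterizes the program: changing the hiring
// limit means editing one declaration, not hunting for every 1000.
const int MAXIMUM_EMPLOYEES = 1000;

bool CanHire( int currentEmployeeCount ) {
    return currentEmployeeCount < MAXIMUM_EMPLOYEES;
}
```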
Arrays
Make sure that all array indexes are within the bounds of the array.
Think of arrays as sequential structures.
Check the end points of arrays.
If an array is multidimensional, make sure its subscripts are used in the correct order.
Watch out for index cross talk.
Throw in an extra element at the end of an array.
Creating Your Own Types
A more powerful example would combine the idea of creating your own types with the idea of information hiding. In some cases, the information you want to hide is information about the type of the data.
These examples have illustrated several reasons to create your own types:
To make modifications easier.
It’s little work to create a new type, and it gives you a lot of flexibility.
To avoid excessive information distribution.
Hard typing spreads data-typing details around your program instead of centralizing them in one place.
To increase reliability.
In Ada you can define types such as type Age_t is range 0..99.
The compiler then generates run-time checks to verify that any variable of type Age_t is always within the range 0..99.
To make up for language weaknesses.
If your language doesn’t have the predefined type you want, you can create it yourself.
For example, C doesn’t have a boolean or logical type. This deficiency is easy to compensate for by creating the type yourself:
typedef int Boolean_t;
Guidelines for Creating Your Own Types.
Create types with functionally oriented names.
Avoid predefined types.
Don’t redefine a predefined type.
Define substitute types for portability.
For example, you can define a type INT and use it instead of int, or a type LONG instead of long.
Originally, the only difference between the two types would be their capitalization.
But when you moved the program to a new hardware platform, you could redefine the capitalized versions so that they could match the data types on the original hardware.
Consider creating a class rather than using a typedef.
CHECKLIST: Fundamental Data
Numbers in General
- Does the code avoid magic numbers?
- Does the code anticipate divide-by-zero errors?
- Are type conversions obvious?
- If variables with two different types are used in the same expression, will the expression be evaluated as you intend it to be?
- Does the code avoid mixed-type comparisons?
- Does the program compile with no warnings?
Integers
- Do expressions that use integer division work the way they’re meant to?
- Do integer expressions avoid integer-overflow problems?
Floating-Point Numbers
- Does the code avoid additions and subtractions on numbers with greatly different magnitudes?
- Does the code systematically prevent rounding errors?
- Does the code avoid comparing floating-point numbers for equality?
Characters and Strings
- Does the code avoid magic characters and strings?
- Are references to strings free of off-by-one errors?
- Does C code treat string pointers and character arrays differently?
- Does C code follow the convention of declaring strings to be length constant+1?
- Does C code use arrays of characters rather than pointers, when appropriate?
- Does C code initialize strings to NULLs to avoid endless strings?
- Does C code use strncpy() rather than strcpy()? And strncat() and strncmp()?
Boolean Variables
- Does the program use additional boolean variables to document conditional tests?
- Does the program use additional boolean variables to simplify conditional tests?
Enumerated Types
- Does the program use enumerated types instead of named constants for their improved readability, reliability, and modifiability?
- Does the program use enumerated types instead of boolean variables when a variable’s use cannot be completely captured with TRUE and FALSE?
- Do tests using enumerated types test for invalid values?
- Is the first entry in an enumerated type reserved for “invalid”?
Named Constants
- Does the program use named constants for data declarations and loop limits rather than magic numbers?
- Have named constants been used consistently—not named constants in some places, literals in others?
Arrays
- Are all array indexes within the bounds of the array?
- Are array references free of off-by-one errors?
- Are all subscripts on multidimensional arrays in the correct order?
- In nested loops, is the correct variable used as the array subscript, avoiding loop-index cross talk?
Creating Types
- Does the program use a different type for each kind of data that might change?
- Are type names oriented toward the real-world entities the types represent rather than toward programming-language types?
- Are the type names descriptive enough to help document data declarations?
- Have you avoided redefining predefined types?
- Have you considered creating a new class rather than simply redefining a type?
Unusual Data Types
Structures
The term “structure” refers to data that’s built up from other types.
Some reasons for using structures:
Use structures to clarify data relationships.

Use structures to simplify operations on blocks of data.
previousOldEmployee = oldEmployee
oldEmployee = newEmployee
Use structures to simplify parameter lists.

Use structures to reduce maintenance.
If your Employee structure has a title field and you decide to delete it, you don’t need to change any of the parameter lists or assignment statements that use the whole structure.
Pointers
Conceptually, every pointer consists of two parts: a location in memory and a knowledge of how to interpret the contents of that location.
Location in Memory
The location in memory is an address, often expressed in hexadecimal notation.
An address on a 32-bit processor would be a 32-bit value such as 0x0001EA40.
The pointer itself contains only this address.
To use the data the pointer points to, you have to go to that address and interpret the contents of memory at that location.
Knowledge of How to Interpret the Contents
The knowledge of how to interpret the contents of a location in memory is provided by the base type of the pointer.
If a pointer points to an integer, what that really means is that the compiler interprets the memory location given by the pointer as an integer.
General Tips on Pointers
A pointer error is usually the result of a pointer’s pointing somewhere it shouldn’t.
When you assign a value to a bad pointer variable, you write data into an area of memory you shouldn’t. This is called memory corruption.
In short, symptoms of pointer errors tend to be unrelated to causes of pointer errors. Thus, most of the work in correcting a pointer error is locating the cause.
Symptoms of pointer errors are so erratic that extra measures to make the symptoms more predictable are justified. Here’s how to achieve these key goals:
Isolate pointer operations in routines or classes.
Declare and define pointers at the same time.
Check pointers before using them.
Check the variable referenced by the pointer before using it.
Use dog-tag fields to check for corrupted memory.
A “tag field” or “dog tag” is a field you add to a structure solely for the purpose of error checking.
Putting a dog tag at the beginning of the memory block you’ve allocated allows you to check for redundant attempts to deallocate the memory block without needing to maintain a list of all the memory blocks you’ve allocated.
Putting the dog tag at the end of the memory block allows you to check for overwriting memory beyond the location that was supposed to be used.
Add explicit redundancies.
An alternative to using a tag field is to use certain fields twice.
If the data in the redundant fields doesn’t match, you know memory has been corrupted.
Use extra pointer variables for clarity.
Don’t skimp on pointer variables.
The point is made elsewhere that a variable shouldn’t be used for more than one purpose.
Simplify complicated pointer expressions.

Free pointers in linked lists in the right order.
Allocate a reserve parachute of memory.
Free pointers at the same scoping level as they were allocated.
Shred your garbage.
Set pointers to NULL after deleting or freeing them.
Check for bad pointers before deleting a variable.
Setting freed pointers to NULL also allows you to check whether a pointer is set to NULL before you use it or attempt to delete it again; if you don’t set freed pointers to NULL, you won’t have that option.
Keep track of pointer allocations.
Write cover routines to centralize your strategy for avoiding pointer problems.
Use a nonpointer technique.
Here are some guidelines that apply to using pointers in C++.
Understand the difference between pointers and references.
In C++, both pointers (*) and references (&) refer indirectly to an object, and to the uninitiated the only difference appears to be a purely cosmetic distinction between referring to fields as object->field vs. object.field.
The most significant differences are that a reference must always refer to an object, whereas a pointer can point to NULL; and what a reference refers to can’t be changed after the reference is initialized.
Use pointers for “pass by reference” parameters and const references for “pass by value” parameters.
When you pass an object to a routine by value, C++ creates a copy of the object, and when the object is passed back to the calling routine, another copy is created. (So there are two copies in play!)
Sometimes, however, you would like to have the semantics of pass by reference—that is, that the passed object should not be altered—with the implementation of pass by value—that is, passing the actual object rather than a copy.
In C++, the resolution to this issue is that you use pointers for pass by reference, and—odd as the terminology might sound—const references for pass by value!
In other words: the object is passed by reference, but the callee isn’t allowed to modify it.
void SomeRoutine( const LARGE_OBJECT &nonmodifiableObject, LARGE_OBJECT *modifiableObject );
In a modifiable object, the references to members will use the object->member notation, whereas for nonmodifiable objects references to members will use object.member notation.
If you control your own code base, it’s good discipline to use const whenever possible.
Use auto_ptrs.
auto_ptrs avoid many of the memory-leakage problems associated with regular pointers by deleting memory automatically when the auto_ptr goes out of scope.
C-Pointer Pointers
Here are a few tips on using pointers that apply specifically to the C language.
Use explicit pointer types rather than the default type.
C lets you use char or void pointers for any type of variable. As long as the pointer points, the language doesn’t really care what it points at.
If you use explicit types for your pointers, however, the compiler can give you warnings about mismatched pointer types and inappropriate dereferences.
Avoid type casting.
Type casting turns off your compiler’s ability to check for type mismatches and therefore creates a hole in your defensive-programming armor.
Follow the asterisk rule for parameter passing.
It’s easy to remember that, as long as you have an asterisk in front of the parameter when you assign it a value, the value is passed back to the calling routine.
Regardless of how many asterisks you stack up in the declaration, you must have at least one in the assignment statement if you want to pass back a value.

Use sizeof() to determine the size of a variable in a memory allocation.
Global Data
Common Problems with Global Data
Inadvertent changes to global data.
Bizarre and exciting aliasing problems with global data.

Re-entrant code problems with global data.
Code reuse hindered by global data.
If the class you want to reuse reads or writes global data, you can’t just plug it into the new program.
Uncertain initialization-order issues with global data.
Modularity and intellectual manageability damaged by global data.
Reasons to Use Global Data
Preservation of global values.
Emulation of named constants.
Emulation of enumerated types.
You can also use global variables to emulate enumerated types in languages such as Python that don’t support enumerated types directly.
Streamlining use of extremely common data.
Eliminating tramp data.
Use Global Data Only as a Last Resort
Begin by making each variable local and make variables global only as you need to.
Distinguish between global and class variables.
Use access routines.
Using Access Routines Instead of Global Data
The use of access routines is a core technique for implementing abstract data types and achieving information hiding.
Advantages of Access Routines
You get centralized control over the data.
If you discover a more appropriate implementation of the structure later, you don’t have to change the code everywhere the data is referenced.
You can ensure that all references to the variable are barricaded.
If you allow yourself to push elements onto the stack with statements like stack.array[ stack.top ] = newElement, you can easily forget to check for stack overflow and make a serious mistake.
If you use access routines, for example, PushStack( newElement )—you can write the check for stack overflow into the PushStack() routine.
You get the general benefits of information hiding automatically.
Access routines are easy to convert to an abstract data type.
For example, instead of writing code that says if lineCount > MAX_LINES, an access routine allows you to write code that says if PageFull().
How to Use Access Routines
Here’s the short version of the theory and practice of access routines: Hide data in a class.
Declare that data using the static keyword or its equivalent to ensure there is only a single instance of the data.
For example, if you have a global status variable g_globalStatus that describes your program’s overall status, you can create two access routines: globalStatus.Get() and globalStatus.Set(), each of which does what it sounds like it does.
Require all code to go through the access routines for the data.
A good convention is to require all global data to begin with the g_ prefix, and to further require that no code access a variable with the g_ prefix except that variable’s access routines.
Don’t just throw all your global data into the same barrel.
Use locking to control access to global variables.
Build a level of abstraction into your access routines.

Keep all accesses to the data at the same level of abstraction.
How to Reduce the Risks of Using Global Data
Develop a naming convention that makes global variables obvious.
Create a well-annotated list of all your global variables.
Don’t use global variables to contain intermediate results.
Don’t pretend you’re not using global data by putting all your data into a monster object and passing it everywhere.
CHECKLIST: Considerations In Using Unusual Data Types
Structures
- Have you used structures instead of naked variables to organize and manipulate groups of related data?
- Have you considered creating a class as an alternative to using a structure?
Global Data
- Are all variables local or class-scope unless they absolutely need to be global?
- Do variable naming conventions differentiate among local, class, and global data?
- Are all global variables documented?
- Is the code free of pseudoglobal data—mammoth objects containing a mishmash of data that’s passed to every routine?
- Are access routines used instead of global data?
- Are access routines and data organized into classes?
- Do access routines provide a level of abstraction beyond the underlying data-type implementations?
- Are all related access routines at the same level of abstraction?
Pointers
- Are pointer operations isolated in routines?
- Are pointer references valid, or could the pointer be dangling?
- Does the code check pointers for validity before using them?
- Is the variable that the pointer references checked for validity before it’s used?
- Are pointers set to NULL after they’re freed?
- Does the code use all the pointer variables needed for the sake of readability?
- Are pointers in linked lists freed in the right order?
- Does the program allocate a reserve parachute of memory so that it can shut down gracefully if it runs out of memory?
- Are pointers used only as a last resort, when no other method is available?
Organizing Straight-Line Code
Statements That Must Be in a Specific Order
When statements have dependencies that require you to put them in a certain order, take steps to make the dependencies clear.
Some simple guidelines for ordering statements:
Organize code so that dependencies are obvious.
Name routines so that dependencies are obvious.
Use routine parameters to make dependencies obvious.

Document unclear dependencies with comments.
Check for dependencies with assertions or error-handling code.
For example, in the class’s constructor, you might initialize a class member variable isExpenseDataInitialized to FALSE. Then in InitializeExpenseData(), you can set isExpenseDataInitialized to TRUE.
Statements Whose Order Doesn’t Matter
The guiding principle is the Principle of Proximity: Keep related actions together.
Making Code Read from Top to Bottom.
Grouping Related Statements.
References to each object are kept close together; they’re “localized.” The number of lines of code in which the objects are “live” is small.
Checklist: Organizing Straight-Line Code
- Does the code make dependencies among statements obvious?
- Do the names of routines make dependencies obvious?
- Do parameters to routines make dependencies obvious?
- Do comments describe any dependencies that would otherwise be unclear?
- Have housekeeping variables been used to check for sequential dependencies in critical sections of code?
- Does the code read from top to bottom?
- Are related statements grouped together?
- Have relatively independent groups of statements been moved into their own routines?
Using Conditionals
Plain if-then Statements
Follow these guidelines when writing if statements:
Write the nominal path through the code first; then write the unusual cases.
Make sure that you branch correctly on equality.
Using > instead of >= or < instead of <= is analogous to making an off-by-one error in accessing an array or computing a loop index.
In a loop, think through the endpoints to avoid an off-by-one error.
In a conditional statement, think through the equals case to avoid an off-by-one error.
Put the normal case after the if rather than after the else.

Follow the if clause with a meaningful statement.
Consider the else clause.
Test the else clause for correctness.
Check for reversal of the if and else clauses.
Chains of if-then-else Statements
Simplify complicated tests with boolean function calls.
Put the most common cases first.
Make sure that all cases are covered.
Case Statements
Choosing the Most Effective Ordering of Cases.
Order cases alphabetically or numerically.
Put the normal case first.
Order cases by frequency.
Keep the actions of each case simple.
Don’t make up phony variables in order to be able to use the case statement.


Use the default clause only to detect legitimate defaults.
Use the default clause to detect errors.
In C++ and Java, avoid dropping through the end of a case statement.
This refers to switch cases running together when break statements are omitted.
In C++, clearly and unmistakably identify flow-throughs at the end of a case statement.
Use a comment to make the fall-through clean and explicit.
e.g.: // FALLTHROUGH – full documentation also prints summary comments
CHECKLIST: Conditionals
if-then Statements
- Is the nominal path through the code clear?
- Do if-then tests branch correctly on equality?
- Is the else clause present and documented?
- Is the else clause correct?
- Are the if and else clauses used correctly—not reversed?
- Does the normal case follow the if rather than the else?
if-then-else-if Chains
- Are complicated tests encapsulated in boolean function calls?
- Are the most common cases tested first?
- Are all cases covered?
- Is the if-then-else-if chain the best implementation—better than a case statement?
case Statements
- Are cases ordered meaningfully?
- Are the actions for each case simple—calling other routines if necessary?
- Does the case statement test a real variable, not a phony one that’s made up solely to use and abuse the case statement?
- Is the use of the default clause legitimate?
- Is the default clause used to detect and report unexpected cases?
- In C, C++, or Java, does the end of each case have a break?
Controlling Loops
Selecting the Kind of Loop
The counted loop is performed a specific number of times, perhaps one time for each employee.
The continuously evaluated loop doesn’t know ahead of time how many times it will be executed and tests whether it has finished on each iteration.
For example, it runs while money remains, until the user selects quit, or until it encounters an error.
The endless loop executes forever once it has started.
It’s the kind you find in embedded systems such as pacemakers, microwave ovens, and cruise controls.
The iterator loop performs its action once for each element in a container class.

When to Use a while Loop
Loop with Test at the Beginning.
Loop with Test at the End.
When to Use a loop-with-exit Loop
Normal loop-with-exit Loops.
The loop-with-exit loop is a one-entry, one-exit, structured control construct, and it is the preferred kind of loop control.
Consider these guidelines when you use this kind of loop:
Put all the exit conditions in one place.
Spreading them around practically guarantees that one exit condition or another will be overlooked during debugging, modification, or testing.
Use comments for clarification.
If you use the loop-with-exit loop technique in a language that doesn’t support it directly, use comments to make what you’re doing clear.
Abnormal loop-with-exit Loops.

When to Use a for Loop
A for loop is a good choice when you need a loop that executes a specified number of times.
Use them when the loop control involves simple increments or simple decrements.
Likewise, don’t explicitly change the index value of a for loop to force it to terminate. Use a while loop instead.
When to Use a foreach Loop
A foreach loop is useful for performing an operation on each member of an array or other container.
Controlling the Loop
You can forestall these problems by observing two practices.
First, minimize the number of factors that affect the loop.
Simplify! Simplify! Simplify!
Second, treat the inside of the loop as if it were a routine—keep as much of the control as possible outside the loop.
Entering the Loop
Enter the loop from one location only.
Put initialization code directly before the loop.
In C++, use the FOREVER macro for infinite loops and event loops.
#define FOREVER for (;;)
In C++ and Java, use for( ;; ) or while( true ) for infinite loops.
In C++, prefer for loops when they’re appropriate.
Don’t use a for loop when a while loop is more appropriate.
The advantage of C++’s for loop over for loops in other languages is that it’s more flexible about the kinds of initialization and termination information it can use.
The weakness inherent in such flexibility is that you can put statements into the loop header that have nothing to do with controlling the loop.
// read all the records from a file
inputFile.MoveToStart();
recordCount = 0;
Processing the Middle of the Loop
Use { and } to enclose the statements in a loop.
Avoid empty loops.
Keep loop-housekeeping chores at either the beginning or the end of the loop.
“Loop housekeeping” chores are expressions like i = i + 1, expressions whose main purpose isn’t to do the work of the loop but to control the loop.
Make each loop perform only one function.
Exiting the Loop
Assure yourself that the loop ends.
Make loop-termination conditions obvious.
Don’t monkey with the loop index of a for loop to make the loop terminate.

When you set up a for loop, the loop counter is off limits.
Use a while loop to provide more control over the loop’s exit conditions.
Avoid code that depends on the loop index’s final value.


Consider using safety counters.

Safety counters are not a cure-all.
Introduced into the code one at a time, safety counters might lead to additional errors.
Exiting Loops Early
In this discussion, break is a generic term for break in C++, C, and Java, Exit-Do and Exit-For in Visual Basic, and similar constructs, including those simulated with gotos in languages that don’t support break directly.
The break statement (or equivalent) causes a loop to terminate through the normal exit channel; the program resumes execution at the first statement following the loop.
The continue statement is similar to break in that it’s an auxiliary loop-control statement.
Rather than causing a loop exit, however, continue causes the program to skip the loop body and continue executing at the beginning of the next iteration of the loop.
Consider using break statements rather than boolean flags in a while loop.
Be wary of a loop with a lot of breaks scattered through it.
Use continue for tests at the top of a loop.

Using continue in this way lets you avoid an if test that would effectively indent the entire body of the loop.
If, on the other hand, the continue occurs toward the middle or end of the loop, use an if instead.
Use labeled break if your language supports it.
Use break and continue only with caution.
Use break only after you have considered the alternatives.
It really is a simple proposition: If you can’t defend a break or a continue, don’t use it.
Checking Endpoints
A single loop usually has three cases of interest: the first case, an arbitrarily selected middle case, and the last case.
When you create a loop, mentally run through the first, middle, and last cases to make sure that the loop doesn’t have any off-by-one errors.
If you have any special cases that are different from the first or last case, check those too. If the loop contains complex computations, get out your calculator and manually check the calculations.
Willingness to perform this kind of check is a key difference between efficient and inefficient programmers.
Efficient programmers do the work of mental simulations and hand calculations because they know that such measures help them find errors.
Using Loop Variables
Use ordinal or enumerated types for limits on both arrays and loops.
Use meaningful variable names to make nested loops readable.
Use meaningful names to avoid loop-index cross talk.
The use of i is so habitual that it’s used twice in the same nesting structure.
The second for loop controlled by i conflicts with the first, and that’s index cross talk.
Limit the scope of loop-index variables to the loop itself.

This technique helps document the purpose of the recordCount variable; however, don’t rely on your compiler to enforce recordCount’s scope. As is often the case with more esoteric language features, compiler implementations can vary.
How Long Should a Loop Be?
Make your loops short enough to view all at once.
Experts have suggested a loop-length limit of one printed page, or 66 lines.
Limit nesting to three levels.
Move loop innards of long loops into routines.
Make long loops especially clear.
Creating Loops Easily—from the Inside Out
Here’s the general process.
Start with one case.
Code that case with literals.
Then indent it, put a loop around it, and replace the literals with loop indexes or computed expressions.
Put another loop around that, if necessary, and replace more literals.
Continue the process as long as you have to.
When you finish, add all the necessary initializations.
The book includes a worked example at this point.
The feel is like drawing a picture: start with the broad outline, then fill in the details.
CHECKLIST: Loops
Loop Selection and Creation
- Is a while loop used instead of a for loop, if appropriate?
- Was the loop created from the inside out?
Entering the Loop
- Is the loop entered from the top?
- Is initialization code directly before the loop?
- If the loop is an infinite loop or an event loop, is it constructed cleanly rather than using a kludge such as for i = 1 to 9999?
- If the loop is a C++, C, or Java for loop, is the loop header reserved for loop-control code?
Inside the Loop
- Does the loop use { and } or their equivalent to prevent problems arising from improper modifications?
- Does the loop body have something in it? Is it nonempty?
- Are housekeeping chores grouped, at either the beginning or the end of the loop?
- Does the loop perform one and only one function—as a well-defined routine does?
- Is the loop short enough to view all at once?
- Is the loop nested to three levels or less?
- Have long loop contents been moved into their own routine?
- If the loop is long, is it especially clear?
Loop Indexes
- If the loop is a for loop, does the code inside it avoid monkeying with the loop index?
- Is a variable used to save important loop-index values rather than using the loop index outside the loop?
- Is the loop index an ordinal type or an enumerated type—not floating point?
- Does the loop index have a meaningful name?
- Does the loop avoid index cross talk?
Exiting the Loop
- Does the loop end under all possible conditions?
- Does the loop use safety counters—if you’ve instituted a safety-counter standard?
- Is the loop’s termination condition obvious?
- If break or continue are used, are they correct?
Unusual Control Structures
Multiple Returns from a Routine
Use a return when it enhances readability.
Use guard clauses (early returns or exits) to simplify complex error processing.


With production-size code, the Exit Sub approach creates a noticeable amount of code before the nominal case is handled.
Minimize the number of returns in each routine.
Recursion
In recursion, a routine solves a small part of a problem itself, divides the problem into smaller pieces, and then calls itself to solve each of the smaller pieces.
Suppose you have a data type that represents a maze.
A maze is basically a grid, and at each point on the grid you might be able to turn left, turn right, move up, or move down.
bool FindPathThroughMaze( Maze maze, Point position ) { ... }
The first line of code checks to see whether the position has already been tried.
One key aim in writing a recursive routine is the prevention of infinite recursion.
The second statement checks to see whether the position is the exit from the maze.
If ThisIsTheExit() returns true, the routine itself returns true.
The third statement remembers that the position has been visited.
This prevents the infinite recursion that would result from a circular path.
The remaining lines in the routine try to find a path to the left, up, down, and to the right.
The code stops the recursion if the routine ever returns true, that is, when the routine finds a path through the maze.
Tips for Using Recursion
Make sure the recursion stops.
That usually means that the routine has a test that stops further recursion when it’s not needed.
Use safety counters to prevent infinite recursion.
If you’re using recursion in a situation that doesn’t allow a simple test such as the one just described, use a safety counter to prevent infinite recursion.
The safety counter has to be a variable that’s not re-created each time you call the routine.
Use a class member variable or pass the safety counter as a parameter.
If you don’t want to pass the safety counter as an explicit parameter, you could use a static variable instead.
Limit recursion to one routine.
Keep an eye on the stack.
Don’t use recursion for factorials or Fibonacci numbers.

You should consider alternatives to recursion before using it.
You can do anything with stacks and iteration that you can do with recursion.
goto
I remember a former boss who loved using gotos. The book lays out both the benefits and the drawbacks.
I skipped most of this, because gotos can usually be replaced with other constructs.
Use of gotos defeats compiler optimizations.
Some optimizations depend on a program’s flow of control residing within a few statements.
An unconditional goto makes the flow harder to analyze and reduces the ability of the compiler to optimize the code.
Proponents of gotos sometimes argue that they make code faster or smaller.
But code containing gotos is rarely the fastest or smallest possible.
A well-placed goto can eliminate the need for duplicate code.
Duplicate code leads to problems if the two sets of code are modified differently.
Duplicate code increases the size of source and executable files.
The bad effects of the goto are outweighed in such a case by the risks of duplicate code.
The goto is useful in a routine that allocates resources, performs operations on those resources, and then deallocates the resources.
With a goto, you can clean up in one section of code.
The goto reduces the likelihood of your forgetting to deallocate the resources in each place you detect an error.
Good programming doesn’t mean eliminating gotos.
The arguer on the “I can’t live without gotos” side usually presents a case in which eliminating a goto results in an extra comparison or the duplication of a line of code.
This proves mainly that there’s a case in which using a goto results in one less comparison—not a significant gain on today’s computers.
Error Processing and gotos
' This routine purges a group of files.
This routine is typical of circumstances in which experienced programmers decide to use a goto.
Similar cases come up when a routine needs to allocate and clean up resources like database connections, memory, or temporary files.
The alternative to gotos in those cases is usually duplicating code to clean up the resources.
Here’s a summary of guidelines for using gotos:
Use gotos to emulate structured control constructs in languages that don’t support them directly.
When you do, emulate them exactly.
Don’t abuse the extra flexibility the goto gives you.
Don’t use the goto when an equivalent built-in construct is available.
Measure the performance of any goto used to improve efficiency.
In most cases, you can recode without gotos for improved readability and no loss in efficiency.
If your case is the exception, document the efficiency improvement so that gotoless evangelists won’t remove the goto when they see it.
Limit yourself to one goto label per routine unless you’re emulating structured constructs.
Limit yourself to gotos that go forward, not backward, unless you’re emulating structured constructs.
Make sure all goto labels are used.
Unused labels might be an indication of missing code, namely the code that goes to the labels.
If the labels aren’t used, delete them.
Make sure a goto doesn’t create unreachable code.
If you’re a manager, adopt the perspective that a battle over a single goto isn’t worth the loss of the war.
If the programmer is aware of the alternatives and is willing to argue, the goto is probably OK.
CHECKLIST: Unusual Control Structures
return
- Does each routine use return only when necessary?
- Do returns enhance readability?
Recursion
- Does the recursive routine include code to stop the recursion?
- Does the routine use a safety counter to guarantee that the routine stops?
- Is recursion limited to one routine?
- Is the routine’s depth of recursion within the limits imposed by the size of the program’s stack?
- Is recursion the best way to implement the routine? Is it better than simple iteration?
goto
- Are gotos used only as a last resort, and then only to make code more readable and maintainable?
- If a goto is used for the sake of efficiency, has the gain in efficiency been measured and documented?
- Are gotos limited to one label per routine?
- Do all gotos go forward, not backward?
- Are all goto labels used?
Table-Driven Methods
When you use table-driven methods, you have to address two issues:
First you have to address the question of how to look up entries in the table.
Here’s a list of ways to look up an entry in a table:
- Direct access
- Indexed access
- Stair-step access
The second issue is what to store in the table.
In some cases, the result of a table lookup is data.
In other cases, the result of a table lookup is an action. In such a case, you can store a code that describes the action or, in some languages, you can store a reference to the routine that implements the action.
Direct Access Tables
Like all lookup tables, direct-access tables replace more complicated logical control structures.
A table entry to describe one kind of message might look like this:
Message Begin
Here’s how you would set up the object types in C++:
class AbstractField { public: virtual void ReadAndPrint( string, FileStatus & ) = 0; };
This code fragment declares a member routine for each class that has a string parameter and a FileStatus parameter.
The second step is to declare an array to hold the set of objects. The array is the lookup table, and here’s how it looks:
AbstractField* field[ Field_Last ];
The final step required to set up the table of objects is to assign the names of specific objects to the Field array.
Here’s how those assignments would look:
field[ Field_FloatingPoint ] = new FloatingPointField();
Once the table of routines is set up, you can handle a field in the message simply by accessing the table of objects and calling one of the member routines in the table.
messageIdx = 1;
Fudging Lookup Keys
Duplicate information to make the key work directly.
Transform the key to make it work directly.
Isolate the key-transformation in its own routine.
Indexed Access Tables
When you use indexes, you use the primary data to look up a key in an index table and then you use the value from the index table to look up the main data you’re interested in.
Indexed access schemes offer two main advantages.
First, if each of the entries in the main lookup table is large, it takes a lot less space to create an index array with a lot of wasted space than it does to create a main lookup table with a lot of wasted space.
The second advantage, even if you don’t save space by using an index, is that it’s sometimes cheaper to manipulate entries in an index than entries in a main table.
A final advantage of an index-access scheme is the general table-lookup advantage of maintainability.
Stair-Step Access Tables
To use the stair-step method, you put the upper end of each range into a table and then write a loop to check a score against the upper end of each range.
When you find the point at which the score first exceeds the top of a range, you know what the grade is.
With the stair-step technique, you have to be careful to handle the endpoints of the ranges properly.
Watch the endpoints.
Consider using a binary search rather than a sequential search.
Consider using indexed access instead of the stair-step technique.
Put the stair-step table lookup into its own routine.
CHECKLIST: Table-Driven Methods
- Have you considered table-driven methods as an alternative to complicated logic?
- Have you considered table-driven methods as an alternative to complicated inheritance structures?
- Have you considered storing the table’s data externally and reading it at run time so that the data can be modified without changing code?
- If the table cannot be accessed directly via a straightforward array index (as in the Age example), have you put the access-key calculation into a routine rather than duplicating the index calculation in the code?
General Control Issues
Use the identifiers True and False in boolean expressions rather than using flags like 0 and 1.
Here are some tips on defining True and False in boolean tests:
Compare boolean values to True and False implicitly.
In C, use the 1==1 trick to define TRUE and FALSE.
Making Complicated Expressions Simple
Break complicated tests into partial tests with new boolean variables.
Move complicated expressions into boolean functions.
Use decision tables to replace complicated conditions.
Forming Boolean Expressions Positively
In if statements, convert negatives to positives and flip-flop the code in the if and else clauses.
Apply DeMorgan’s Theorems to simplify boolean tests with negatives.
Guidelines for Comparisons to 0
&& (and) is evaluated left to right.
Compare logical variables implicitly.
Compare numbers to 0.
Although it’s appropriate to compare logical expressions implicitly, you should compare numeric expressions explicitly. For numbers, write
while ( balance != 0 ) …
rather than
while ( balance ) …
Compare pointers to NULL.
For pointers, write
while ( bufferPtr != NULL ) …
rather than
while ( bufferPtr ) …
In C and C++, put constants on the left side of comparisons.
In Java, know the difference between a==b and a.equals(b).
Null Statements
Call attention to null statements.
Create a preprocessor null() macro or inline function for null statements.
Consider whether the code would be clearer with a non-null loop body.
Taming Dangerously Deep Nesting
Simplify a nested if by retesting part of the condition.
Simplify a nested if by using a break block.
Convert a nested if to a set of if-then-elses.
Convert a nested if to a case statement.
Factor deeply nested code into its own routine.
Redesign deeply nested code.
Summary of Techniques for Reducing Deep Nesting
- Retest part of the condition (this section)
- Convert to if-then-elses (this section)
- Convert to a case statement (this section)
- Factor deeply nested code into its own routine (this section)
- Use objects and polymorphic dispatch (this section)
- Rewrite the code to use a status variable (in Section 17.3.)
- Use guard clauses to exit a routine and make the nominal path through the code clearer (in Section 17.1.)
- Use exceptions (Section 8.4)
- Redesign deeply nested code entirely (this section)
The Three Components of Structured Programming
Sequence
A sequence is a set of statements executed in order.
Typical sequential statements include assignments and calls to routines.
Selection
A selection is a control structure that causes statements to be executed selectively.
The if-then-else statement is a common example.
Either the if-then clause or the else clause is executed, but not both.
Iteration
An iteration is a control structure that causes a group of statements to be executed multiple times.
An iteration is commonly referred to as a “loop.”
CHECKLIST: Control-Structure Issues
- Do expressions use True and False rather than 1 and 0?
- Are boolean values compared to True and False implicitly?
- Are numeric values compared to their test values explicitly?
- Have expressions been simplified by the addition of new boolean variables and the use of boolean functions and decision tables?
- Are boolean expressions stated positively?
- Do pairs of braces balance?
- Are braces used everywhere they’re needed for clarity?
- Are logical expressions fully parenthesized?
- Have tests been written in number-line order?
- Do Java tests use a.equals(b) style instead of a == b when appropriate?
- Are null statements obvious?
- Have nested statements been simplified by retesting part of the conditional, converting to if-then-else or case statements, moving nested code into its own routine, converting to a more object-oriented design, or improved in some other way?
- If a routine has a decision count of more than 10, is there a good reason for not redesigning it?
The Software-Quality Landscape
External characteristics are characteristics that a user of the software product is aware of:
Correctness. The degree to which a system is free from faults in its specification, design, and implementation.
Usability. The ease with which users can learn and use a system.
Efficiency. Minimal use of system resources, including memory and execution time.
Reliability. The ability of a system to perform its required functions under stated conditions whenever required—having a long mean time between failures.
Integrity. The degree to which a system prevents unauthorized or improper access to its programs and its data. The idea of integrity includes restricting unauthorized user accesses as well as ensuring that data is accessed properly—that is, that tables with parallel data are modified in parallel, that date fields contain only valid dates, and so on.
Adaptability. The extent to which a system can be used, without modification, in applications or environments other than those for which it was specifically designed.
Accuracy. The degree to which a system, as built, is free from error, especially with respect to quantitative outputs. Accuracy differs from correctness; it is a determination of how well a system does the job it’s built for rather than whether it was built correctly.
Robustness. The degree to which a system continues to function in the presence of invalid inputs or stressful environmental conditions.
External characteristics of quality are the only kind of software characteristics that users care about. Programmers care about the internal characteristics as well:
Maintainability. The ease with which you can modify a software system to change or add capabilities, improve performance, or correct defects.
Flexibility. The extent to which you can modify a system for uses or environments other than those for which it was specifically designed.
Portability. The ease with which you can modify a system to operate in an environment different from that for which it was specifically designed.
Reusability. The extent to which and the ease with which you can use parts of a system in other systems.
Readability. The ease with which you can read and understand the source code of a system, especially at the detailed-statement level.
Testability. The degree to which you can unit-test and system-test a system; the degree to which you can verify that the system meets its requirements.
Understandability. The ease with which you can comprehend a system at both the system-organizational and detailed-statement levels. Understandability has to do with the coherence of the system at a more general level than readability does.
Techniques for Improving Software Quality
Software-quality objectives.
Explicit quality-assurance activity.
Testing strategy.
Software-engineering guidelines.
Informal technical reviews.
Formal technical reviews.
External audits.
Development process.
Change-control procedures.
Measurement of results.
Prototyping.
Cost of Defects
Recommended combination:
- Formal design inspections of the critical parts of a system
- Modeling or prototyping using a rapid prototyping technique
- Code reading or inspections
- Execution testing
CHECKLIST: A Quality-Assurance Plan
- Have you identified specific quality characteristics that are important to your project?
- Have you made others aware of the project’s quality objectives?
- Have you differentiated between external and internal quality characteristics?
- Have you thought about the ways in which some characteristics may compete with or complement others?
- Does your project call for the use of several different error-detection techniques suited to finding several different kinds of errors?
- Does your project include a plan to take steps to assure software quality during each stage of software development?
- Is the quality measured in some way so that you can tell whether it’s improving or degrading?
- Does management understand that quality assurance incurs additional costs up front in order to save costs later?
Collaborative Construction
With collective ownership, all code is owned by the group rather than by individuals and can be modified by various members of the group.
This produces several valuable benefits:
Better code quality arises from multiple sets of eyes seeing the code and multiple programmers working on the code.
The risk of someone leaving the project is lower because multiple people are familiar with each section of code.
Defect-correction cycles are shorter overall because any of several programmers can potentially be assigned to fix bugs on an as-available basis.
Pair Programming
Keys to Success with Pair Programming
Support pair programming with coding standards.
Don’t let pair programming turn into watching.
Don’t force pair programming of the easy stuff.
Rotate pairs and work assignments regularly.
Encourage pairs to match each other’s pace.
Make sure both partners can see the monitor.
Don’t force people who don’t like each other to pair.
Avoid pairing all newbies.
Assign a team leader.
CHECKLIST: Effective Pair Programming
- Do you have a coding standard to support pair programming that’s focused on programming rather than on philosophical coding-style discussions?
- Are both partners participating actively?
- Are you avoiding pair programming everything, instead selecting the assignments that will really benefit from pair programming?
- Are you rotating pair assignments and work assignments regularly?
- Are the pairs well matched in terms of pace and personality?
- Is there a team leader to act as the focal point for management and other people outside the project?
Formal Inspections
An inspection differs from a run-of-the-mill review in several key ways:
- Checklists focus the reviewers’ attention on areas that have been problems in the past.
- The emphasis is on defect detection, not correction.
- Reviewers prepare for the inspection meeting beforehand and arrive with a list of the problems they’ve discovered.
- Distinct roles are assigned to all participants.
- The moderator of the inspection isn’t the author of the work product under inspection.
- The moderator has received specific training in moderating inspections.
- Data is collected at each inspection and is fed into future inspections to improve them.
- General management doesn’t attend the inspection meeting. Technical leaders might.
Roles During an Inspection
Moderator.
The moderator is responsible for keeping the inspection moving at a rate that’s fast enough to be productive but slow enough to find the most errors possible.
Author.
Otherwise, the author’s duties are to explain parts of the design or code that are unclear and, occasionally, to explain why things that seem like errors are actually acceptable.
Reviewer.
A reviewer of a design might be the programmer who will implement the design.
A tester or higher-level architect might also be involved.
The role of the reviewers is to find defects.
Scribe.
The scribe records errors that are detected and the assignments of action items during the inspection meeting.
Management.
Not usually a good idea.
The point of a software inspection is that it is a purely technical review.
Management’s presence changes the focus from technical to political.
Management has a right to know the results of an inspection, and an inspection report is prepared to keep management informed.
Evaluation of performance should be based on final products, not on work that isn’t finished.
Overall, an inspection should have no fewer than three participants.
Having more than two to three reviewers doesn’t appear to increase the number of defects found.
General Procedure for an Inspection
An inspection consists of several distinct stages:
Planning.
The author gives the design or code to the moderator.
The moderator decides who will review the material and when and where the inspection meeting will occur.
The moderator distributes the design or code and a checklist that focuses the attention of the inspectors.
Overview.
The design or code should speak for itself; the overview shouldn’t speak for it.
Preparation.
Each reviewer works alone for about 90 minutes to become familiar with the design or code.
The reviewers use the checklist to stimulate and direct their examination of the review materials.
Perspectives.
A reviewer might be asked to inspect the design or code from the point of view of the maintenance programmer, the customer, or the designer, for example.
Scenarios.
A scenario might also involve a specific task that a reviewer is assigned to perform, such as listing the specific requirements that a particular design element satisfies.
Inspection Meeting.
Don’t discuss solutions during the meeting.
The group should stay focused on identifying defects.
The meeting generally should not last more than two hours.
Inspection Report
Within a day of the inspection meeting, the moderator produces an inspection report (email, or equivalent) that lists each defect, including its type and severity.
If you collect data on the time spent and the number of errors found over time, you can respond to challenges about inspection’s efficacy with hard data.
Data collection is also important because any new methodology needs to justify its existence.
Rework.
The moderator assigns defects to someone, usually the author, for repair.
Follow-Up.
The moderator is responsible for seeing that all rework assigned during the inspection is carried out.
If more than 5 percent of the design or code needs to be reworked, the whole inspection process should be repeated.
Third-Hour Meeting.
Even though during the inspection participants aren’t allowed to discuss solutions to the problems raised, some might still want to.
You can hold an informal, third-hour meeting to allow interested parties to discuss solutions after the official inspection is over.
Fine-Tuning the Inspection.
CHECKLIST: Effective Inspections
- Do you have checklists that focus reviewer attention on areas that have been problems in the past?
- Is the emphasis on defect detection rather than correction?
- Are inspectors given enough time to prepare before the inspection meeting, and is each one prepared?
- Does each participant have a distinct role to play?
- Does the meeting move at a productive rate?
- Is the meeting limited to two hours?
- Has the moderator received specific training in conducting inspections?
- Is data about error types collected at each inspection so that you can tailor future checklists to your organization?
- Is data about preparation and inspection rates collected so that you can optimize future preparation and inspections?
- Are the action items assigned at each inspection followed up, either personally by the moderator or with a re-inspection?
- Does management understand that it should not attend inspection meetings?
Other Kinds of Collaborative Development Practices
Walkthroughs
The term is loosely defined, and at least some of its popularity can be attributed to the fact that people can call virtually any kind of review a “walkthrough.”
- The walkthrough is usually hosted and moderated by the author of the design or code under review.
- The walkthrough focuses on technical issues; it’s a working meeting.
- All participants prepare for the walkthrough by reading the design or code and looking for errors.
- The walkthrough is a chance for senior programmers to pass on experience and corporate culture to junior programmers. It’s also a chance for junior programmers to present new methodologies and to challenge timeworn, possibly obsolete, assumptions.
- A walkthrough usually lasts 30 to 60 minutes.
- The emphasis is on error detection, not correction.
- Management doesn’t attend.
- The walkthrough concept is flexible and can be adapted to the specific needs of the organization using it.
A review is basically a meeting, and meetings are expensive.
If the work product you’re reviewing doesn’t justify the overhead of a formal inspection, it doesn’t justify the overhead of a meeting at all.
You’re better off using document reading or another less interactive approach.
Code Reading
You also comment on qualitative aspects of the code such as its design, style, readability, maintainability, and efficiency.
A code reading usually involves two or more people reading code independently and then meeting with the author of the code to discuss it.
- In preparation for the meeting, the author of the code hands out source listings to the code readers.
- The listings are from 1000 to 10,000 lines of code; 4000 lines is typical.
- Two or more people read the code. Use at least two people to encourage competition between the reviewers. If you use more than two, measure everyone’s contribution so that you know how much the extra people contribute.
- Reviewers read the code independently. Estimate a rate of about 1000 lines a day.
- When the reviewers have finished reading the code, the code-reading meeting is hosted by the author of the code. The meeting lasts one or two hours and focuses on problems discovered by the code readers. No one makes any attempt to walk through the code line by line. The meeting is not even strictly necessary.
- The author of the code fixes the problems identified by the reviewers.
The difference between code reading on the one hand and inspections and walkthroughs on the other is that code reading focuses more on individual review of the code than on the meeting.
Dog-and-Pony Shows
Dog-and-pony shows are reviews in which a software product is demonstrated to a customer.
Preparing for them might have an indirect effect on technical quality, but usually more time is spent in making good-looking Microsoft PowerPoint slides than in improving the quality of the software.
Developer Testing
Testing is a hard activity for most developers to swallow for several reasons:
Testing’s goal runs counter to the goals of other development activities.
The goal is to find errors.
A successful test is one that breaks the software.
The goal of every other development activity is to prevent errors and keep the software from breaking.
Testing can never completely prove the absence of errors.
An absence of errors could mean ineffective or incomplete test cases as easily as it could mean perfect software.
Testing by itself does not improve software quality.
Test results are an indicator of quality, but in and of themselves, they don’t improve it.
If you want to improve your software, don’t just test more; develop better.
Testing requires you to assume that you’ll find errors in your code.
If you assume you won’t, you probably won’t, but only because you’ll have set up a self-fulfilling prophecy.
If you execute the program hoping that it won’t have any errors, it will be too easy to overlook the errors you find.
The main source of undetected errors was that erroneous output was not examined carefully enough. The errors were visible but the programmers didn’t notice them (Myers 1978).
Depending on the project’s size and complexity, developer testing should probably take 8 to 25% of the total project time.
During construction you generally write a routine or class, check it mentally, and then review it or test it.
Regardless of your integration or system-testing strategy, you should test each unit thoroughly before you combine it with any others.
Recommended Approach to Developer Testing
- Test for each relevant requirement to make sure that the requirements have been implemented.
Plan the test cases for this step at the requirements stage or as early as possible—preferably before you begin writing the unit to be tested.
Consider testing for common omissions in requirements.
The level of security, storage, the installation procedure, and system reliability are all fair game for testing and are often overlooked at requirements time.
- Test for each relevant design concern to make sure that the design has been implemented.
Plan the test cases for this step at the design stage or as early as possible—before you begin the detailed coding of the routine or class to be tested.
- Use “basis testing” to add detailed test cases to those that test the requirements and the design.
Add data-flow tests, and then add the remaining test cases needed to thoroughly exercise the code.
At a minimum, you should test every line of code.
Test First or Test Last?
Reasons to write test cases first:
Writing test cases before writing the code doesn’t take any more effort than writing test cases after the code; it simply resequences the test-case-writing activity.
When you write test cases first, you detect defects earlier and you can correct them more easily.
Writing test cases first forces you to think at least a little bit about the requirements and design before writing code, which tends to produce better code.
Writing test cases first exposes requirements problems sooner, before the code is written, because it’s hard to write a test case for a poor requirement.
If you save your test cases (which you should), you can still test last, in addition to testing first.
Limitations of Developer Testing
Developer tests tend to be “clean tests”.
Code works (clean tests); code breaks (dirty tests).
Mature testing organizations tend to have five dirty tests for every clean test.
This ratio is not reversed by reducing the clean tests; it’s done by creating 25 times as many dirty tests.
Developer testing tends to have an optimistic view of test coverage.
Developer testing tends to skip more sophisticated kinds of test coverage.
A better coverage standard is to meet what’s called “100% branch coverage,” with every predicate term being tested for at least one true and one false value.
Bag of Testing Tricks
When you’re planning tests, eliminate those that don’t tell you anything new— that is, tests on new data that probably won’t produce an error if other, similar data didn’t produce an error.
Structured basis testing is a fairly simple concept.
The idea is that you need to test each statement in a program at least once.
Determining the Number of Test Cases Needed for Structured Basis Testing.
- Start with 1 for the straight path through the routine.
- Add 1 for each of the following keywords, or their equivalents: if, while, repeat, for, and, and or.
- Add 1 for each case in a case statement. If the case statement doesn’t have a default case, add 1 more.
Data-Flow Testing
Data can exist in one of three states:
Defined
The data has been initialized, but it hasn’t been used yet.
Used
The data has been used for computation, as an argument to a routine, or for something else.
Killed
The data was once defined, but it has been undefined in some way.
For example, if the data is a pointer, perhaps the pointer has been freed.
If it’s a for-loop index, perhaps the program is out of the loop and the programming language doesn’t define the value of a for-loop index once it’s outside the loop.
If it’s a pointer to a record in a file, maybe the file has been closed and the record pointer is no longer valid.
In addition to having the terms “defined,” “used,” and “killed,” it’s convenient to have terms that describe entering or exiting a routine immediately before or after doing something to a variable:
Entered
The control flow enters the routine immediately before the variable is acted upon.
A working variable is initialized at the top of a routine, for example.
Exited
The control flow leaves the routine immediately after the variable is acted upon.
A return value is assigned to a status variable at the end of a routine, for example.
View the following patterns suspiciously:
Defined-Killed
Defining a variable and then killing it suggests either that the variable is extraneous or that the code that was supposed to use the variable is missing.
Entered-Killed
This is a problem if the variable is a local variable.
It wouldn’t need to be killed if it hasn’t been defined or used.
If, on the other hand, it’s a routine parameter or a global variable, this pattern is all right as long as the variable is defined somewhere else before it’s killed.
Killed-Used
Using a variable after it has been killed is a logical error.
If the code seems to work anyway (for example, a pointer that still points to memory that’s been freed), that’s an accident, and Murphy’s Law says that the code will stop working at the time when it will cause the most mayhem.
Used-Defined
Using and then defining a variable might or might not be a problem, depending on whether the variable was also defined before it was used.
Certainly if you see a used-defined pattern, it’s worthwhile to check for a previous definition.
Check for these anomalous sequences of data states before testing begins.
After you’ve checked for the anomalous sequences, the key to writing data-flow test cases is to exercise all possible defined-used paths.
You can do this to various degrees of thoroughness, including
All definitions.
Test every definition of every variable (that is, every place at which any variable receives a value).
This is a weak strategy because if you try to exercise every line of code you’ll do this by default.
All defined-used combinations.
Test every combination of defining a variable in one place and using it in another.
This is a stronger strategy than testing all definitions because merely executing every line of code does not guarantee that every defined-used combination will be tested.
Error Guessing
The term “error guessing” is a lowbrow name for a sensible concept.
It means creating test cases based upon guesses about where the program might have errors, although it implies a certain amount of sophistication in the guessing.
When you keep records of the kinds of errors you’ve made before, you improve the likelihood that your “error guess” will discover an error.
Boundary Analysis
Boundary analysis also applies to minimum and maximum allowable values.
Compound Boundaries
Classes of Bad Data
Typical bad-data test cases include
- Too little data (or no data)
- Too much data
- The wrong kind of data (invalid data)
- The wrong size of data
- Uninitialized data
Classes of Good Data
Here are other kinds of good data that are worth checking:
- Nominal cases—middle-of-the-road, expected values
- Minimum normal configuration
- Maximum normal configuration
- Compatibility with old data
The minimum normal configuration is useful for testing not just one item, but a group of items.
It’s similar in spirit to the boundary condition of many minimal values, but it’s different in that it creates the set of minimum values out of the set of what is normally expected.
Typical Errors
The scope of most errors is fairly limited.
Many errors are outside the domain of construction.
Most construction errors are the programmers’ fault.
Clerical errors (typos) are a surprisingly common source of problems.
Misunderstanding the design is a recurring theme in studies of programmer errors.
Most errors are easy to fix.
It’s a good idea to measure your own organization’s experiences with errors.
On small projects, construction defects make up the vast bulk of all errors.
In one study of coding errors on a small project (1000 lines of code), 75% of defects resulted from coding, compared to 10% from requirements and 15% from design (Jones 1986a).
This error breakdown appears to be representative of many small projects.
Construction defects account for at least 35% of all defects.
Although the proportion of construction defects is smaller on large projects, they still account for at least 35% of all defects (Beizer 1990, Jones 2000).
Some researchers have reported proportions in the 75% range even on very large projects (Grady 1987).
In general, the better the application area is understood, the better the overall architecture is.
Errors then tend to be concentrated in detailed design and coding (Basili and Perricone 1984).
Construction errors, though cheaper to fix than requirements and design errors, are still expensive.
When the greater number of construction defects was figured into the overall equation, the total cost to fix construction defects was one to two times as much as the cost attributed to design defects.
The General Principle of Software Quality:
It’s cheaper to build high-quality software than it is to build and fix low-quality software.
The error’s in the test data!
How idiotic it feels to waste hours tracking down an error in the test data rather than in the code!
Plug unit tests into a test framework.
Write code for unit tests first, but integrate them into a system-wide test framework (like JUnit) as you complete each test.
Building Scaffolding to Test Individual Classes
One kind of scaffolding is a class that’s dummied up so that it can be used by another class that’s being tested.
Such a class is called a “mock object” or “stub object”.
It can:
- Return control immediately, having taken no action
- Test the data fed to it
- Print a diagnostic message, perhaps an echo of the input parameters, or log a message to a file
- Get return values from interactive input
- Return a standard answer regardless of the input
- Burn up the number of clock cycles allocated to the real object or routine
- Function as a slow, fat, simple, or less accurate version of the real object or routine.
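A stub of this kind can be sketched in a few lines. The sketch below assumes a hypothetical `TaxRateService` that the code under test would normally query over the network; the stub returns a standard answer regardless of the input and records the data fed to it.

```python
# Minimal stub-object sketch. StubTaxRateService and total_with_tax are
# hypothetical names invented for this illustration.

class StubTaxRateService:
    """Stands in for the real tax-rate service during unit tests."""
    def __init__(self, canned_rate=0.25):
        self.canned_rate = canned_rate
        self.calls = []                  # records the data fed to the stub

    def rate_for(self, region):
        self.calls.append(region)        # "test the data fed to it"
        return self.canned_rate          # standard answer regardless of input

def total_with_tax(amount, region, rate_service):
    return amount * (1 + rate_service.rate_for(region))

stub = StubTaxRateService(canned_rate=0.25)
print(total_with_tax(100.0, "CA", stub))   # 125.0
print(stub.calls)                          # ['CA']
```

Because the stub records its calls, a test can verify not only the result but also what the class under test passed to its collaborator.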
Another kind of scaffolding is a fake routine that calls the real routine being tested.
This is called a “driver” or, sometimes, a “test harness.”
It can:
- Call the object with a fixed set of inputs
- Prompt for input interactively and call the object with it
- Take arguments from the command line (in operating systems that support it) and call the object
- Read arguments from a file and call the object
- Run through predefined sets of input data in multiple calls to the object
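A driver of the first and last kinds above can be sketched as a routine that runs through a fixed table of inputs and expected outputs. `days_in_month` here is a hypothetical routine standing in for the real code under test.

```python
# Minimal test-driver ("test harness") sketch.

def days_in_month(month, leap_year=False):
    # hypothetical routine under test
    if month == 2:
        return 29 if leap_year else 28
    return 30 if month in (4, 6, 9, 11) else 31

def run_driver(cases):
    """Call the routine with a fixed set of inputs and report mismatches."""
    failures = []
    for args, expected in cases:
        actual = days_in_month(*args)
        if actual != expected:
            failures.append((args, expected, actual))
    return failures

cases = [((1,), 31), ((2, True), 29), ((2, False), 28), ((9,), 30)]
print(run_driver(cases))   # [] means every case passed
```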
Diff Tools
Test-Data Generators
You can also write code to exercise selected pieces of a program systematically.
System Perturbers
Another class of test-support tools is designed to perturb a system.
Many people have stories of programs that work 99 times out of 100 but fail on the hundredth run-through with the same data.
The problem is nearly always a failure to initialize a variable somewhere, and it’s usually hard to reproduce because 99 times out of 100 the uninitialized variable happens to be 0.
This class includes tools that have a variety of capabilities:
Memory filling.
You want to be sure you don’t have any uninitialized variables.
Some tools fill memory with arbitrary values before you run your program so that uninitialized variables aren’t set to 0 accidentally. In some cases, the memory may be set to a specific value.
Memory shaking.
In multi-tasking systems, some tools can rearrange memory as your program operates so that you can be sure you haven’t written any code that depends on data being in absolute rather than relative locations.
Selective memory failing.
A memory driver can simulate low-memory conditions in which a program might be running out of memory, fail on a memory request, grant an arbitrary number of memory requests before failing, or fail on an arbitrary number of requests before granting one.
This is especially useful for testing complicated programs that work with dynamically allocated memory.
Memory-access checking (bounds checking).
Bounds checkers watch pointer operations to make sure your pointers behave themselves.
Such a tool is useful for detecting uninitialized or dangling pointers.
Error Databases
One powerful test tool is a database of errors that have been reported.
Such a database is both a management and a technical tool.
It allows you to check for recurring errors, track the rate at which new errors are being detected and corrected, and track the status of open and closed errors and their severity.
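At its simplest, such a database is just a table of defect records plus a few queries. The sketch below uses an in-memory SQLite store; the field names are illustrative, echoing the kinds of data listed under "Keeping Test Records" later in these notes.

```python
# Minimal error-database sketch using Python's built-in sqlite3 module.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE defects (
    id INTEGER PRIMARY KEY,
    title TEXT, severity TEXT, status TEXT, origin TEXT)""")
rows = [
    ("crash on empty input", "fatal", "open", "coding"),
    ("typo in report header", "cosmetic", "closed", "coding"),
    ("missing timeout spec", "bothersome", "open", "requirements"),
]
con.executemany(
    "INSERT INTO defects (title, severity, status, origin) VALUES (?, ?, ?, ?)",
    rows)

# One of the management views mentioned above: open defects by severity.
for severity, count in con.execute(
        "SELECT severity, COUNT(*) FROM defects "
        "WHERE status = 'open' GROUP BY severity ORDER BY severity"):
    print(severity, count)
```

The same table supports queries for recurring errors (group by origin or title) and for detection/correction rates (add date columns and group by week).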
Planning to Test
One key to effective testing is planning from the beginning of the project to test.
Test planning is also an element of making the testing process repeatable.
If you can’t repeat it, you can’t improve it.
Retesting (Regression Testing)
Suppose that you’ve tested a product thoroughly and found no errors.
Suppose that the product is then changed in one area and you want to be sure that it still passes all the tests it did before the change—that the change didn’t introduce any new defects.
Testing designed to make sure the software hasn’t taken a step backwards, or “regressed,” is called “regression testing.”
If you run different tests after each change, you have no way of knowing for sure that no new defects have been introduced.
Consequently, regression testing must run the same tests each time.
Sometimes new tests are added as the product matures, but the old tests are kept too.
Automated Testing
The only practical way to manage regression testing is to automate it.
The main tools used to support automatic testing provide test scaffolding, generate input, capture output, and compare actual output with expected output.
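The capture-and-compare core of automated regression testing can be sketched in a few lines: feed the program the same inputs every run and diff the actual output against output captured from a known-good run. `program_under_test` and the expected values here are hypothetical stand-ins.

```python
# Minimal regression-test sketch: same inputs every run, compare against
# output captured from a known-good earlier run.

def program_under_test(x):
    return x * x          # hypothetical stand-in for the real program

expected_outputs = {0: 0, 3: 9, -4: 16}   # captured from a known-good run

def regression_test():
    mismatches = {}
    for x, expected in expected_outputs.items():
        actual = program_under_test(x)
        if actual != expected:
            mismatches[x] = (expected, actual)
    return mismatches

print(regression_test())   # {} -> the program hasn't regressed
```

As the product matures, new input/expected pairs are added to the table, but the old ones are kept, which is exactly the discipline regression testing requires.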
Keeping Test Records
Here are a few kinds of data you can collect to measure your project:
- Administrative description of the defect (the date reported, the person who reported it, a title or description, the date fixed)
- Full description of the problem
- Steps to take to repeat the problem
- Suggested workaround for the problem
- Related defects
- Severity of the problem—for example, fatal, bothersome, or cosmetic
- Origin of the defect—requirements, design, coding, or testing
- Subclassification of a coding defect—off-by-one, bad assignment, bad array index, bad routine call, and so on
- Location of the fix for the defect
- Classes and routines changed by the fix
- Person responsible for the defect (this can be controversial and might be bad for morale)
- Lines of code affected by the defect
- Hours to find the defect
- Hours to fix the defect
Once you collect the data, you can crunch a few numbers to determine whether your project is getting sicker or healthier:
- Number of defects in each class, sorted from worst class to best
- Number of defects in each routine, sorted from worst routine to best
- Average number of testing hours per defect found
- Average number of defects found per test case
- Average number of programming hours per defect fixed
- Percentage of code covered by test cases
- Number of outstanding defects in each severity classification
Personal Test Records
In addition to project-level test records, you might find it useful to keep track of your personal test records.
These records can include both a checklist of the errors you most commonly make as well as a record of the amount of time you spend writing code, testing code, and correcting errors.
CHECKLIST: Test Cases
- Does each requirement that applies to the class or routine have its own test case?
- Does each element from the design that applies to the class or routine have its own test case?
- Has each line of code been tested with at least one test case? Has this been verified by computing the minimum number of tests necessary to exercise each line of code?
- Have all defined-used data-flow paths been tested with at least one test case?
- Has the code been checked for data-flow patterns that are unlikely to be correct, such as defined-defined, defined-exited, and defined-killed?
- Has a list of common errors been used to write test cases to detect errors that have occurred frequently in the past?
- Have all simple boundaries been tested—maximum, minimum, and off-by-one boundaries?
- Have compound boundaries been tested—that is, combinations of input data that might result in a computed variable that’s too small or too large?
- Do test cases check for the wrong kind of data—for example, a negative number of employees in a payroll program?
- Are representative, middle-of-the-road values tested?
- Is the minimum normal configuration tested?
- Is the maximum normal configuration tested?
- Is compatibility with old data tested? And are old hardware, old versions of the operating system, and interfaces with old versions of other software tested?
- Do the test cases make hand-checks easy?
Debugging
Most of the defects you will have will be minor oversights and typos, easily found by looking at a source-code listing or stepping through the code in a debugger.
Defects as Opportunities
Defects can:
Learn about the program you’re working on.
Learn about the kind of mistakes you make.
Learn about the quality of your code from the point of view of someone who has to read it.
Learn about how you solve problems.
Do you guess randomly?
Do you need to improve?
Taking time to analyze and change the way you debug might be the quickest way to decrease the total amount of time it takes you to develop a program.
Learn about how you fix defects.
Do you make the easiest possible correction, by applying goto Band-Aids and special-case makeup that changes the symptom but not the problem?
Or do you make systemic corrections, demanding an accurate diagnosis and prescribing treatment for the heart of the problem?
The Devil’s Guide to Debugging
He's being ironic here, right?
Find the defect by guessing.
Isn't this exactly what I did at 136 (damn it)?
Don’t waste time trying to understand the problem.
Fix the error with the most obvious fix.
Don’t.

The Scientific Method of Debugging
Here are the steps you go through when you use the scientific method:
- Gather data through repeatable experiments.
- Form a hypothesis that accounts for the relevant data.
- Design an experiment to prove or disprove the hypothesis.
- Prove or disprove the hypothesis.
- Repeat as needed.
This process has many parallels in debugging.
Here’s an effective approach for finding a defect:
- Stabilize the error.
- Locate the source of the error (the “fault”).
a. Gather the data that produces the defect.
b. Analyze the data that has been gathered and form a hypothesis about the defect.
c. Determine how to prove or disprove the hypothesis, either by testing the program or by examining the code.
d. Prove or disprove the hypothesis using the procedure identified in 2(c).
- Fix the defect.
- Test the fix.
- Look for similar errors.
Stabilize the Error
An error that doesn’t occur predictably is usually an initialization error or a dangling-pointer problem.
If the calculation of a sum is right sometimes and wrong sometimes, a variable involved in the calculation probably isn’t being initialized properly.
If the problem is a strange and unpredictable phenomenon and you’re using pointers, you almost certainly have an uninitialized pointer or are using a pointer after the memory that it points to has been deallocated.
Stabilizing an error usually requires more than finding a test case that produces the error.
It includes narrowing the test case to the simplest one that still produces the error.
Tips for Finding Defects
Use all the data available to make your hypothesis.
Refine the test cases that produce the error.
Exercise the code in your unit test suite.
Use available tools.
Numerous tools are available to support debugging sessions: interactive debuggers, picky compilers, memory checkers, and so on.
The right tool can make a difficult job easy.
Reproduce the error several different ways.
Generate more data to generate more hypotheses.
Use the results of negative tests.
Brainstorm for possible hypotheses.
Narrow the suspicious region of the code.
Be suspicious of classes and routines that have had defects before.
Check code that’s changed recently.
Expand the suspicious region of the code.
Integrate incrementally.
Check for common defects.
Talk to someone else about the problem.
Take a break from the problem.
Brute Force Debugging
By “brute force,” I’m referring to a technique that might be tedious, arduous, and time-consuming but that is guaranteed to solve the problem.
Here are some general candidates:
- Perform a full design and/or code review on the broken code
- Throw away the section of code and redesign/recode it from scratch
- Throw away the whole program and redesign/recode it from scratch
- Compile code with full debugging information
- Compile code at pickiest warning level and fix all the picky compiler warnings
- Strap on a unit test harness and test the new code in isolation
- Create an automated test suite and run it all night
- Step through a big loop in the debugger manually until you get to the error condition
- Instrument the code with print, display, or other logging statements
- Replicate the end-user’s full machine configuration
- Integrate new code in small pieces, fully testing each piece as it’s integrated
Set a maximum time for quick and dirty debugging.
Make a list of brute force techniques.
Syntax Errors
Don’t trust line numbers in compiler messages.
Don’t trust compiler messages.
Don’t trust the compiler’s second message.
Divide and conquer.
Find extra comments and quotation marks.
Fixing a Defect
Understand the problem before you fix it.
Understand the program, not just the problem.
Confirm the defect diagnosis.
Relax.
Save the original source code.
Fix the problem, not the symptom.
Change the code only for good reason.
Make one change at a time.
Check your fix.
Look for similar defects.
Debugging Tools—Obvious and Not-So-Obvious
Diff.
Compiler Warning Messages.
Set your compiler’s warning level to the highest, pickiest level possible and fix the code so that it doesn’t produce any compiler warnings.
Treat warnings as errors.
Initiate project-wide standards for compile-time settings.
Extended Syntax and Logic Checking.
Execution Profiler.
Test Frameworks/Scaffolding.
Debugger.
CHECKLIST: Debugging Reminders
Techniques for Finding Defects
- Use all the data available to make your hypothesis
- Refine the test cases that produce the error
- Exercise the code in your unit test suite
- Use available tools
- Reproduce the error several different ways
- Generate more data to generate more hypotheses
- Use the results of negative tests
- Brainstorm for possible hypotheses
- Narrow the suspicious region of the code
- Be suspicious of classes and routines that have had defects before
- Check code that’s changed recently
- Expand the suspicious region of the code
- Integrate incrementally
- Check for common defects
- Talk to someone else about the problem
- Take a break from the problem
- Set a maximum time for quick and dirty debugging
- Make a list of brute force techniques, and use them
Techniques for Syntax Errors
- Don’t trust line numbers in compiler messages
- Don’t trust compiler messages
- Don’t trust the compiler’s second message
- Divide and conquer
- Find extra comments and quotation marks
Techniques for Fixing Defects
- Understand the problem before you fix it
- Understand the program, not just the problem
- Confirm the defect diagnosis
- Relax
- Save the original source code
- Fix the problem, not the symptom
- Change the code only for good reason
- Make one change at a time
- Check your fix
- Look for similar defects
General Approach to Debugging
- Do you use debugging as an opportunity to learn more about your program, mistakes, code quality, and problem-solving approach?
- Do you avoid the trial-and-error, superstitious approach to debugging?
- Do you assume that errors are your fault?
- Do you use the scientific method to stabilize intermittent errors?
- Do you use the scientific method to find defects?
- Rather than using the same approach every time, do you use several different techniques to find defects?
- Do you verify that the fix is correct?
- Do you use compiler warning messages, execution profiling, a test framework, scaffolding, and interactive debugging?
Code-Tuning Strategies
Quality Characteristics and Performance
Users are more interested in tangible program characteristics than they are in code quality.
Users tend to be more interested in program throughput than raw performance.
Delivering software on time, providing a clean user interface, and avoiding downtime are often more significant.
Once you’ve chosen efficiency as a priority, whether its emphasis is on speed or on size, you should consider several options before choosing to improve either speed or size at the code level.
Think about efficiency from each of these viewpoints:
- Program requirements
- System design
- Class and routine design
- Operating-system interactions
- Code compilation
- Hardware
- Code tuning
Old Wives’ Tales
Some common misapprehensions:
Reducing the lines of code in a high-level language improves the speed or size of the resulting machine code—false!
Certain operations are probably faster or smaller than others—false!
There’s no room for “probably” when you’re talking about performance.
You must always measure performance to know whether your changes helped or hurt your program.
You should optimize as you go—false!
It’s almost impossible to identify performance bottlenecks before a program is working completely.
Focusing on optimization during initial development detracts from achieving other program objectives.
A fast program is just as important as a correct one—false!
When to Tune
Use a high-quality design.
Make the program right.
Make it modular and easily modifiable so that it’s easy to work on later.
When it’s complete and correct, check the performance.
Several common sources of inefficiency:
Input/output operations.
One of the most significant sources of inefficiency is unnecessary I/O.
System calls.
Calls to system routines are often expensive.
System routines include input/output operations to disk, keyboard, screen, printer, or other devices; memory-management routines; and certain utility routines.
Interpreted languages.
Errors.
Measurement
Whether you use someone else’s tool or write your own code to make the measurements, make sure that you’re measuring only the execution time of the code you’re tuning.
Use the number of CPU clock ticks allocated to your program rather than the time of day.
Otherwise, when the system switches from your program to another program, one of your routines will be penalized for the time spent executing another program.
Summary of the Approach to Code Tuning
Here are the steps you should take as you consider whether code tuning can help improve the performance of a program:
- Develop the software using well-designed code that’s easy to understand and modify.
- If performance is poor,
a. Save a working version of the code so that you can get back to the “last known good state.”
b. Measure the system to find hot spots.
c. Determine whether the weak performance comes from inadequate design, data types, or algorithms and whether code tuning is appropriate. If code tuning isn’t appropriate, go back to step 1.
d. Tune the bottleneck identified in step (c).
e. Measure each improvement one at a time.
f. If an improvement doesn’t improve the code, revert to the code saved in step (a). (Typically, more than half the attempted tunings will produce only a negligible improvement in performance or will degrade performance.)
- Repeat from step 2.
CHECKLIST: Code-Tuning Strategy
Overall Program Performance
- Have you considered improving performance by changing the program requirements?
- Have you considered improving performance by modifying the program’s design?
- Have you considered improving performance by modifying the class design?
- Have you considered improving performance by avoiding operating system interactions?
- Have you considered improving performance by avoiding I/O?
- Have you considered improving performance by using a compiled language instead of an interpreted language?
- Have you considered improving performance by using compiler optimizations?
- Have you considered improving performance by switching to different hardware?
- Have you considered code tuning only as a last resort?
Code-Tuning Approach
- Is your program fully correct before you begin code tuning?
- Have you measured performance bottlenecks before beginning code tuning?
- Have you measured the effect of each code-tuning change?
- Have you backed out the code-tuning changes that didn’t produce the intended improvement?
- Have you tried more than one change to improve performance of each bottleneck—that is, iterated?
Code-Tuning Techniques
Order Tests by Frequency
Arrange tests so that the one that’s fastest and most likely to be true is performed first.
It should be easy to drop through the normal case, and if there are inefficiencies, they should be in processing the uncommon cases.
This principle applies to case statements and to chains of if-then-elses.
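As a sketch, consider classifying characters in typical prose, where lowercase letters are by far the most frequent input (the frequency assumption here is illustrative, not measured):

```python
# Order-tests-by-frequency sketch: the most common case is tested first,
# so typical inputs drop through the chain quickly.
def classify(ch):
    if ch.islower():        # assumed most frequent case: test it first
        return "lowercase"
    elif ch == " ":         # next most frequent in prose
        return "space"
    elif ch.isupper():
        return "uppercase"
    elif ch.isdigit():
        return "digit"
    else:
        return "other"      # the rare cases pay for the full chain
```

In a real tuning pass you would measure the actual input distribution first and order the tests accordingly.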
Substitute Table Lookups for Complicated Expressions
In some circumstances, a table lookup may be quicker than traversing a complicated chain of logic.
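A minimal sketch: a category computed from two boolean inputs can be read out of a 2×2 table instead of walked through a chain of if tests. The categories here are arbitrary illustration values.

```python
# Table-lookup sketch: replace a chain of if tests with an indexed lookup.
category_table = [
    [0, 1],   # a is False: [b False, b True]
    [2, 3],   # a is True:  [b False, b True]
]

def category_logic(a, b):
    # the chain of logic the table replaces
    if not a and not b: return 0
    if not a and b:     return 1
    if a and not b:     return 2
    return 3

def category_lookup(a, b):
    return category_table[int(a)][int(b)]

# the two versions agree on every input
assert all(category_logic(a, b) == category_lookup(a, b)
           for a in (False, True) for b in (False, True))
```

The more complicated the replaced logic, the bigger the win, since the lookup cost stays constant while the chain grows.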
Use Lazy Evaluation
As one procrastinator put it: if he waited long enough, the things that weren’t important would be procrastinated into oblivion, and he wouldn’t waste his time doing them.
If a program uses lazy evaluation, it avoids doing any work until the work is needed.
Lazy evaluation is similar to just-in-time strategies that do the work closest to when it’s needed.
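A minimal sketch of lazy evaluation: wrap an expensive computation so that it runs only if and when the result is requested, then remember the result.

```python
# Lazy-evaluation sketch: defer an expensive computation until its result
# is actually requested.
class LazyValue:
    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self.evaluated = False

    def get(self):
        if not self.evaluated:            # do the work only on first use
            self._value = self._compute()
            self.evaluated = True
        return self._value

report = LazyValue(lambda: sum(range(1_000_000)))  # nothing computed yet
print(report.evaluated)   # False
print(report.get())       # 499999500000
print(report.evaluated)   # True
```

If the program never calls `get()`, the work is never done at all, which is the payoff when many such values are set up but only a few are used.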
Make the loop itself faster
Unswitching
If the decision doesn’t change while the loop is executing, you can unswitch the loop by making the decision outside the loop.
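A sketch of unswitching, using a hypothetical summing routine whose `use_abs` flag doesn't change during the loop:

```python
# Unswitching sketch: hoist a loop-invariant decision out of the loop.
def switched(values, use_abs):
    total = 0
    for v in values:
        if use_abs:            # decided once per iteration
            total += abs(v)
        else:
            total += v
    return total

def unswitched(values, use_abs):
    total = 0
    if use_abs:                # decided once, before the loop
        for v in values:
            total += abs(v)
    else:
        for v in values:
            total += v
    return total

data = [3, -1, 4, -1, 5]
assert switched(data, True) == unswitched(data, True) == 14
assert switched(data, False) == unswitched(data, False) == 10
```

The cost is a duplicated loop body, so the technique trades a little readability for speed and is worth it only in measured hot spots.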

Jamming
Jamming, or “fusion,” is the result of combining two loops that operate on the same set of elements.
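A sketch of jamming two passes over the same list into one, halving the loop housekeeping:

```python
# Jamming ("fusion") sketch: two loops over the same elements become one.
names = ["ada", "grace", "alan"]

# Before: two separate passes
upper, lengths = [], []
for n in names:
    upper.append(n.upper())
for n in names:
    lengths.append(len(n))

# After: one jammed pass producing both results
upper2, lengths2 = [], []
for n in names:
    upper2.append(n.upper())
    lengths2.append(len(n))

assert upper == upper2 and lengths == lengths2
```

Jamming is safe only when neither loop body depends on results the other loop produces for later elements.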
Unrolling
The goal of loop unrolling is to reduce the amount of loop housekeeping.
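A sketch of unrolling by a factor of two, with the obligatory cleanup step for an odd-length input (in Python the win is negligible; the structure is what matters):

```python
# Unrolling sketch: process two elements per pass to halve the number of
# loop tests and increments; handle any leftover element afterward.
def sum_rolled(a):
    total = 0
    for i in range(len(a)):
        total += a[i]
    return total

def sum_unrolled(a):
    total = 0
    i = 0
    n = len(a)
    while i < n - 1:          # two elements per pass
        total += a[i] + a[i + 1]
        i += 2
    if i < n:                 # leftover element when len(a) is odd
        total += a[i]
    return total

data = [2, 7, 1, 8, 2, 8, 1]
assert sum_rolled(data) == sum_unrolled(data) == 29
```

The cleanup code at the end is the classic source of off-by-one defects in unrolled loops, so test the odd-length and empty cases explicitly.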
Minimizing the Work Inside Loops
Sentinel Values
Putting the Busiest Loop on the Inside of Nested Loops
Strength Reduction
Reducing strength means replacing an expensive operation such as multiplication with a cheaper operation such as addition.
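A sketch of the classic case: the multiplication `i * step` inside a loop is replaced by a running addition.

```python
# Strength-reduction sketch: replace per-iteration multiplication with
# a running addition.
def multiplied(n, step):
    return [i * step for i in range(n)]   # one multiplication per element

def reduced(n, step):
    out = []
    value = 0
    for _ in range(n):
        out.append(value)
        value += step                      # addition replaces multiplication
    return out

assert multiplied(5, 12) == reduced(5, 12) == [0, 12, 24, 36, 48]
```

Whether this actually helps depends on the hardware and compiler, so, as with all tunings, measure before and after.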
Data Transformations
Use Integers Rather Than Floating-Point Numbers.
Use the Fewest Array Dimensions Possible.
Minimize Array References.
Use Supplementary Indexes.
String-Length Index.
Independent, Parallel Index Structure.
Use Caching.
Exploit Algebraic Identities.
Use Strength Reduction.
Replace multiplication with addition.
Replace exponentiation with multiplication.
Replace trigonometric routines with their trigonometric identities.
Replace long long integers with longs or ints (but watch for performance issues associated with using native-length vs. non-native-length integers).
Replace floating-point numbers with fixed-point numbers or integers.
Replace double-precision floating points with single-precision numbers.
Replace integer multiplication-by-two and division-by-two with shift operations.
Initialize at Compile Time.
If you’re using a named constant or a magic number in a routine call and it’s the only argument, that’s a clue that you could precompute the number, put it into a constant, and avoid the routine call.
Be Wary of System Routines.
Use the Correct Type of Constants.
Precompute Results.
Optimizing a program by precomputation can take several forms:
- Computing results before the program executes and wiring them into constants that are assigned at compile time
- Computing results before the program executes and hard-coding them into variables used at run time
- Computing results before the program executes and putting them into a file that’s loaded at run time
- Computing results once, at program startup, and then referencing them each time they’re needed
- Computing as much as possible before a loop begins, minimizing the work done inside the loop
- Computing results the first time they’re needed and storing them so that you can retrieve them when they’re needed again
Eliminate Common Subexpressions.
Rewrite Routines In Line.
Recoding in Assembler.
CHECKLIST: Code-Tuning Techniques
Improve Both Speed and Size
- Substitute table lookups for complicated logic
- Jam loops
- Use integer instead of floating-point variables
- Initialize data at compile time
- Use constants of the correct type
- Precompute results
- Eliminate common subexpressions
- Translate key routines to assembler
Improve Speed Only
- Stop testing when you know the answer
- Order tests in case statements and if-then-else chains by frequency
- Compare performance of similar logic structures
- Use lazy evaluation
- Unswitch loops that contain if tests
- Unroll loops
- Minimize work performed inside loops
- Use sentinels in search loops
- Put the busiest loop on the inside of nested loops
- Reduce the strength of operations performed inside loops
- Change multiple-dimension arrays to a single dimension
- Minimize array references
- Augment data types with indexes
- Cache frequently used values
- Exploit algebraic identities
- Reduce strength in logical and mathematical expressions
- Be wary of system routines
- Rewrite routines in line
How Program Size Affects Construction
Communication and Size.
Range of Project Sizes.
Effect of Project Size on Errors.
Effect of Project Size on Productivity.
Effect of Project Size on Development Activities.
Activity Proportions and Size.
Here’s a list of activities that grow at a more-than-linear rate as project size increases:
- Communication
- Planning
- Management
- Requirements development
- System functional design
- Interface design and specification
- Architecture
- Integration
- Defect removal
- System testing
- Document production
Programs, Products, Systems, and System Products.
Methodology and Size.
Managing Construction
Considerations in Setting Standards
Here are several techniques for achieving good coding practices that are less heavy-handed than laying down rigid coding standards:
Assign two people to every part of the project.
Review every line of code.
Require code sign-offs.
Route good code examples for review.
Emphasize that code listings are public assets.
Although code is the result of individual programmers’ work, it is part of the project and should be freely available to anyone else on the project who needs it.
It should be seen by others during reviews and maintenance, even if at no other time.
Reward good code.
One easy standard.
What Is Configuration Management?
Configuration management is the practice of identifying project artifacts and handling changes systematically so that a system can maintain its integrity over time.
Another name for it is “change control.”
It includes techniques for evaluating proposed changes, tracking changes, and keeping copies of the system as it existed at various points in time.
Requirements and Design Changes
Follow a systematic change-control procedure.
Handle change requests in groups.
Estimate the cost of each change.
Be wary of high change volumes.
Establish a change-control board or its equivalent in a way that makes sense for your project.
Watch for bureaucracy, but don’t let the fear of bureaucracy preclude effective change control.
Software Code Changes
Version-control software.
Backup Plan.
CHECKLIST: Configuration Management
General
- Is your software-configuration-management plan designed to help programmers and minimize overhead?
- Does your SCM approach avoid overcontrolling the project?
- Do you group change requests, either through informal means such as a list of pending changes or through a more systematic approach such as a change-control board?
- Do you systematically estimate the effect of each proposed change?
- Do you view major changes as a warning that requirements development isn’t yet complete?
Tools
- Do you use version-control software to facilitate configuration management?
- Do you use version-control software to reduce coordination problems of working in teams?
Backup
- Do you back up all project materials periodically?
- Are project backups transferred to off-site storage periodically?
- Are all materials backed up, including source code, documents, graphics, and important notes?
- Have you tested the backup-recovery procedure?
Estimation Approaches
- Use scheduling software.
- Use an algorithmic approach, such as Cocomo II, Barry Boehm’s estimation model (Boehm et al. 2000).
- Have outside estimation experts estimate the project.
- Have a walkthrough meeting for estimates.
- Estimate pieces of the project, and then add the pieces together.
- Have people estimate their own pieces, and then add the pieces together.
- Estimate the time needed for the whole project, and then divide up the time among the pieces.
- Refer to experience on previous projects.
- Keep previous estimates and see how accurate they were. Use them to adjust new estimates.
Here’s a good approach to estimating a project:
Establish objectives.
Allow time for the estimate, and plan it.
Spell out software requirements.
Estimate at a low level of detail.
Use several different estimation techniques, and compare the results.
Re-estimate periodically.
Here are some of the less easily quantified factors that can influence a software-development schedule:
- Requirements developer experience and capability
- Programmer experience and capability
- Team motivation
- Management quality
- Amount of code reused
- Personnel turnover
- Requirements volatility
- Quality of relationship with customer
- User participation in requirements
- Customer experience with the type of application
- Extent to which programmers participate in requirements development
- Classified security environment for computer, programs, and data
- Amount of documentation
- Project objectives (schedule vs. quality vs. usability vs. the many other possible objectives)
What to Do If You’re Behind
Hope that you’ll catch up.
Expand the team.
Reduce the scope of the project.
Measuring
For any project attribute, it’s possible to measure that attribute in a way that’s superior to not measuring it at all.
To argue against measurement is to argue that it’s better not to know what’s really happening on your project.



Religious Issues
Here’s a list of religious issues:
- Programming language
- Indentation style
- Placing of braces
- Choice of IDE
- Commenting style
- Efficiency vs. readability trade-offs
- Choice of methodology—for example, scrum vs. extreme programming vs. evolutionary delivery
- Programming utilities
- Naming conventions
- Use of gotos
- Use of global variables
- Measurements, especially productivity measures such as lines of code per day
Managing Your Manager
Here are some approaches to dealing with your manager:
- Plant ideas for what you want to do, and then wait for your manager to have a brainstorm (your idea) about doing what you want to do.
- Educate your manager about the right way to do things. This is an ongoing job because managers are often promoted, transferred, or fired.
- Focus on your manager’s interests, doing what he or she really wants you to do, and don’t distract your manager with unnecessary implementation details. (Think of it as “encapsulation” of your job.)
- Refuse to do what your manager tells you, and insist on doing your job the right way.
- Find another job.
Integration
Here are some of the benefits you can expect from careful integration:
- Easier defect diagnosis
- Fewer defects
- Less scaffolding
- Shorter time to first working product
- Shorter overall development schedules
- Better customer relations
- Improved morale
- Improved chance of project completion
- More reliable schedule estimates
- More accurate status reporting
- Improved code quality
- Less documentation
Integration Frequency—Phased or Incremental?
Phased Integration
Until a few years ago, phased integration was the norm. It follows these well-defined steps:
- Design, code, test, and debug each class. This step is called “unit development.”
- Combine the classes into one whopping-big system. This is called “system integration.”
- Test and debug the whole system. This is called “system dis-integration.” (Thanks to Meilir Page-Jones for this witty observation.)
One problem with phased integration is that when the classes in a system are put together for the first time, new problems inevitably surface and the causes of the problems could be anywhere.
But in most cases, another approach is better.
Incremental Integration
In incremental integration, you write and test a program in small pieces and then combine the pieces one at a time. In this one-piece-at-a-time approach to integration, you follow these steps:
- Develop a small, functional part of the system.
It can be the smallest functional part, the hardest part, a key part, or some combination.
Thoroughly test and debug it.
It will serve as a skeleton on which to hang the muscles, nerves, and skin that make up the remaining parts of the system.
- Design, code, test, and debug a class.
- Integrate the new class with the skeleton.
Test and debug the combination of skeleton and new class.
Make sure the combination works before you add any new classes.
If work remains to be done, repeat the process starting at step 2.
Top-Down Integration
In top-down integration, the class at the top of the hierarchy is written and integrated first.

An important aspect of top-down integration is that the interfaces between classes must be carefully specified.
The most troublesome errors to debug are not the ones that affect single classes but those that arise from subtle interactions between classes.
Bottom-Up Integration
Sandwich Integration
Risk-Oriented Integration
Risk-oriented integration is also called “hard part first” integration.
Coincidentally, it also tends to integrate the classes at the top and the bottom first, saving the middle-level classes for last. The motivation, however, is different.
Feature-Oriented Integration
Another approach is to integrate one feature at a time.
First, it eliminates scaffolding for virtually everything except low-level library classes.
The second main advantage is that each newly integrated feature brings about an incremental addition in functionality.
A third advantage is that feature-oriented integration works well with object-oriented design.
T-Shaped Integration
A final approach that often addresses the problems associated with top-down and bottom-up integration is called “T-Shaped Integration.”
Daily Build and Smoke Test
Whatever integration strategy you select, a good approach to integrating the software is the “daily build and smoke test.”
Every file is compiled, linked, and combined into an executable program every day, and the program is then put through a “smoke test,” a relatively simple check to see whether the product “smokes” when it runs.
Here are some of the ins and outs of using daily builds.
Build daily.
Check for broken builds.
At a minimum, a “good” build should
-compile all files, libraries, and other components successfully
-link all files, libraries, and other components successfully
-not contain any showstopper bugs that prevent the program from being launched or that make it hazardous to operate
-pass the smoke test
Smoke test daily.
Automate the daily build and smoke test.
Establish a build group.
Add revisions to the build only when it makes sense to do so.
… but don’t wait too long to add a set of revisions.
Require developers to smoke test their code before adding it to the system.
Create a holding area for code that’s to be added to the build.
Create a penalty for breaking the build.
Release builds in the morning.
Build and smoke test even under pressure.
CHECKLIST: Integration
Integration Strategy
- Does the strategy identify the optimal order in which subsystems, classes, and routines should be integrated?
- Is the integration order coordinated with the construction order so that classes will be ready for integration at the right time?
- Does the strategy lead to easy diagnosis of defects?
- Does the strategy keep scaffolding to a minimum?
- Is the strategy better than other approaches?
- Have the interfaces between components been specified well? (Specifying interfaces isn’t an integration task, but verifying that they have been specified well is.)
Daily Build and Smoke Test
- Is the project building frequently—ideally, daily—to support incremental integration?
- Is a smoke test run with each build so that you know whether the build works?
- Have you automated the build and the smoke test?
- Do developers check in their code frequently—going no more than a day or two between check-ins?
- Is a broken build a rare occurrence?
- Do you build and smoke test the software even when you’re under pressure?
Programming Tools
Graphical design tools generally allow you to express a design in common graphical notations—UML, architecture block diagrams, hierarchy charts, entity relationship diagrams, or class diagrams.
Editing
This group of tools relates to editing source code.
Integrated Development Environments (IDEs).
In addition to basic word-processing functions, good IDEs offer these features:
-Compilation and error detection from within the editor
-Compressed or outline views of programs (class names only or logical structures without the contents)
-Jump to definitions of classes, routines, and variables
-Jump to all places where a class, routine, or variable is used
-Language-specific formatting
-Interactive help for the language being edited
-Brace (begin-end) matching
-Templates for common language constructs (the editor completing the structure of a for loop after the programmer types for, for example)
-Smart indenting (including easily changing the indentation of a block of statements when logic changes)
-Macros programmable in a familiar programming language
-Memory of search strings so that commonly used strings don’t need to be retyped
-Regular expressions in search-and-replace
-Search-and-replace across a group of files
-Editing multiple files simultaneously
-Multi-level undo
Multiple-File String Searching and Replacing
Diff Tools
Merge Tools
Source-Code Beautifiers
Interface Documentation Tools
Templates
Cross-Reference Tools
Class Hierarchy Generators
Analyzing Code Quality
Picky Syntax and Semantics Checkers
Metrics Reporters
Refactoring Source Code
Refactorers
Restructurers
Code Translators
Version Control
-Source-code control
-Make-style dependency control
-Project documentation versioning
Data Dictionaries
A data dictionary is a database that describes all the significant data in a project.
On large projects, a data dictionary is also useful for keeping track of the hundreds or thousands of class definitions.
Executable-Code Tools
Code Creation
- Compilers and Linkers
- Make
- Code Libraries
- Code Generation Wizards
- Setup and Installation
- Macro Preprocessors
Debugging
- Compiler warning messages
- Test scaffolding
- File comparators (for comparing different versions of source-code files)
- Execution profilers
- Trace monitors
- Interactive debuggers—both software and hardware. Testing tools, discussed next, are related to debugging tools.
Testing
- Automated test frameworks like JUnit, NUnit, CppUnit, and so on
- Automated test generators
- Test-case record and playback utilities
- Coverage monitors (logic analyzers and execution profilers)
- Symbolic debuggers
- System perturbers (memory fillers, memory shakers, selective memory failers, memory-access checkers)
- Diff tools (for comparing data files, captured output, and screen images)
- Scaffolding
- Defect tracking software
Code Tuning
- Execution Profilers
- Assembler Listings and Disassemblers
Checklist: Programming Tools
- Do you have an effective IDE?
- Does your IDE support outline view of your program; jumping to definitions of classes, routines, and variables; source code formatting; brace matching or begin-end matching; multiple file string search and replace; convenient compilation; and integrated debugging?
- Do you have tools that automate common refactorings?
- Are you using version control to manage source code, content, requirements, designs, project plans, and other project artifacts?
- If you’re working on a very large project, are you using a data dictionary or some other central repository that contains authoritative descriptions of each class used in the system?
- Have you considered code libraries as alternatives to writing custom code, where available?
- Are you making use of an interactive debugger?
- Do you use make or other dependency-control software to build programs efficiently and reliably?
- Does your test environment include an automated test framework, automated test generators, coverage monitors, system perturbers, diff tools, and defect tracking software?
- Have you created any custom tools that would help support your specific project’s needs, especially tools that automate repetitive tasks?
- Overall, does your environment benefit from adequate tool support?
Layout and Style
Objectives of Good Layout
Explicitly, then, a good layout scheme should:
Accurately represent the logical structure of the code.
Consistently represent the logical structure of the code.
Improve readability.
Withstand modifications.
Layout Techniques
White Space
Grouping.
Blank lines.
Indentation.
Parentheses.
Layout Styles
The following sections describe four general styles of layout:
• Pure blocks
• Emulating pure blocks
• Using begin-end pairs (braces) to designate block boundaries
• Endline layout

Overall, endline layout is inaccurate, hard to apply consistently, and hard to maintain. You’ll see other problems with endline layout throughout the chapter.
Which Style Is Best?
If you’re working in Visual Basic, use pure-block indentation. (The Visual Basic IDE makes it hard not to use this style anyway.)
In Java, standard practice is to use pure-block indentation.
In C++, you might simply choose the style you like or the one that is preferred by the majority of people on your team.
Either pure-block emulation or begin-end block boundaries work equally well.
Fine Points of Formatting Control-Structure Blocks
Avoid unindented begin-end pairs.

Avoid double indentation with begin and end.
Avoid the problem by using pure-block emulation or by using begin and end as block boundaries and aligning begin and end with the statements they enclose.
Use blank lines between paragraphs.
Format single-statement blocks consistently.
//Style 1
Style 1 follows the indentation scheme used with blocks, so it’s consistent with other approaches.
Style 2 (either 2a or 2b) is also consistent, and the begin-end pair reduces the chance that you’ll add statements after the if test and forget to add begin and end.
Its advantage over Style 1 is that if it’s copied to another place in the program, it’s more likely to be copied correctly.
Its disadvantage is that in a line-oriented debugger, the debugger treats the line as one line and the debugger doesn’t show you whether it executes the statement after the if test.
For complicated expressions, put separate conditions on separate lines.
if ( ( ( '0' <= inChar ) && ( inChar <= '9' ) ) ||
Avoid gotos.
Avoid gotos. This sidesteps the formatting problem altogether.
Use a name in all caps for the label the code goes to. This makes the label obvious.
Put the statement containing the goto on a line by itself. This makes the goto obvious.
Put the label the goto goes to on a line by itself. Surround it with blank lines. This makes the label obvious.
Outdent the line containing the label to the left margin to make the label as obvious as possible.
No endline exception for case statements.
Laying Out Individual Statements
Statement Length
A common rule is to limit statement line length to 80 characters.
Here are the reasons:
• Lines longer than 80 characters are hard to read.
• The 80-character limitation discourages deep nesting.
• Lines longer than 80 characters often won’t fit on 8.5” x 11” paper.
• Paper larger than 8.5” x 11” is hard to file.
Using Spaces for Clarity
Add white space within a statement for the sake of readability:
Use spaces to make logical expressions readable.
The expression
while(pathName[startPath+position]<>';') and
is about as readable as Idareyoutoreadthis.
As a rule, you should separate identifiers from other identifiers with spaces.
while ( pathName[ startPath+position ] <> ';' ) and
Use spaces to make array references readable.
grossRate[census[groupId].gender,census[groupId].ageGroup]
grossRate[ census[ groupId ].gender, census[ groupId ].ageGroup ]
Use spaces to make routine arguments readable.
ReadEmployeeData(maxEmps,empData,inputFile,empCount,inputError);
GetCensus( inputFile, empCount, empData, maxEmps, inputError );
Formatting Continuation Lines
Make the incompleteness of a statement obvious.

Keep closely related elements together.
When you break a line, keep things together that belong together—array references, arguments to a routine, and so on.
Indent routine-call continuation lines the standard amount.

Make it easy to find the end of a continuation line.
Indent control-statement continuation lines the standard amount.
Do not align right sides of assignment statements.
Do this.
customerPurchases = customerPurchases + CustomerSales( CustomerID );
Indent assignment-statement continuation lines the standard amount.
Using Only One Statement per Line
Reasons to limit yourself to one statement per line are more compelling:
Putting each statement on a line of its own provides an accurate view of a program’s complexity.
It doesn’t hide complexity by making complex statements look trivial.
Statements that are complex look complex.
Statements that are easy look easy.
Putting several statements on one line doesn’t provide optimization clues to modern compilers.
Today’s optimizing compilers don’t depend on formatting clues to do their optimizations.
This is illustrated later in this section.
With statements on their own lines, the code reads from top to bottom, instead of top to bottom and left to right.
When you’re looking for a specific line of code, your eye should be able to follow the left margin of the code.
It shouldn’t have to dip into each and every line just because a single line might contain two statements.
With statements on their own lines, it’s easy to find syntax errors when your compiler provides only the line numbers of the errors.
If you have multiple statements on a line, the line number doesn’t tell you which statement is in error.
With one statement to a line, it’s easy to step through the code with line-oriented debuggers.
If you have several statements on a line, the debugger executes them all at once, and you have to switch to assembler to step through individual statements.
With one to a line, it’s easy to edit individual statements—to delete a line or temporarily convert a line to a comment.
If you have multiple statements on a line, you have to do your editing between other statements.
In C++, avoid using multiple operations per line (side effects).
PrintMessage( ++n, n + 2 ); // can't tell whether the increment happens before or after n + 2 is evaluated
Laying Out Data Declarations
Use only one data declaration per line.
Declare variables close to where they’re first used.
Order declarations sensibly.
Grouping declarations by type is usually sensible, since variables of the same type tend to be used in related operations.
In C++, put the asterisk next to the variable name in pointer declarations, or declare pointer types.
EmployeeList* employees;
The problem with putting the asterisk next to the type name rather than the variable name is that, when you put more than one declaration on a line, the asterisk will apply only to the first variable even though the visual formatting suggests it applies to all variables on the line.
You can avoid this problem by putting the asterisk next to the variable name rather than the type name.
EmployeeList *employees;
This approach has the weakness of suggesting that the asterisk is part of the variable name, which it isn’t.
The variable can be used either with or without the asterisk.
The best approach is to declare a type for the pointer and use that instead.
EmployeeListPointer employees;
Laying Out Comments
Indent a comment with its corresponding code.
Set off each comment with at least one blank line.
Laying Out Routines
Use blank lines to separate parts of a routine.
Use standard indentation for routine arguments.
The following routine headers are about as appealing aesthetically but take less work to maintain.
public bool ReadEmployeeData(
Visual Basic example of routine headers with readable, maintainable standard indentation.
Public Sub ReadEmployeeData ( _
Laying Out Classes
Laying out class interfaces
The convention is to present the class members in the following order:
- Header comment that describes the class and provides any notes about the overall usage of the class
- Constructors and destructors
- Public routines
- Protected routines
- Private routines and member data
Laying Out Class Implementations
Class implementations are generally laid out in this order:
- Header comment that describes the contents of the file the class is in
- Class data
- Public routines
- Protected routines
- Private routines
If you have more than one class in a file, identify each class clearly.
Use a divider like this:
//**********************************************************************
Laying Out Files and Programs
Put one class in one file.
Give the file a name related to the class name.
Separate routines within a file clearly.
Sequence routines alphabetically.
If you can’t break a program up into classes or if your editor doesn’t allow you to find functions easily, the alphabetical approach can save search time.
In C++, order the source file carefully.
Here’s the standard order of source-file contents in C++:
File-description comment
#include files
Constant definitions
Enums
Macro function definitions
Type definitions
Global variables and functions imported
Global variables and functions exported
Variables and functions that are private to the file
Classes
CHECKLIST: Layout
General
- Is formatting done primarily to illuminate the logical structure of the code?
- Can the formatting scheme be used consistently?
- Does the formatting scheme result in code that’s easy to maintain?
- Does the formatting scheme improve code readability?
Control Structures
- Does the code avoid doubly indented begin-end or {} pairs?
- Are sequential blocks separated from each other with blank lines?
- Are complicated expressions formatted for readability?
- Are single-statement blocks formatted consistently?
- Are case statements formatted in a way that’s consistent with the formatting of other control structures?
- Have gotos been formatted in a way that makes their use obvious?
Individual Statements
- Is white space used to make logical expressions, array references, and routine arguments readable?
- Do incomplete statements end the line in a way that’s obviously incorrect?
- Are continuation lines indented the standard indentation amount?
- Does each line contain at most one statement?
- Is each statement written without side effects?
- Is there at most one data declaration per line?
Comments
- Are the comments indented the same number of spaces as the code they comment?
- Is the commenting style easy to maintain?
Routines
- Are the arguments to each routine formatted so that each argument is easy to read, modify, and comment?
- Are blank lines used to separate parts of a routine?
Classes, Files and Programs
- Is there a one-to-one relationship between classes and files for most classes and files?
- If a file does contain multiple classes, are all the routines in each class grouped together and is the class clearly identified?
- Are routines within a file clearly separated with blank lines?
- In lieu of a stronger organizing principle, are all routines in alphabetical sequence?
Self-Documenting Code
External Documentation
Unit development folders
A Unit Development Folder (UDF), or software-development folder (SDF), is an informal document that contains notes used by a developer during construction.
Detailed-design document
The detailed-design document is the low-level design document.
It describes the class-level or routine-level design decisions, the alternatives that were considered, and the reasons for selecting the approaches that were selected.
Programming Style as Documentation
It’s the most detailed kind of documentation, at the source-statement level.
The main contributor to code-level documentation isn’t comments, but good programming style.
Style includes good program structure, use of straightforward and easily understandable approaches, good variable names, good routine names, use of named constants instead of literals, clear layout, and minimization of control-flow and data-structure complexity.
CHECKLIST: Self-Documenting Code
Classes
- Does the class’s interface present a consistent abstraction?
- Is the class well named, and does its name describe its central purpose?
- Does the class’s interface make obvious how you should use the class?
- Is the class’s interface abstract enough that you don’t have to think about how its services are implemented? Can you treat the class as a black box?
Routines
- Does each routine’s name describe exactly what the routine does?
- Does each routine perform one well-defined task?
- Have all parts of each routine that would benefit from being put into their own routines been put into their own routines?
- Is each routine’s interface obvious and clear?
Data Names
- Are type names descriptive enough to help document data declarations?
- Are variables named well?
- Are variables used only for the purpose for which they’re named?
- Are loop counters given more informative names than i, j, and k?
- Are well-named enumerated types used instead of makeshift flags or boolean variables?
- Are named constants used instead of magic numbers or magic strings?
- Do naming conventions distinguish among type names, enumerated types, named constants, local variables, class variables, and global variables?
Data Organization
- Are extra variables used for clarity when needed?
- Are references to variables close together?
- Are data types simple so that they minimize complexity?
- Is complicated data accessed through abstract access routines (abstract data types)?
Control
- Is the nominal path through the code clear?
Are related statements grouped together?
Have relatively independent groups of statements been packaged into their own routines?
Does the normal case follow the if rather than the else?
Are control structures simple so that they minimize complexity?
Does each loop perform one and only one function, as a well-defined routine would?
Is nesting minimized?
Have boolean expressions been simplified by using additional boolean variables, boolean functions, and decision tables?
Layout
- Does the program’s layout show its logical structure?
Design
- Is the code straightforward, and does it avoid cleverness?
- Are implementation details hidden as much as possible?
- Is the program written in terms of the problem domain as much as possible rather than in terms of computer-science or programming-language structures?
Comments
Comments can be classified into five categories:
Repeat of the Code
Not that useful.
Explanation of the Code
If the code needs an explanation, it’s usually better to make the code itself clearer.
Marker in the Code
A marker comment is one that isn’t intended to be left in the code.
It’s a note to the developer that the work isn’t done yet.
Summary of the Code
Distills a few lines of code into one or two sentences.
Summary comments are particularly useful when someone other than the code’s original author tries to modify the code.
Description of the Code’s Intent
Explains the purpose of a section of code.
Use styles that don’t break down or discourage modification.
Any style that’s too fancy is annoying to maintain.
Use the Pseudocode Programming Process to reduce commenting time.
Integrate commenting into your development style.
“When you’re concentrating on the code you shouldn’t break your concentration to write comments.”
The appropriate response is that, if you have to concentrate so hard on writing code that commenting interrupts your thinking, you need to design in pseudocode first and then convert the pseudocode to comments.
Performance is not a good reason to avoid commenting.
Commenting Techniques
Commenting Individual Lines
Here are two possible reasons a line of code would need a comment:
- The single line is complicated enough to need an explanation.
- The single line once had an error and you want a record of the error.
Here are some guidelines for commenting a line of code:
Avoid self-indulgent comments.
Endline Comments and Their Problems
Although useful in some circumstances, endline comments pose several problems.
Endline comments tend to be hard to format.
Endline comments are also hard to maintain.
Endline comments also tend to be cryptic.
Avoid endline comments on single lines.
Avoid endline comments for multiple lines of code.
If an endline comment is intended to apply to more than one line of code, the formatting doesn’t show which lines the comment applies to.
Here are three exceptions to the recommendation against using endline comments:
Use endline comments to annotate data declarations.
int boundary; // upper index of sorted part of array
Avoid using endline comments for maintenance notes.
for i = 1 to maxElmts - 1 -- fixed error #A423 10/1/92 (scm)
Adding such a comment can be gratifying after a late-night debugging session on software that’s in production, but such comments really have no place in production code.
Use endline comments to mark ends of blocks.
Commenting Paragraphs of Code
Write comments at the level of the code’s intent.
Focus your documentation efforts on the code itself.
Focus paragraph comments on the why rather than the how.
Use comments to prepare the reader for what is to follow.
Make every comment count.
Document surprises.
Avoid abbreviations.
Differentiate between major and minor comments.

Comment anything that gets around an error or an undocumented feature in a language or an environment.
Justify violations of good programming style.
Don’t comment tricky code.
Rewrite it.
Commenting Data Declarations
Comment the units of numeric data.
Comment the range of allowable numeric values.
Comment coded meanings.

Comment limitations on input data.
Document flags to the bit level.
Stamp comments related to a variable with the variable’s name.
Document global data.
Commenting Control Structures
Put a comment before each block of statements, if, case, or loop.
Comment the end of each control structure.

Treat end-of-loop comments as a warning indicating complicated code.
Commenting Routines
Keep comments close to the code they describe.
Describe each routine in one or two sentences at the top of the routine.
Document parameters where they are declared.
The easiest way to document input and output variables is to put comments next to the parameter declarations.
public void InsertionSort(
Differentiate between input and output data.
Document interface assumptions.
Comment on the routine’s limitations.
Document the routine’s global effects.
Document the source of algorithms that are used.
Use comments to mark parts of your program.
One such technique in C++ and Java is to mark the top of each routine with a comment such as
/**
This allows you to jump from routine to routine by doing a string search for /**.
Commenting Classes, Files, and Programs
General Guidelines for Class Documentation
Describe the design approach to the class.
Describe limitations, usage assumptions, and so on.
Comment the class interface.
Don’t document implementation details in the class interface.
General Guidelines for File Documentation
Describe the purpose and contents of each file.
Put your name, email address, and phone number in the block comment.
Include a copyright statement in the block comment.
Give the file a name related to its contents.
CHECKLIST: Good Commenting Technique
General
- Can someone pick up the code and immediately start to understand it?
- Do comments explain the code’s intent or summarize what the code does, rather than just repeating the code?
- Is the Pseudocode Programming Process used to reduce commenting time?
- Has tricky code been rewritten rather than commented?
- Are comments up to date?
- Are comments clear and correct?
- Does the commenting style allow comments to be easily modified?
Statements and Paragraphs
- Does the code avoid endline comments?
- Do comments focus on why rather than how?
- Do comments prepare the reader for the code to follow?
- Does every comment count? Have redundant, extraneous, and self-indulgent comments been removed or improved?
- Are surprises documented?
- Have abbreviations been avoided?
- Is the distinction between major and minor comments clear?
- Is code that works around an error or undocumented feature commented?
Data Declarations
- Are units on data declarations commented?
- Are the ranges of values on numeric data commented?
- Are coded meanings commented?
- Are limitations on input data commented?
- Are flags documented to the bit level?
- Has each global variable been commented where it is declared?
- Has each global variable been identified as such at each usage, by a naming convention, a comment, or both?
- Are magic numbers replaced with named constants or variables rather than just documented?
Control Structures
- Is each control statement commented?
- Are the ends of long or complex control structures commented or, when possible, simplified so that they don’t need comments?
Routines
- Is the purpose of each routine commented?
- Are other facts about each routine given in comments, when relevant, including input and output data, interface assumptions, limitations, error corrections, global effects, and sources of algorithms?
Files, Classes, and Programs
- Does the program have a short document such as that described in the Book Paradigm that gives an overall view of how the program is organized?
- Is the purpose of each file described?
- Are the author’s name, email address, and phone number in the listing?
Personal Character
If you want to be great, you’re responsible for making yourself great.
It’s a matter of your personal character.
The purpose of many good programming practices is to reduce the load on your gray cells.
Here are a few examples:
The point of “decomposing” a system is to make it simpler to understand. (See Section TBD for more details.)
Conducting reviews, inspections, and tests is a way of compensating for anticipated human fallibilities. These review techniques originated as part of “egoless programming” (Weinberg 1998).
If you never made mistakes, you wouldn’t need to review your software. But you know that your intellectual capacity is limited, so you augment it with someone else’s.
Keeping routines short reduces the load on your brain.
Writing programs in terms of the problem domain rather than in terms of low-level implementation-level details reduces your mental workload.
Using conventions of all sorts frees your brain from the relatively mundane aspects of programming, which offer little payback.
Empirically, however, it’s been shown that humble programmers who compensate for their fallibilities write code that’s easier for themselves and others to understand and that has fewer errors.
Curiosity
Once you admit that your brain is too small to understand most programs and you realize that effective programming is a search for ways to offset that fact, you begin a career-long search for ways to compensate.
In the development of a superior programmer, curiosity about technical subjects must be a priority.
Build your awareness of the development process.
If your workload consists entirely of short-term assignments that don’t develop your skills, be dissatisfied.
If you’re not learning, you’re turning into a dinosaur.
You’re in too much demand to spend time working for management that doesn’t have your interests in mind.
If you can’t learn at your job, find a new one.
Experiment.
One effective way to learn about programming is to experiment with programming and the development process.
Prototype!
One key to effective programming is learning to make mistakes quickly, learning from them each time.
Making a mistake is no sin.
Failing to learn from a mistake is.
Read about problem solving.
The implication is that even if you want to reinvent the wheel, you can’t count on success.
You might reinvent the square instead.
Analyze and plan before you act.
Learn about successful projects.
Ask to look at the code of programmers you respect.
Ask to look at the code of programmers you don’t.
Compare their code, and compare their code to your own.
In addition to reading other people’s code, develop a desire to know what expert programmers think about your code.
Read!
Computer documentation tends to be poorly written and poorly organized, but for all its problems, there’s much to gain from overcoming an excessive fear of computer-screen photons or paper products.
Documentation contains the keys to the castle, and it’s worth spending time reading it.
Often the company that provides the language product has already created many of the classes you need.
If it has, make sure you know about them.
Skim the documentation every couple of months.
Read other books and periodicals.
Make a commitment to professional development.
Good programmers constantly look for ways to become better.
Consider the following professional development ladder used at my company and several others:
Level 1: Beginning.
A beginner is a programmer capable of using the basic capabilities of one language.
Such a person can write classes, routines, loops, and conditionals and use many of the features of a language.
Level 2: Introductory.
A programmer who has moved past the beginner stage is capable of using the basic capabilities of multiple languages and is very comfortable in at least one language.
Level 3: Competency.
A competent programmer has expertise in a language or an environment or both.
A programmer at this level might know all the intricacies of J2EE or have the Annotated C++ Reference Manual memorized.
Programmers at this level are valuable to their companies, and many programmers never move beyond this level.
Level 4: Leadership.
A leader has the expertise of a Level 3 programmer and recognizes that programming is only 15 percent communicating with the computer and 85 percent communicating with people.
Only 30 percent of an average programmer’s time is spent working alone (McCue 1978).
Even less time is spent communicating with the computer.
The guru writes code for an audience of people rather than machines.
True guru-level programmers write code that’s crystal-clear, and they document it too.
They don’t want to waste their valuable gray cells reconstructing the logic of a section of code that they could have read in a one-sentence comment.
The sin is in how long you remain a beginner or intermediate after you know what you have to do to improve.
Intellectual Honesty
Intellectual honesty commonly manifests itself in several ways:
Refusing to pretend you’re an expert when you’re not
Readily admitting your mistakes
Trying to understand a compiler warning rather than suppressing the message
Clearly understanding your program—not compiling it to see if it works
Providing realistic status reports
Providing realistic schedule estimates and holding your ground when management asks you to adjust them
You’d be better off pretending that you don’t know anything.
Listen to people’s explanations, learn something new from them, and assess whether they know what they are talking about.
A related kind of intellectual sloppiness occurs when you don’t quite understand your program and “just compile it to see if it works.”
In that situation, it doesn’t really matter whether the program works because you don’t understand it well enough to know whether it works or not.
Remember that testing can only show the presence of errors, not their absence.
The mistake Bert made was not realizing that estimates aren’t negotiable.
He can revise an estimate to be more accurate, but negotiating with his boss won’t change the time it takes to develop a software project.
In the long run, he’ll lose credibility by compromising.
In the short run, he’ll gain respect by standing firm on his estimate.
Communication and Cooperation
Programming is communicating with another programmer first, communicating with the computer second.
Creativity and Discipline
Without standards and conventions on large projects, project completion itself is impossible.
Don’t waste your creativity on things that don’t matter.
Laziness
Laziness manifests itself in several ways:
- Deferring an unpleasant task
- Doing an unpleasant task quickly to get it out of the way
- Writing a tool to do the unpleasant task so that you never have to do the task again
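The third kind of laziness above can be sketched with a small script. This is a hypothetical example (the chore, the function names, and the file pattern are all invented for illustration): a tool that strips trailing whitespace from source files, so the chore never has to be done by hand again.

```python
from pathlib import Path


def strip_trailing_whitespace(text: str) -> str:
    """Remove trailing spaces and tabs from every line, preserving line order."""
    return "\n".join(line.rstrip() for line in text.splitlines())


def clean_tree(root: Path, pattern: str = "*.py") -> int:
    """Clean every matching file under root; return how many files changed."""
    changed = 0
    for path in root.rglob(pattern):
        original = path.read_text()
        cleaned = strip_trailing_whitespace(original)
        if cleaned != original:
            path.write_text(cleaned + "\n")
            changed += 1
    return changed
```

Once the tool exists, running it costs seconds; the unpleasant task has been converted into a habit that requires no willpower at all.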
It’s easy to confuse motion with progress and busyness with productivity.
The most important work in effective programming is thinking, and people tend not to look busy when they’re thinking.
Persistence
Depending on the situation, persistence can be either an asset or a liability.
Persistence when you’re stuck on a piece of new code is hardly ever a virtue.
Try redesigning the class, try an alternative coding approach, or try coming back to it later.
When one approach isn’t working, that’s a good time to try an alternative.
In debugging, it can be mighty satisfying to track down the error that has been annoying you for four hours, but it’s often better to give up on the error after a certain amount of time with no progress—say 15 minutes.
Let your subconscious chew on the problem for a while.
Experience
In software, if you can’t shake the habits of thinking you developed while using your former programming language or the code-tuning techniques that worked on your old machine, your experience will be worse than none at all.
The bottom line on experience is this: If you work for 10 years, do you get 10 years of experience or do you get 1 year of experience 10 times? You have to reflect on your activities to get true experience. If you make learning a continuous commitment, you’ll get experience. If you don’t, you won’t, no matter how many years you have under your belt.
Good habits matter because most of what you do as a programmer you do without consciously thinking about it.
When you first learn something, learn it the right way.
In programming, try to develop new habits that work.
Develop the habit of writing a class in pseudocode before coding it and carefully reading the code before compiling it, for instance.
Themes in Software Craftsmanship
Conquer Complexity
- Dividing a system into subsystems at the architecture level so that your brain can focus on a smaller amount of the system at one time.
- Carefully defining class interfaces so that you can ignore the internal workings of the class.
- Preserving the abstraction represented by the class interface so that your brain doesn’t have to remember arbitrary details.
- Avoiding global data, because global data vastly increases the percentage of the code you need to juggle in your brain at any one time.
- Avoiding deep inheritance hierarchies because they are intellectually demanding.
- Avoiding deep nesting of loops and conditionals because they can be replaced by simpler control structures that burn up fewer gray cells.
- Avoiding gotos because they introduce non-linearity that has been found to be difficult for most people to follow.
- Carefully defining your approach to error handling rather than using an arbitrary proliferation of different error-handling techniques.
- Being systematic about the use of the built-in exception mechanism, which can become a non-linear control structure that is about as hard to understand as gotos if not used with discipline.
- Not allowing classes to grow into monster classes that amount to whole programs in themselves.
- Keeping routines short.
- Using clear, self-explanatory variable names so that your brain doesn’t have to waste cycles remembering details like “i stands for the account index, and j stands for the customer index, or was it the other way around?”
- Minimizing the number of parameters passed to a routine, or, more important, passing only the parameters needed to preserve the routine interface’s abstraction.
- Using conventions to spare your brain the challenge of remembering arbitrary, accidental differences between different sections of code.
- In general, attacking what Chapter 5 describes as “accidental details” wherever possible.
Naming variables functionally, for the “what” of the problem rather than the “how” of the implementation-level solution, increases the level of abstraction.
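The contrast between "how" names and "what" names can be sketched in a few lines. Both routines below compute the same thing; the domain, the rate, and all the names are invented for illustration:

```python
# Implementation-level ("how") naming: the reader must decode the mechanism.
def calc(x, y, f):
    return x + x * y if f else x


# Problem-domain ("what") naming: the same logic reads as a statement of intent.
LATE_FEE_RATE = 0.05  # hypothetical rate, for illustration only


def balance_with_late_fee(balance, is_overdue):
    return balance + balance * LATE_FEE_RATE if is_overdue else balance
```

The second version raises the level of abstraction: the reader thinks about overdue balances, not about `x`, `y`, and `f`.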
Pick Your Process
One example of the way in which process matters is the consequence of not making requirements stable before you begin designing and coding.
The same principle of consciously attending to process applies to design.
You have to lay a solid foundation before you can begin building on it.
If you rush to coding before the foundation is complete, it will be harder to make fundamental changes in the system’s architecture.
The main reason the process matters is that in software, quality must be built in from the first step onward.
Premature optimization is another kind of process error. In an effective process, you make coarse adjustments at the beginning and fine adjustments at the end.
Low-level processes matter too. If you follow the process of writing pseudocode and then filling in the code around the pseudocode, you reap the benefits of designing from the top down.
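The pseudocode-first process can be sketched as follows. The routine and its names are invented for illustration: the design is written top-down as pseudocode, and the code is then filled in beneath each pseudocode line, which survives as a high-level comment.

```python
def keep_in_stock(current_quantity, reorder_level, reorder_amount):
    """Return the quantity to order, if any (all names are illustrative)."""
    # if stock has fallen to or below the reorder level
    if current_quantity <= reorder_level:
        # order enough to restore the target stock level
        return reorder_amount - current_quantity
    # otherwise, nothing needs to be ordered
    return 0
```

Because the pseudocode was written first, the design was reviewed before any code existed, and the comments come for free.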
Observing large processes and small processes means pausing to pay attention to how you create software.
Write Programs for People First, Computers Second
Habits affect all your work; you can’t turn them on and off at will, so be sure that what you’re doing is something you want to become a habit.
A professional programmer writes readable code, period.
Program Into Your Language, Not In It
The best programmers think of what they want to do, and then they assess how to accomplish their objectives with the programming tools at their disposal.
In more typical cases, the gap between what you want to do and what your tools will readily support will require you to make only relatively minor concessions to your environment.
Focus Your Attention with the Help of Conventions
Indentation conventions can concisely show the logical structure of a program.
Alignment conventions can indicate concisely that statements are related.
Conventions protect against known hazards.
Program in Terms of the Problem Domain
Another specific method of dealing with complexity is to work at the highest possible level of abstraction.
One way of working at a high level of abstraction is to work in terms of the programming problem rather than the computer-science solution.
Separating a Program into Levels of Abstraction

Level 0: Operating System Operations and Machine Instructions
If you’re working in a low-level language, you should try to create higher layers for yourself to work in, even though many programmers don’t do that.
Level 1: Programming-Language Structures and Tools
Level 2: Low-Level Implementation Structures
They tend to be the operations and data types you learn about in college courses on algorithms and data structures—stacks, queues, linked lists, trees, indexed files, sequential files, sort algorithms, search algorithms, and so on.
Level 3: Low-Level Problem-Domain Terms
To write code at this level, you need to figure out the vocabulary of the problem area and create building blocks you can use to work with the problem the program solves.
In many applications, this will be the business objects layer or a services layer.
Level 4: High-Level Problem-Domain Terms
Changes in the problem domain should affect this layer a lot, but they should be easy to accommodate by programming in the problem-domain building blocks from the layer below.
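The relationship between Levels 3 and 4 can be sketched in a few lines. The domain here (invoices) and every name in it are invented for illustration; the point is only that the top layer reads in problem-domain terms and is built entirely from the building blocks of the layer below.

```python
# Level 3: low-level problem-domain building block (invented domain).
class Invoice:
    def __init__(self, amount_due, paid=0.0):
        self.amount_due = amount_due
        self.paid = paid

    def outstanding(self):
        return self.amount_due - self.paid


# Level 4: high-level problem-domain terms — the code reads almost like
# the problem statement, expressed in the vocabulary of the layer below.
def total_outstanding(invoices):
    return sum(invoice.outstanding() for invoice in invoices)
```

A change in the problem domain—say, partial payments earning credits—would be absorbed by the `Invoice` building block, leaving the Level 4 code nearly untouched.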
Low-Level Techniques for Working in the Problem Domain
You can use many of the techniques in this book to work in terms of the real-world problem rather than the computer-science solution:
- Use classes to implement structures that are meaningful in problem-domain terms.
- Hide information about the low-level data types and their implementation details.
- Use named constants to document the meanings of strings and of numeric literals.
- Assign intermediate variables to document the results of intermediate calculations.
- Use boolean functions to clarify complex boolean tests.
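Three of the techniques above—named constants, intermediate variables, and boolean functions—can be shown together in one short sketch. The domain and all values are invented for illustration:

```python
# Named constants document the meanings of numeric literals
# (the values are illustrative, not real business rules).
FREE_SHIPPING_THRESHOLD = 50.0
TAX_RATE = 0.08
SHIPPING_FEE = 4.99


def qualifies_for_free_shipping(subtotal):
    """Boolean function that names the test instead of inlining it."""
    return subtotal >= FREE_SHIPPING_THRESHOLD


def order_total(subtotal):
    # Intermediate variables document the result of each calculation step.
    tax = subtotal * TAX_RATE
    shipping = 0.0 if qualifies_for_free_shipping(subtotal) else SHIPPING_FEE
    return subtotal + tax + shipping
```

Each technique keeps the code in the vocabulary of the problem—thresholds, tax, shipping—rather than in bare numbers and compound conditions.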
Watch for Falling Rocks
When you or someone else says “This is really tricky code,” that’s a warning sign, usually of poor code. “Tricky code” is a code phrase for “bad code.”
A class that has more errors than average is a warning sign.
You can use design metrics as another kind of warning sign.
Because programming is still a craft, however, a warning sign merely points to an issue that you should consider.
A good process wouldn’t allow error-prone code to be developed.
It would include the checks and balances of architecture followed by architecture reviews, design followed by design reviews, and code followed by code reviews.
Any warning sign should cause you to doubt the quality of your program.
“Doubt is an uneasy and dissatisfied state from which we struggle to free ourselves and pass into the state of belief.”
If you find yourself working on repetitious code or making similar modifications in several areas, you should feel “uneasy and dissatisfied,” doubting that control has been adequately centralized in classes or routines.
The quality of the thinking that goes into a program largely determines the quality of the program, so paying attention to warnings about the quality of thinking directly affects the final product.
Iterate, Repeatedly, Again and Again
The purpose of a review is to check the quality of the work at a particular point.
If the product fails the review, it’s sent back for rework.
If it succeeds, it doesn’t need further iteration.
Eclecticism
If you decide on the solution method before you fully understand the problem, you act prematurely.
You over-constrain the set of possible solutions, and you might rule out the most effective solution.
You’ll be uncomfortable with any new methodology at first, and the advice to avoid religion in programming doesn’t mean you should abandon a new method as soon as you have a little trouble solving a problem with it.